NBCLab / athena

Tool for mining and synthesis of cognitive neuroimaging tasks.
https://doi.org/10.3389/fnins.2019.00494

Look into reference extractors. #25

Closed: tsalo closed this issue 8 years ago

tsalo commented 8 years ago

References seem like a good source of information. Jason wrote regular expressions to extract references last semester, but I think we need a more developed tool. I will look into available extraction tools.

mdtdev commented 8 years ago

Have you seen this parser: http://freecite.library.brown.edu/ -- it picks out the informational content from a citation. Here it has color-coded the parts of the citation:

image

I did not copy the color codes, but for this citation they are correct.

tsalo commented 8 years ago

The most appropriate fork seems to be this one from Academia.edu. This fork drops the web-app elements and provides a Ruby API instead. I don't know any Ruby, but I'm sure we could hack together something in Python to call the package. FreeCite extracts a number of useful fields from the references, which we could use to create unique identifiers to serve as features.

I also looked into a few other packages, including pdfx (which was pretty ineffective) and refextract (which didn't actually seem applicable).
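The idea of turning extracted reference fields into unique identifiers could be sketched roughly as below. This is only an illustration: the field names (authors, year, title) and the `reference_id` helper are hypothetical and are not taken from FreeCite's actual output format.

```python
import hashlib
import re


def reference_id(authors, year, title):
    """Build a short, stable identifier from parsed reference fields.

    The choice of fields (first author, year, title) is an assumption made
    for illustration, not something FreeCite prescribes.
    """
    first_author = authors[0] if authors else ""
    parts = [first_author, str(year), title]
    # Lowercase each field and replace runs of non-alphanumerics with a
    # single space, so punctuation and case differences do not change the ID.
    normalized = "|".join(
        re.sub(r"[^a-z0-9]+", " ", part.lower()).strip() for part in parts
    )
    # Hash the normalized string and keep a short prefix as the identifier.
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:12]
```

Two differently formatted versions of the same reference should then map to the same identifier, which is what would make it usable as a feature.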

mdtdev commented 8 years ago

The fork you reference refers to this one. Not sure if there is a difference that matters.

tsalo commented 8 years ago

The other fork seems more up to date, but it is focused on the web app rather than an API. That's just based on the README, though.

mdtdev commented 8 years ago

They are both in Ruby, though, so the internal code may matter if there are any improvements.

Oh wait, when you say "web app" do you mean that it runs as a server? If the long run plan is to make some web distributables on this project, maybe building to local servers--even for internal services--might be a good target along the way.

Or are you imagining calling the Ruby from within a Python program with the Ruby code wrapped (a la C) as a Python package? It might be easier to run the Ruby as an independent process and just access it as a service via the RESTful API. Then all you need is Python's urllib (whichever version is most current). Well, you have to start the Ruby process first, but that is not a big deal.

tsalo commented 8 years ago

It does seem to run as a server. I wonder if there's a limit for API calls. It doesn't mention it in the README so I'm going to assume not.

In any case, my plan was to call the Ruby code from within Python using system calls or something. It's not that I think that's a good idea, it's just that that's all I know how to do. Using it as a service sounds better though, assuming they don't charge.
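For reference, the "system calls" approach could look something like the sketch below. The script name `parse_citation.rb` is hypothetical; it stands in for whatever small Ruby wrapper would call the parser and print its result to standard output.

```python
import subprocess


def parse_with_ruby(citation, command=("ruby", "parse_citation.rb")):
    """Run an external parser on one citation string and return its stdout.

    `command` defaults to a hypothetical Ruby wrapper script; the citation
    is passed as the last command-line argument.
    """
    result = subprocess.run(
        [*command, citation],
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode output as str rather than bytes
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.strip()
```

Each call starts a new Ruby process, so this would likely be slow for large batches of references.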

mdtdev commented 8 years ago

No, sorry, I was not clear. I think you can download the code to your own system, then run it locally (at localhost:3000) and attach to it that way--no other computer involved, a "local server." The cURL example is running that way.
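A Python equivalent of that cURL call might look like the sketch below. The `/citations/create` path, the `citation` form field, and the `Accept: text/xml` header are assumptions based on how the FreeCite service is usually called; the fork's README should be checked before relying on them.

```python
import urllib.parse
import urllib.request

# Local server address from the discussion above.
BASE_URL = "http://localhost:3000"


def build_request(citation, base_url=BASE_URL):
    """Build a POST request carrying one citation string.

    The endpoint path and form field name are assumptions, not confirmed
    against the fork being discussed.
    """
    data = urllib.parse.urlencode({"citation": citation}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/citations/create",
        data=data,  # supplying data makes this a POST request
        headers={"Accept": "text/xml"},
    )


def parse_citation(citation, base_url=BASE_URL, timeout=10):
    """Send the citation to the local server and return the raw response body."""
    request = build_request(citation, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

The server has to be running locally before `parse_citation` is called; the returned text would still need to be parsed (for example as XML) to get at the individual fields.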

tsalo commented 8 years ago

Okay. In that case I guess there's no reason not to run it as a server.

tsalo commented 8 years ago

@mriedel56 and I will attempt to incorporate this into the paper version in #27.

tsalo commented 7 years ago

CrossRef could possibly use PubMed metadata to identify references (both cited papers and papers citing the article).