CrossRef / pdfextract

MOVED TO https://gitlab.com/crossref/pdfextract
MIT License

Fix reference resolver, and add option to output BibTeX with the CrossRef API #11

Closed by jdherman 10 years ago

jdherman commented 10 years ago

Thanks for making this library! It's a great idea.

Two suggestions in this pull request:

  1. The resolver for the --resolved_references option was broken. I switched it to the updated endpoint, http://search.labs.crossref.org/dois?q=#{...}. If a match is found, the DOI and score are added to the XML output.
  2. Add a new extract-bib command, which first finds the resolved reference DOIs and then uses the REST API to fetch the BibTeX citation from http://api.crossref.org/works/#{doi}/transform/application/x-bibtex. This does not affect the existing XML output of the extract command. The BibTeX citations are written to {file_base}.bib. All the fetching makes it a little slow, so I added verbose output to show that everything is working. Only refs with score > 1 are written to the BibTeX file.

I'd love to see both of these in the gem someday. I think (2) especially could be extended to any format supported by the new CrossRef API. Thanks again!
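For illustration, the score filter and output naming from (2) could look roughly like the sketch below. It assumes each resolved reference is a Hash with :doi and :score keys; that shape and the helper names `confident_refs` and `bib_path` are hypothetical and may not match the code in this pull request.

```ruby
# Sketch only: the Hash shape (:doi, :score) and both helper names are
# assumptions for illustration, not the gem's actual code.

MIN_SCORE = 1.0

# Keep only references that resolved to a DOI with a score above the threshold.
def confident_refs(refs, min_score = MIN_SCORE)
  refs.select { |ref| ref[:doi] && ref[:score].to_f > min_score }
end

# Derive the {file_base}.bib output path from the input PDF path.
def bib_path(pdf_path)
  base = File.basename(pdf_path, File.extname(pdf_path))
  File.join(File.dirname(pdf_path), "#{base}.bib")
end

refs = [
  { :doi => "10.1000/182", :score => 3.2 },
  { :doi => "10.1000/183", :score => 0.4 },
  { :doi => nil,           :score => 5.0 }
]

confident_refs(refs).each { |ref| puts ref[:doi] }
puts bib_path("docs/paper.pdf")
```

Only the first reference passes the filter: the second scores too low and the third has no DOI.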

kjw commented 10 years ago

This is fantastic to see.

Can I ask for one small change before I merge? Can you change the URL used to speak to the CrossRef REST API to this:

"http://api.crossref.org/v1/works/#{obj[:doi]}/transform/application/x-bibtex"

Including the v1 path prefix will stop this from breaking when I release new versions of the REST API.
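As a rough sketch, the version prefix could be kept in a single constant so that a later API version is a one-line change. The `bibtex_url` helper and the constant names are made up for this example and are not taken from the gem; the DOI is interpolated as-is, as in the URL above.

```ruby
# Sketch only: API_HOST, API_VERSION and bibtex_url are hypothetical names.

API_HOST    = "http://api.crossref.org"
API_VERSION = "v1"

# Build the versioned BibTeX transform URL for a DOI.
def bibtex_url(doi)
  "#{API_HOST}/#{API_VERSION}/works/#{doi}/transform/application/x-bibtex"
end

puts bibtex_url("10.1000/182")
```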

I'd eventually like to move over my old search.crossref.org/links APIs to the new REST API. On my to do list, but I guess until then this is fine.

jdherman commented 10 years ago

Thanks Karl. I added the v1 prefix.

About the search API -- do you mean replacing http://search.labs.crossref.org/dois?q=#{...} with http://api.crossref.org/works?query=#{...} ? This is an easy change as long as the back end is the same.
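To make the comparison concrete, here is a minimal sketch that builds both query URLs for the same citation text. It only constructs the URLs; whether the two endpoints return the same response shape is exactly the open question, so no parsing is shown. The helper names are hypothetical.

```ruby
require "uri"

# Sketch only: helper names are assumptions; the response formats of the
# two endpoints are not shown and may differ.

# Current lookup on the labs search service.
def labs_search_url(citation_text)
  "http://search.labs.crossref.org/dois?q=" + URI.encode_www_form_component(citation_text)
end

# Candidate replacement on the REST API.
def rest_search_url(citation_text)
  "http://api.crossref.org/works?query=" + URI.encode_www_form_component(citation_text)
end

text = "Smith 2010 Nature"
puts labs_search_url(text)
puts rest_search_url(text)
```

Escaping the citation text matters here because free-form references routinely contain spaces, ampersands and parentheses.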

kjw commented 10 years ago

There is another route at search.crossref.org, search.crossref.org/links, that allows batch citation-to-DOI lookup. I'd like to replicate that in api.crossref.org in some form (just as /dois is already replicated, as you mention).

jdherman commented 10 years ago

Ok, thanks for merging. It sounds like it should use /links instead of /dois for the lookup. I'll look into fixing this if I have time. Edit: in resolve.rb, the text-to-DOI search is done one reference at a time, so it won't be able to use /links without more restructuring.