dgerosa / filltex

Automatic queries to the ADS and INSPIRE databases to fill LaTeX bibliographies
https://davidegerosa.com/filltex
MIT License

Better detection of ADS and arXiv identifiers #2

Closed: lnielsen closed this issue 7 years ago

lnielsen commented 7 years ago

The detection of ADS and arXiv identifiers in the fillbib.py script could be improved.

You could use the idutils package (https://idutils.readthedocs.io) for this, or simply copy over the corresponding regular expressions if you don't want the extra dependency.
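If you go the copy-the-regex route, an arXiv check could look roughly like this. The patterns are written for illustration from the publicly documented formats (new-style YYMM.NNNN or YYMM.NNNNN, old-style archive/YYMMNNN), not copied from idutils, and `is_arxiv_id` is a hypothetical helper name.

```python
import re

# Illustrative patterns (NOT taken from idutils).
# New-style identifiers (from April 2007): YYMM.NNNN or YYMM.NNNNN, optional version.
ARXIV_NEW = re.compile(r"(?:arXiv:)?\d{4}\.\d{4,5}(?:v\d+)?")
# Old-style identifiers: archive(.SUBCLASS)/YYMMNNN, e.g. hep-th/9901001, math.GT/0309136.
ARXIV_OLD = re.compile(r"(?:arXiv:)?[a-z]+(?:-[a-z]+)?(?:\.[A-Z]{2})?/\d{7}(?:v\d+)?")


def is_arxiv_id(key: str) -> bool:
    """Return True if the whole citation key looks like an arXiv identifier."""
    return bool(ARXIV_NEW.fullmatch(key) or ARXIV_OLD.fullmatch(key))


for key in ["1602.03837", "arXiv:1307.2638v2", "hep-th/9901001", "Rowson:1985xh"]:
    print(key, is_arxiv_id(key))
```

Using `fullmatch` means a personal key only counts as an arXiv identifier if the entire key has that shape.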

lnielsen commented 7 years ago

Related to openjournals/joss-reviews#222

dgerosa commented 7 years ago

Thanks for this comment.

lnielsen commented 7 years ago

If you include \cite{10.1051/0004-6361/201322068} in example.tex, you'll end up with @ARTICLE{2013A&A...558A..33A,… in the .bib file. That is, when you use a personal citation key and that key happens to yield a valid result in either INSPIRE or ADS, the entry is added to the bibliography, which I don't think is desirable.

ADS bibcodes have a well-documented format and are easy to validate with a simple regular expression; adding such a check would make the program more robust.
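As an illustration, a loose shape check for the 19-character bibcode layout (YYYYJJJJJVVVVMPPPPA) could be written as follows. This is a sketch under that assumed layout, not ADS's own validator, and `is_ads_bibcode` is a hypothetical name.

```python
import re

# Assumed layout of a 19-character ADS bibcode: YYYYJJJJJVVVVMPPPPA
#   YYYY  publication year
#   JJJJJ journal abbreviation, padded with dots (e.g. "A&A..")
#   VVVV  volume, right-justified and padded with dots
#   M     qualifier (e.g. "L" for letters, or ".")
#   PPPP  page, right-justified and padded with dots
#   A     first letter of the first author's last name
# Deliberately loose: it checks the shape, not that the journal exists.
ADS_BIBCODE = re.compile(r"\d{4}[A-Za-z&][\w&.]{4}[\w.]{4}[\w.][\w.]{4}[A-Za-z.]")


def is_ads_bibcode(key: str) -> bool:
    """Return True if the whole citation key has the shape of an ADS bibcode."""
    return ADS_BIBCODE.fullmatch(key) is not None


for key in ["2013A&A...558A..33A", "10.1051/0004-6361/201322068"]:
    print(key, is_ads_bibcode(key))
```

With a check like this, a DOI used as a personal key could be skipped before any ADS query is made.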

Unfortunately, I don't think INSPIRE has a standard format for its citation keys (perhaps @kaplun can correct me if I'm wrong), but you can, for example, make a search query that only searches the citation keys, like this:

035__a:Rowson:1985xh AND (035__9:SPIRESTeX OR 035__9:INSPIRETeX)

035__a/9 refers to the MARC21 (a metadata standard for libraries) fields, subfields a and 9 of tag 035, where the citation keys are stored. The ADS query could perhaps be fixed in a similar manner: even though the current query looks like it only searches bibcodes, it still returns results when given, e.g., a DOI.
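The INSPIRE key-only query quoted above could be assembled per citation key like this. The field codes and the SPIRESTeX/INSPIRETeX source values are copied from that example query; the helper name is hypothetical, and the search endpoint and URL parameter name are intentionally left out because they are not given here.

```python
from urllib.parse import quote_plus


def inspire_texkey_query(key: str) -> str:
    """Build an INSPIRE search string that only matches citation keys.

    Field codes (035__a, 035__9) and source values are copied from the
    example query in this thread, not from official INSPIRE docs.
    """
    return f"035__a:{key} AND (035__9:SPIRESTeX OR 035__9:INSPIRETeX)"


query = inspire_texkey_query("Rowson:1985xh")
# URL-encode the query before placing it in a request (spaces become "+",
# colons and parentheses are percent-encoded).
encoded = quote_plus(query)
print(query)
print(encoded)
```

Keys containing spaces or search-syntax characters would need extra escaping before being inserted into such a query.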

In the end, these changes are just meant to make the program more robust when personal keys are present, which I think is very likely in real-world usage of filltex.

As for arXiv, I don't think it's the program's job to enforce a particular view on what an author should cite; an author can have perfectly valid reasons for citing an arXiv paper. This doesn't mean you need to add support for arXiv; it just means I wouldn't let your personal opinion on what people should cite be the deciding factor in this case.

dgerosa commented 7 years ago

Thanks for spotting this issue with the DOI! I agree it is not desirable: I didn't realize that querying ADS with a DOI would result in an item being downloaded.

The issue I see here is that DOIs are recognized by both ADS and INSPIRE, and I would not like filltex to decide for the user which entry should be used. I'd rather treat DOIs entirely as personal keys, which are not filled with any database information.

I've added simple if statements that check that the entry just downloaded is actually the one that was originally requested. Citing a DOI key, like \cite{10.1051/0004-6361/201322068}, now writes nothing to the .bib file, because you were looking for 10.1051/0004-6361/201322068 but retrieved 2013A&A...558A..33A.
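A minimal sketch of that kind of check, assuming the downloaded record arrives as a BibTeX string. The helper names are hypothetical and this is not the actual filltex code.

```python
import re
from typing import Optional

# Matches the entry type and citation key at the start of a BibTeX record,
# e.g. "@ARTICLE{2013A&A...558A..33A," captures "2013A&A...558A..33A".
BIBTEX_KEY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def retrieved_key(bibtex: str) -> Optional[str]:
    """Return the citation key of the first entry in a BibTeX string, if any."""
    match = BIBTEX_KEY.search(bibtex)
    return match.group(1) if match else None


def accept_entry(requested_key: str, bibtex: str) -> bool:
    """Keep a downloaded entry only if its key is exactly the one that was cited."""
    return retrieved_key(bibtex) == requested_key


entry = "@ARTICLE{2013A&A...558A..33A,\n  title = {...}\n}"
print(accept_entry("10.1051/0004-6361/201322068", entry))  # DOI key: entry rejected
print(accept_entry("2013A&A...558A..33A", entry))          # bibcode key: entry kept
```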

This additional check does not apply to references cited with ADS preprint keys, because in that case the input key and the retrieved key are different. Handling these cases is a known limitation of the code, as mentioned in the README, so ADS preprint keys should be used with care.

Does this look acceptable to you? Thanks for all these comments; they are greatly improving this code.