adsabs / adsabs-dev-api

Developer API service description and example client code

Match regular expression on "body" field #73

Open vmplacco opened 3 years ago

vmplacco commented 3 years ago

Hi there,

I am working with the US National Gemini Office within NOIRLab, trying to automate the search for new publications that use Gemini Observatory data and to subdivide them by partner country based on the program ID. Currently, I can use "bibgroup:gemini" successfully with the ADS search API, but I would also like to search the full text (including acknowledgments) for the specific Gemini program ID. That is needed to cross-match each paper with the observing database and assign it to the partner country from which the data came (for statistical purposes). I tried filtering by the countries listed in the affiliations. Even though that works, it does not give an accurate count for various reasons.
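For reference, the "bibgroup:gemini" query that already works can be sketched roughly as follows. This only builds the request and does not send it. The field list, the page size, and the ADS_API_TOKEN environment variable name are assumptions chosen for illustration, not something prescribed by the API.

```python
import os
import urllib.parse
import urllib.request

# Public ADS search endpoint.
ADS_SEARCH_URL = "https://api.adsabs.harvard.edu/v1/search/query"


def build_bibgroup_request(token, rows=100):
    """Build (but do not send) a search request for the Gemini bibgroup."""
    params = urllib.parse.urlencode({
        "q": "bibgroup:gemini",   # the query mentioned in the post
        "fl": "bibcode,title",    # assumed field list, adjust as needed
        "rows": rows,             # assumed page size
    })
    return urllib.request.Request(
        f"{ADS_SEARCH_URL}?{params}",
        headers={"Authorization": f"Bearer {token}"},
    )


# ADS_API_TOKEN is a hypothetical environment variable name.
request = build_bibgroup_request(os.environ.get("ADS_API_TOKEN", "YOUR_TOKEN"))
print(request.full_url)
```

Passing the request to urllib.request.urlopen would send it; that step needs a valid token and is left out here.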

I wanted to be able to search for the following regular expression:

G[NS].*20[0-9][0-9][AB].*[0-9][0-9][0-9]

This regex matches all valid program IDs (e.g. GS-2020A-Q-123) as well as the most common mistakes (GS2020A-Q123 and so on).
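To illustrate what the pattern does, here is a minimal sketch that applies it locally to text already in hand, using Python's re module. It says nothing about whether the ADS "body" field accepts regular expressions. One thing it shows is that the greedy `.*` can span two IDs on the same line, so a tighter variant is included as well; that variant, its name, and the sample sentence are assumptions made for this example and may not cover every real program ID format.

```python
import re

# The pattern from the post, unchanged.
ORIGINAL = re.compile(r"G[NS].*20[0-9][0-9][AB].*[0-9][0-9][0-9]")

# Hypothetical tighter variant: optional hyphens instead of ".*",
# a 1-3 letter program type and a 1-3 digit program number (both assumed).
TIGHT = re.compile(r"G[NS]-?20[0-9]{2}[AB]-?[A-Z]{1,3}-?[0-9]{1,3}")


def find_program_ids(text):
    """Return every substring of text that looks like a program ID."""
    return TIGHT.findall(text)


# Made-up sample sentence with one well-formed and one malformed ID.
sample = "programs GN-2019B-Q-101 and GS2020A-Q123."

# The original pattern matches once, from the first ID through the second.
print(ORIGINAL.search(sample).group(0))

# The tighter variant returns the two IDs separately.
print(find_program_ids(sample))
```

The original pattern is still useful as a yes/no filter for "does this text mention a program ID at all"; the tighter one is only needed when the individual IDs have to be extracted for cross-matching.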