Closed by OmnesRes 8 years ago
The author issue was a false alarm. For some reason certain authors were not searchable, but upon inspection they were in the database. Reloading the server solved the problem. I don't think it's worth rebuilding the database for the arXiv q-bio issue at the moment, but I should at least try to stop indexing the newline characters.
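For the newline problem, a minimal sketch of the kind of cleanup that could run before a title or abstract is indexed. The function name `normalize_for_index` and the sample title are made up for illustration; this is not the existing indexing code.

```python
import re

# Any run of whitespace (newlines, tabs, repeated spaces) counts as one gap.
_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_for_index(text: str) -> str:
    """Collapse every whitespace run into a single space and trim the ends."""
    return _WHITESPACE_RUN.sub(" ", text).strip()


# Hypothetical title with a hidden newline, as seen in arXiv q-bio records.
raw_title = "Stochastic models of\ngene expression"
clean_title = normalize_for_index(raw_title)

phrase = "models of gene"
print(phrase in raw_title)    # False: the newline breaks the quoted phrase
print(phrase in clean_title)  # True
```

Applying the same function to the search phrase itself would keep the query and the index consistent.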
I did end up finding a small issue with author names. If someone enters a first name followed by a last name, and each of the two names is in the database as both a first name and a last name, the current code doesn't identify the name. I think it's an easy fix.
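One possible way to handle that case, sketched with plain sets standing in for the database lookups. The function `interpret_name` and the sample names are hypothetical, and the tie-break rule (trust the order the user typed) is an assumption rather than what the site does.

```python
def interpret_name(query, first_names, last_names):
    """Return (first, last) for a two-word query, or None if unrecognized."""
    tokens = query.lower().split()
    if len(tokens) != 2:
        return None
    a, b = tokens
    as_entered = a in first_names and b in last_names
    reversed_order = b in first_names and a in last_names
    if as_entered:
        # Also covers the fully ambiguous case: prefer the order typed.
        return (a, b)
    if reversed_order:
        return (b, a)
    return None


# Hypothetical name tables; "james" and "taylor" appear in both.
first_names = {"james", "taylor", "anna"}
last_names = {"james", "taylor", "smith"}

print(interpret_name("Taylor James", first_names, last_names))
print(interpret_name("Smith Anna", first_names, last_names))
```

Checking the as-entered order first means an ambiguous pair always resolves instead of falling through.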
I discovered today that bioRxiv authors with an associated ORCID ID are not getting scraped correctly. I've also been aware for some time that arXiv q-bio titles and abstracts have hidden newline characters, which will affect searches with double-quoted phrases. I'm also aware that affiliation searches in advanced search can return duplicated articles.
I need to change some of the indexing code and rebuild the database. I think a distinct() call on the advanced_search query set may fix the affiliation issue.
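To illustrate why distinct() is a plausible fix: assuming the query set is a Django QuerySet, distinct() makes the query use SELECT DISTINCT, and duplicates typically appear when a filter joins across a one-to-many relation. The sketch below reproduces that with the standard-library sqlite3 module; the table names, columns, and rows are invented for the example and are not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE affiliation (article_id INTEGER, name TEXT);
    INSERT INTO article VALUES (1, 'Paper A'), (2, 'Paper B');
    INSERT INTO affiliation VALUES
        (1, 'Harvard Medical School'),
        (1, 'Harvard University'),
        (2, 'Yale University');
""")

# The join yields one row per matching affiliation, not one per article.
join_sql = """
    SELECT {select} article.id, article.title
    FROM article
    JOIN affiliation ON affiliation.article_id = article.id
    WHERE affiliation.name LIKE ?
    ORDER BY article.id
"""

duplicated = conn.execute(
    join_sql.format(select=""), ("%Harvard%",)
).fetchall()
deduplicated = conn.execute(
    join_sql.format(select="DISTINCT"), ("%Harvard%",)
).fetchall()

print(duplicated)    # [(1, 'Paper A'), (1, 'Paper A')]
print(deduplicated)  # [(1, 'Paper A')]
```

Paper A matches twice because two of its affiliations contain the search term, which is the same pattern an affiliation search would hit.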