Closed evanmiltenburg closed 7 years ago
Thanks for this bug report. We have traced it to inconsistencies in the Rails database, and we are now diagnosing where they originate and which constraints would keep them from recurring. The bug is not directly related to the search functionality.
This is a side-effect of bug #42 - the specific error in this issue has been fixed, but discussion on the background issue continues.
Now that the main cause has been identified (and given its own issue), I'm closing this issue.
Thanks for taking care of this!
Ok, I suppose it's fixed in the sense that it doesn't give an error. But the results are not very relevant:
A query for Flickr30K returns papers that contain none of the terms Flickr, Flickr30K, 30, or 30K. The top two results are relevant, but the others are not. For example, why is "Demonstration of ILEX 3.0" there?
(Note that if I click through, the link for the Multi30K paper is dead.)
Meanwhile, a search for 'flickr' results in only one hit. So there's something funny going on with how strings containing numbers are treated.
As for the link to the Multi30K paper returning a 404, we are looking into that right now in issue #43. Thanks for pointing it out.
As for the search results, here's the technical explanation: the current search deals only with metadata, not the actual content of papers. There is exactly one paper that mentions "Flickr" ("Multilingual interactive experiments with Flickr") and one paper that includes the term "30K" ("Multi30K: Multilingual English-German Image Descriptions"), both of which are returned in your search. The next search result ("Demonstration of ILEX 3.0") includes the term "3.0", which is similar to "30".
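To make the matching described above concrete, here is a minimal sketch of one way such results could arise. It is an illustration under assumptions, not the site's actual search code: the `tokenize` and `search` helpers, the overlap-count scoring, and the rule that splits terms at letter/digit boundaries and also joins the pieces back together are all hypothetical. The first three titles are the ones quoted above; the fourth is a placeholder.

```python
import re

def tokenize(text):
    """Hypothetical analyzer: lowercase, split each word into runs of
    letters or digits, and also keep the joined form, so that "3.0"
    yields "3", "0", and "30"."""
    tokens = set()
    for word in text.lower().split():
        parts = re.findall(r"[a-z]+|[0-9]+", word)
        tokens.update(parts)
        joined = "".join(parts)
        if joined:
            tokens.add(joined)
    return tokens

def search(query, titles):
    """Rank titles by how many tokens they share with the query
    (an assumed scoring rule); titles with no overlap are dropped."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(title)), title) for title in titles]
    hits = [(score, title) for score, title in scored if score > 0]
    hits.sort(key=lambda pair: -pair[0])  # stable sort keeps input order on ties
    return [title for _, title in hits]

# Metadata only: titles, not paper contents.
TITLES = [
    "Multilingual interactive experiments with Flickr",
    "Multi30K: Multilingual English-German Image Descriptions",
    "Demonstration of ILEX 3.0",
    "An unrelated placeholder title",  # hypothetical entry for contrast
]

if __name__ == "__main__":
    for title in search("Flickr30K", TITLES):
        print(title)
```

Under these assumptions, the query "Flickr30K" breaks into "flickr", "30", and "k", so it matches the Flickr paper, the Multi30K paper, and (through "3.0" collapsing to "30") the ILEX paper, while the query "flickr" matches only one title. If the real analyzer behaves differently, the details will differ, but the general effect of splitting terms at letter/digit boundaries would be similar.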
The deeper question is: what exactly should the search functionality do? Extending the search to look into the papers themselves sounds reasonable, but I cannot give any details until I take a closer look at the codebase. Reading the content of PDF files could also be quite painful.
I'll open a new issue for this, and I hope to have a positive update there soon.
Compare:
https://aclanthology.coli.uni-saarland.de/catalog?per_page=50&q=framenet&search_field=all_fields&utf8=%E2%9C%93
https://aclanthology.coli.uni-saarland.de/catalog?per_page=50&q=flickr30k&search_field=all_fields&utf8=%E2%9C%93
Queries for words without numbers in them work well (e.g. "framenet"), and I can select the number of items per page. But I cannot choose the number of items per page for queries like "flickr30k" without the website breaking down: