Closed wmeijer221 closed 2 years ago
I'm running version 1.4.4 on Ubuntu 22.04 LTS btw; if that matters.
I'm at work at the moment, so I can't fully debug the situation, but I can give some insight:
The exact place where your query is utilized for searching can be found here.
The query parser (which may be the pivotal component here) is using the `parse(String query, ...)` method as defined here.
Now, it may be the way in which the query parser is constructed, but I am not sure, since for the most part I simply copied the indexing and search strategy from a previous bachelor project that this one builds on. If you can send me the .zip of your dataset, some example queries, and a description of the expected search results, I can investigate further and try to fix the issue this evening.
I uploaded the data set + queries here: link. You'll need your RuG account to access it. Mohamed shared this data set with me; it's the same one you shared with him (iteration 3, I believe; not sure though, since I renamed it). None of the queries I added should return any mails with "VOTE" in them, yet they do.
@wmeijer221 I think I have found and fixed the issue. You can try it out with version 1.4.5 of the browser app.
Just a side note: I wasn't able to access the drive link you sent, even when attempting to access it from my a.lalis@student.rug.nl account. But anyway, when I used my iteration-3 dataset and the query `issue -subject:"VOTE"`, I don't see any results whose subject contains the `VOTE` string, and using `+subject:"VOTE"` gives the inverse results, as expected.
For documentation's sake, this change seems to have fixed it; apparently there's some nuance in the different `Field` classes.
Please let me know if there are still issues, and if not, you can go ahead and close this issue.
Actually I just noticed and merged your PR for improved HTML detection, so make that version 1.4.6.
Oeh, my bad for Drive. This one should work: link.
I don't think it's completely resolved yet. In the image you can see I've used a query that should exclude `VOTE`; however, the mail I selected still has `VOTE` in it. As with the queries I added to the drive example, I tried wildcards etc., but that doesn't seem to change anything.
Ah, I forgot to mention, you need to rebuild your dataset indexes using the new browser version. Open your dataset in the browser app, then go to File > Regenerate Indexes. This is necessary because the underlying issue was caused by an error I made when indexing the `subject` field (I think that in old versions I did not include this in the index at all).
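
To illustrate why a stale index makes the exclusion silently do nothing, here is a minimal sketch in plain Java. It is a toy inverted index, not Lucene's API and not the browser app's code; the class name `ExclusionSketch`, the sample mails, and the crude tokenizer are all assumptions made for illustration. The point is only that a prohibited clause on a field with no indexed terms matches nothing, so nothing gets removed from the results until the index is rebuilt with that field included.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model (NOT Lucene) of a query like: issue -subject:"VOTE". */
public class ExclusionSketch {

    /** Hypothetical sample mails, each given as {subject, body}. */
    public static final String[][] SAMPLE = {
        {"[VOTE] Release 1.0", "Please vote; one open issue remains."},
        {"Build failure on main", "This issue blocks the nightly build."},
    };

    /** Maps "field:term" to the ids of the mails that contain the term. */
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final String[][] mails;

    /** Builds the index; subject terms are searchable only if indexSubject is true. */
    public ExclusionSketch(String[][] mails, boolean indexSubject) {
        this.mails = mails;
        for (int id = 0; id < mails.length; id++) {
            if (indexSubject) {
                addTerms("subject", mails[id][0], id);
            }
            addTerms("body", mails[id][1], id);
        }
    }

    /** Lowercases and splits on non-alphanumerics, a crude stand-in for an analyzer. */
    private void addTerms(String field, String text, int id) {
        for (String term : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(field + ":" + term, k -> new TreeSet<>()).add(id);
            }
        }
    }

    private Set<Integer> lookup(String field, String term) {
        Set<Integer> ids = postings.get(field + ":" + term.toLowerCase());
        return ids == null ? Collections.<Integer>emptySet() : ids;
    }

    /** Subjects of mails whose body has bodyTerm, minus those whose subject has bannedTerm. */
    public List<String> search(String bodyTerm, String bannedTerm) {
        Set<Integer> hits = new TreeSet<>(lookup("body", bodyTerm));
        // If the subject was never indexed, this lookup is empty and removes nothing.
        hits.removeAll(lookup("subject", bannedTerm));
        List<String> subjects = new ArrayList<>();
        for (int id : hits) {
            subjects.add(mails[id][0]);
        }
        return subjects;
    }

    public static void main(String[] args) {
        // Stale index (subject not indexed): the VOTE mail is still returned.
        System.out.println(new ExclusionSketch(SAMPLE, false).search("issue", "VOTE"));
        // Regenerated index (subject indexed): the VOTE mail is excluded.
        System.out.println(new ExclusionSketch(SAMPLE, true).search("issue", "VOTE"));
    }
}
```

In this model the first search returns both mails and the second returns only the build-failure mail. That mirrors the behaviour reported above: the same query gives different results before and after the indexes are regenerated.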
Solved! Thanks Andrew!
Hey @andrewlalis,
I started exploring Lucene queries, but I'm struggling with the exclusion operator. In the image you can see that I'm trying to exclude mails with "VOTE" in their subject; however, they're not excluded at all. Swapping it out for the "NOT" operator doesn't work either.
Is there something I'm missing, or something implementation-specific that I'm not taking into account?
Thanks in advance!