deragent / arXivFilter

Small program that lets you custom-filter arXiv daily blasts via drag and drop
MIT License

Request: Support negative weights #10

Open mod20388 opened 11 months ago

mod20388 commented 11 months ago

Right now the filter supports only positive values. This is a request to also support negative values. The use case is filtering out keywords, authors, etc. that the user is not interested in.

deragent commented 11 months ago

Hi, I think this use case is very interesting, even though I have not used it so far.

But, the current implementation does not really distinguish the weights based on positive or negative values, and setting negative weights should already work.

I tried this quickly, and indeed, setting a negative value will lead to a lower score. Also, if the total score goes below 0, the corresponding entry will not be shown in the top list, but in the bottom "Other Entries" list, as one might expect.

Can you please test this, and tell me how else you would like this feature to behave?

mod20388 commented 11 months ago

Yes, the negative scores seem to be taken into account but it is a bit funky. Consider the following two cases:

This makes it hard to filter out uninteresting articles.

mod20388 commented 11 months ago

I think the simplest solution is to remove the positive-score checks in the score computation in filter.py. This would allow negative scores to add up. Or is there a rationale for insisting on positive scores for each key?
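
To illustrate the difference, here is a minimal sketch. The function names and the per-key score dictionary are hypothetical and only meant to show the idea; the actual code in filter.py may be structured differently.

```python
# Hypothetical sketch: names and data layout are assumptions,
# not the actual implementation in filter.py.

def score_with_per_key_check(key_scores):
    """Sum only the keys whose score is positive.

    Negative per-key scores are dropped, so they can never
    offset a positive match in another key.
    """
    return sum(s for s in key_scores.values() if s > 0)


def score_without_check(key_scores):
    """Sum all per-key scores, so negative weights add up."""
    return sum(key_scores.values())


# Example entry: a liked title keyword, a disliked author,
# and a weak positive match in the abstract.
key_scores = {"title": 3, "author": -5, "abstract": 1}

print(score_with_per_key_check(key_scores))  # 4
print(score_without_check(key_scores))       # -1
```

With the per-key check the entry still ends up with a positive total, while without it the negative author weight pushes the total below 0.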

deragent commented 11 months ago

You are completely right.

I had overlooked that the threshold score > 0 is applied on a per-category basis and not overall.

I agree that negative scores should add up, and I will implement this. I will also work out how to highlight the matched terms properly in this case.

mod20388 commented 11 months ago

I have fiddled with this locally and it seems to be working fine. I'm going to create a PR with what I have.

I wonder if there are any UI changes that should go along with this. As you already mentioned, highlighting the matched text makes sense. Perhaps the articles in "Other Entries" should also be sorted by (negative) score?
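
A rough sketch of what the sorting could look like, assuming entries are available as (title, total score) pairs. The function name, the tuple layout, and the choice to put entries with a score of exactly 0 into "Other Entries" are all assumptions for illustration, not the existing code.

```python
# Hypothetical sketch: entry layout and names are assumptions.

def split_and_sort(entries):
    """Split (title, score) pairs into a top list and an "other" list.

    Entries with a positive total score go to the top list; all
    remaining entries (score <= 0, an assumption) go to the other
    list. Both lists are sorted from highest to lowest score, so
    the most negative entries end up at the very bottom.
    """
    top = sorted((e for e in entries if e[1] > 0),
                 key=lambda e: e[1], reverse=True)
    other = sorted((e for e in entries if e[1] <= 0),
                   key=lambda e: e[1], reverse=True)
    return top, other


entries = [("A", 2), ("B", -3), ("C", 0), ("D", 5), ("E", -1)]
top, other = split_and_sort(entries)
print(top)    # [('D', 5), ('A', 2)]
print(other)  # [('C', 0), ('E', -1), ('B', -3)]
```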