ReVanced / revanced-website

🌐 Website for ReVanced
https://revanced.app
GNU General Public License v3.0
279 stars 43 forks source link

fix: handle hyphens properly in patches #225

Closed KTibow closed 5 months ago

KTibow commented 5 months ago
revanced-bot commented 5 months ago

Deployed at https://1d43ad98.revanced.pages.dev.

oSumAtrIX commented 5 months ago

Is the sanitization necessary? A better fit for searching would be fuzzy search but out of scope for this PR

revanced-bot commented 5 months ago

Deployed at https://e5869718.revanced.pages.dev.

KTibow commented 5 months ago

the sanitization was already in place. i just moved it around a bit.

oSumAtrIX commented 5 months ago

The function should be removed. It was used when patches were named-like-this

KTibow commented 5 months ago

i don't think it should. both "wifi" and "wi-fi" should match "wi-fi", and the only way to do that is by sanitizing both.

oSumAtrIX commented 5 months ago

This is an arbitrary sanitization attempt with no defined pattern. What makes - eligible but not any other character? What about . or _? What if there's a patch with a - and one without? By specifications of a patch name, a patch could be named "-" only (hypothetically). If you want to match strings partially, then you should use a proper algorithmic approach such as fuzzy search and not just sanitize - arbitrarily. This PR is supposed to fix the referenced issue and that is done by removing the sanitization code as it is incorrectly included right now and should've been removed a while ago when the names of patches had changed. You could try to cram a feature in this PR, what is supposed to be a fix only, which is why I said it's better to attempt at lossy matching in another PR

KTibow commented 5 months ago

I see your point that - is arbitrary, but I don't follow the argument that we must switch to fuzzy search. How do you feel about switching to str.replace(/\W/g, '') for both the query and patch details?

oSumAtrIX commented 5 months ago

Fuzzy search is particularly useful if you want to match strings with any kind of mistake stochastically. This fixes your attempt at sanitizing the string so that wifi matches Wi-Fi for example but expands the character set to any kind of character. Any other sanitization attempt would be inferior to that including your replacement script as you only sanitize a subset now. Like I said, the suggestion to use fuzzy search is a follow up on you wanting to sanitize in the first place, but attempted to recylce an obsolete function that had a different purpose, whereas this PR is not even about sanitization or fuzzy search but instead supposed to be a fix for the referenced issue as to why, if you still plan on this idea, a separate issue for discussion and PR should be made

KTibow commented 5 months ago

While I understand the use for fuzzy search, I don't understand why you think that \W is a "subset" - it quite literally removes everything except for letters, numbers, and underscores. Yes, this PR's way of going about it was wrong, but I'm asking you because if you approve of using \W I'm going to change it to do so.

oSumAtrIX commented 5 months ago

The reason for this is the same as before that this kind of approach is inferior to a fuzzy search. Fuzzy search already incorporates matching strings even when whitespace is left out or included just like your regex. What your regex can't, unlike fuzzy search, and therefore is inferior, is to match a string with a typo or anything other than your whitespace. Therefore your approach is still a subset of what Fuzzy search is. If we formulate the problem query "How to solve finding the patch even with mistakes such as including whitespace accidentally or making a typo" the best and complete solution to that formulation is not your regex, but fuzzy search. If implementation is a question, it should be as simple as using an npm package that would provide you with a fuzzy comparator.

KTibow commented 5 months ago

Okay then, if you're sure that the only way this problem should be solved is with fuzzy search then you can go ahead and make an issue and PR for that as you described.

oSumAtrIX commented 5 months ago

The issue referenced in this PR is primarily solved by removing the sanitization function that was left accidentally in the code. As explained earlier fuzzy search was a follow up on your intention to introduce sanitization as a concept by recycling the function that wasn't supposed to exist to begin with, and not something I initiated. I don't plan on making a PR for this therefore.