fuddl / wd

a browser extension for wikidata
https://wikidata.org/wiki/Wikidata:Tools/Wikidata_for_Firefox
GNU General Public License v3.0

Website resolver fuzzy matching doesn't work on HTTPS #94

Closed: derenrich closed this issue 1 year ago

derenrich commented 1 year ago

Look for example at https://www.wikidata.org/wiki/Q10784335 and the website https://www.hfl.jp/

It should fuzzily match the two together, but it doesn't.
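For illustration, catching this would only take a protocol-insensitive comparison along these lines (a sketch; `urlsFuzzyMatch` is a hypothetical helper, not the extension's actual code):

```ts
// Sketch: treat two URLs as equal when they differ only in protocol,
// fragment, or a trailing slash (hypothetical helper, not the extension's code).
function urlsFuzzyMatch(a: string, b: string): boolean {
  const normalize = (raw: string): string => {
    const url = new URL(raw);
    url.protocol = 'https:';                                 // http ≡ https
    url.hash = '';                                           // ignore fragments
    url.pathname = url.pathname.replace(/\/$/, '') || '/';   // ignore trailing slash
    return url.toString();
  };
  return normalize(a) === normalize(b);
}

urlsFuzzyMatch('http://www.hfl.jp/', 'https://www.hfl.jp'); // → true
```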

fuddl commented 1 year ago

In this case http://www.hfl.jp always redirects to https://www.hfl.jp. So one could argue that http://www.hfl.jp is a legacy value and https://www.hfl.jp is the most up-to-date value, and making a new statement would improve the item. Don't you agree?
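(To verify the legacy status programmatically, one could just follow the redirect; a sketch, assuming `fetch` is available and the site's CORS policy permits the request:)

```ts
// Sketch: does this http URL end up on https after following redirects?
// `fetch` follows redirects by default, and `response.url` is the final URL.
async function redirectsToHttps(httpUrl: string): Promise<boolean> {
  const response = await fetch(httpUrl, { redirect: 'follow' });
  return new URL(response.url).protocol === 'https:';
}

await redirectsToHttps('http://www.hfl.jp/'); // → true for this site
```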

derenrich commented 1 year ago

Sure, but I still think the fuzzy matching should catch this. If I did update the statement, I would have to wait for cache invalidation (or SPARQL latency). So either way I get a bad user experience.

fuddl commented 1 year ago

> If I did update the statement, I would have to wait for cache invalidation (or SPARQL latency).

Does this issue still occur for you in version 0.273?

> So either way I get a bad user experience.

I see. Imagine this user experience scenario:

  1. You somehow arrive at https://www.hfl.jp.
  2. The sidebar shows Q10784335…
  3. …along with a message that asks you to confirm that this is the correct item for this website.
    1. Confirming causes the item to receive a new website statement containing the https URL.
    2. Dismissing displays the regular matching interface.

This could achieve both an enriched user experience and an improvement for Wikidata.
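Step 3.1 could come down to a single `wbcreateclaim` call against the Wikidata API (a sketch; CSRF-token retrieval and error handling omitted):

```ts
// Sketch: add a new official website (P856) statement to an item.
// Assumes a valid CSRF token was already obtained via action=query&meta=tokens.
async function addWebsiteStatement(qid: string, url: string, csrfToken: string) {
  const body = new URLSearchParams({
    action: 'wbcreateclaim',
    entity: qid,                  // e.g. 'Q10784335'
    property: 'P856',             // official website
    snaktype: 'value',
    value: JSON.stringify(url),   // URL datavalues are JSON-encoded strings
    token: csrfToken,
    format: 'json',
  });
  const response = await fetch('https://www.wikidata.org/w/api.php', {
    method: 'POST',
    body,
  });
  return response.json();
}
```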

derenrich commented 1 year ago

Even in 0.273 there is a lag: when I add a new URL, the extension doesn't immediately recognize the website when I go to it. Given the replication delay in SPARQL, I don't see how that can be resolved.

The described UX would be fine, though I'm not sure whether it should replace the old URL or add a new one.

fuddl commented 1 year ago

> Even in 0.273 there is a lag: when I add a new URL, the extension doesn't immediately recognize the website when I go to it. Given the replication delay in SPARQL, I don't see how that can be resolved.

Well, I resolved it with an internal cache and it works for me:

https://user-images.githubusercontent.com/842548/194484178-05670f0b-13c0-4e46-8744-6f388c868902.mov
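The idea is roughly this (a sketch of the approach, not the extension's actual code):

```ts
// Sketch: remember edits made through the extension so a freshly added
// website resolves immediately, without waiting for SPARQL replication.
const recentEdits = new Map<string, string>(); // normalized URL → item id

function rememberEdit(url: string, qid: string): void {
  recentEdits.set(normalize(url), qid);
}

function resolveWebsite(url: string): string | undefined {
  // Local cache first; on a miss the caller falls back to the query service.
  return recentEdits.get(normalize(url));
}

// `normalize` as in the fuzzy-matching sketch above.
declare function normalize(raw: string): string;
```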

Does it not work for you? Or is your scenario different?

> The described UX would be fine, though I'm not sure whether it should replace the old URL or add a new one.

Since at some point in time the http URL was the most accurate value, I'd say both values should be present, with one ranked preferred. What do you think?
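In Wikidata's data model that would look roughly like this (abridged statement JSON, shown as a literal):

```ts
// Sketch: both P856 values coexist; the https one is ranked "preferred",
// the legacy http one stays at "normal" rank (statement JSON, abridged).
const websiteStatements = [
  {
    mainsnak: { snaktype: 'value', property: 'P856',
                datavalue: { value: 'https://www.hfl.jp/', type: 'string' } },
    rank: 'preferred',
  },
  {
    mainsnak: { snaktype: 'value', property: 'P856',
                datavalue: { value: 'http://www.hfl.jp/', type: 'string' } },
    rank: 'normal',
  },
];
```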

derenrich commented 1 year ago

I'm not sure why my experience differs. I've several times manually updated the official website (to change it to HTTPS or similar) and then refreshed the website, only to find that it did not auto-link. Maybe trailing slashes are the problem? Or maybe it's because of how you implemented the cache (with a special case for linking within the UI)?

> Since at some point in time the http URL was the most accurate value, I'd say both values should be present, with one ranked preferred. What do you think?

In general I agree. But for URLs with such a small change, I'm not sure it's worth the clutter.

fuddl commented 1 year ago

> I've several times manually updated the official website (to change it to HTTPS or similar) and then refreshed the website, only to find that it did not auto-link.

If you perform the edit outside of the extension interface, I cannot track the change. So that seems to be the problem.

> For URLs with such a small change, I'm not sure it's worth the clutter.

I think the backwards compatibility is worth some clutter, but your mileage may vary 🤷.

My thinking was that a document with a different protocol in its URL is technically another document, and so is a document with versus without a trailing slash. But I guess in reality this technical difference has never been used to actually point to a different document.

While I'm at it: most websites don't distinguish between http://example.com/ and http://www.example.com/. Should the fuzzy matching also cover the www prefix?

These are at worst 4 × 2 × 2 × 2 = 32 combinations. I think that would be acceptable.
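For the protocol, www, and trailing-slash axes, generating the candidates is cheap (a sketch; `urlVariants` is a hypothetical helper, not the extension's actual code):

```ts
// Sketch: enumerate the fuzzy-match variants of a URL over three axes
// (protocol, www prefix, trailing slash), i.e. 2 × 2 × 2 = 8 per base URL.
function urlVariants(raw: string): string[] {
  const url = new URL(raw);
  const host = url.hostname.replace(/^www\./, '');
  const path = url.pathname.replace(/\/$/, '');
  const variants: string[] = [];
  for (const protocol of ['http', 'https']) {
    for (const www of ['', 'www.']) {
      for (const slash of ['', '/']) {
        variants.push(`${protocol}://${www}${host}${path}${slash}`);
      }
    }
  }
  return variants;
}

urlVariants('https://www.hfl.jp/'); // → 8 variants, 'http://hfl.jp' … 'https://www.hfl.jp/'
```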

derenrich commented 1 year ago

Yeah, I think fuzzing on www would also be smart. I'm considering running a bot to update official website statements to HTTPS across Wikidata.
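(Such a bot could start from a query service list of http-only values; a sketch, capped with LIMIT for testing:)

```ts
// Sketch: fetch items whose official website (P856) still uses plain http
// from the Wikidata Query Service. A bot would still need to verify the
// redirect before editing anything.
async function listHttpOnlyWebsites() {
  const query = `
    SELECT ?item ?url WHERE {
      ?item wdt:P856 ?url .
      FILTER(STRSTARTS(STR(?url), "http://"))
    }
    LIMIT 100
  `;
  const response = await fetch(
    `https://query.wikidata.org/sparql?format=json&query=${encodeURIComponent(query)}`
  );
  const data = await response.json();
  return data.results.bindings; // [{ item, url }, …]
}
```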

> If you perform the edit outside of the extension interface, I cannot track the change. So that seems to be the problem.

Yeah, it wasn't clear to me that I had to make the edit in the extension. It does work if I do that. I'd rather not have to, but I guess it's OK.

fuddl commented 1 year ago

@derenrich please update to version 0.276 and check out these websites:

By the way, I also added support for official blog.

derenrich commented 1 year ago

Yup, it's working now! Thanks so much for doing this!