aatishb / covidtrends

Tracking the growth of COVID-19 Cases worldwide
https://aatishb.com/covidtrends/
MIT License

lengthy links truncate on social media websites #54

Open sjmackenzie opened 4 years ago

sjmackenzie commented 4 years ago

Problem: An effective way to reduce impact of this virus is to communicate best practices faster than the virus infects. Therefore working short links are critical.

When clicking on "Create shareable link" I'd copy (for example) https://aatishb.com/covidtrends/?location=Hong+Kong&location=Italy&location=Singapore&location=Spain&location=Taiwan Now when pasting it on certain websites, specifically Facebook, the clickable link in the published post becomes https://aatishb.com/covidtrends/?location=Hong+Kong with the rest truncated.

This issue highlights two problems: one, the links are lengthy, and two, the links get truncated.

This makes it more difficult to communicate quickly when discussing different strategies and the efficacy thereof.

Potential Solution: ideally the link is short, so it doesn't take up a lot of space in tweets or clutter communications. I know you can use a URL shortener, but I personally hate those and will very rarely click on them.

aatishb commented 4 years ago

Hmm, as of now I feel like this is a Facebook bug/issue if they are breaking long URLs. Not sure we should make our URLs more cryptic for this. Willing to reconsider if there's any easy solution that retains URL transparency (i.e. not a link shortener).

waldyrious commented 4 years ago

@aatishb one way to alleviate this situation would be to use country/region codes, such as ISO 3166 or US state abbreviations. I believe that would retain transparency, since these codes are unambiguous and often well-known.

So the link above would become: https://aatishb.com/covidtrends/?location=HK&location=IT&location=SG&location=ES&location=TW.

You could also shorten the parameter names, e.g. location → loc (or even l, though I agree that would be too cryptic). The result would be: https://aatishb.com/covidtrends/?loc=HK&loc=IT&loc=SG&loc=ES&loc=TW.
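As a rough sketch of how such a link could be generated, assuming a small hand-written name-to-code table (the `ISO_CODES` dict and the `short_link` helper are hypothetical, not part of covidtrends):

```python
from urllib.parse import urlencode

BASE_URL = "https://aatishb.com/covidtrends/"

# Hypothetical lookup table; a real one would cover every region in the selector.
ISO_CODES = {
    "Hong Kong": "HK",
    "Italy": "IT",
    "Singapore": "SG",
    "Spain": "ES",
    "Taiwan": "TW",
}

def short_link(locations, param="loc"):
    """Build a link with one `param=CODE` pair per selected location."""
    pairs = [(param, ISO_CODES[name]) for name in locations]
    return BASE_URL + "?" + urlencode(pairs)

print(short_link(["Hong Kong", "Italy", "Singapore", "Spain", "Taiwan"]))
# https://aatishb.com/covidtrends/?loc=HK&loc=IT&loc=SG&loc=ES&loc=TW
```

A name missing from the table raises `KeyError` here, which is probably what you want while the table is still being filled in.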

sjmackenzie commented 4 years ago

https://aatishb.com/covidtrends/?locs=hk+it+sg+es+tw

The above is shorter and not obscure. It also works on Facebook.

aatishb commented 4 years ago

Thanks. This is helpful feedback. I like the simplified format @sjmackenzie suggests. If we go this route, we'll need a list of unique abbreviations for every subregion on our selectors. We'd probably also need to pull country codes from JHU, which is not straightforward but apparently doable. And we should try not to break existing URLs.

aatishb commented 4 years ago

Overall I feel this is an interesting idea and it will take a fair bit of work to get it right. I've been avoiding dealing with ISO codes so far because the data doesn't already directly include them. We may eventually need to deal with this. I'm open to thinking more about how to implement this.

sjmackenzie commented 4 years ago

I've edited my previous comment to locs instead of loc so that existing links are not broken.

A possible route is to apply a data transformation on import, so that the data directly includes ISO codes. The transformation step would reference this lookup table, which is part of the data source.
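A minimal sketch of that import step, under some loud assumptions: the column names (`UID`, `iso2`, `iso3`, `Country_Region`) are modeled on the lookup table, the case counts are placeholder values rather than real data, and `load_iso_index` / `add_iso_codes` are hypothetical helper names:

```python
import csv
import io

# Tiny inline stand-ins for the real files (placeholder values, not real data).
LOOKUP_CSV = """UID,iso2,iso3,Country_Region
380,IT,ITA,Italy
702,SG,SGP,Singapore
724,ES,ESP,Spain
"""

CASES_CSV = """Country_Region,Confirmed
Italy,100
Spain,80
Atlantis,1
"""

def load_iso_index(lookup_text):
    """Map each country name in the lookup table to its two-letter code."""
    reader = csv.DictReader(io.StringIO(lookup_text))
    return {row["Country_Region"]: row["iso2"] for row in reader}

def add_iso_codes(cases_text, iso_index):
    """Return the case rows with an extra `iso2` field (None when the name is unknown)."""
    rows = []
    for row in csv.DictReader(io.StringIO(cases_text)):
        row["iso2"] = iso_index.get(row["Country_Region"])
        rows.append(row)
    return rows

rows = add_iso_codes(CASES_CSV, load_iso_index(LOOKUP_CSV))
print([(row["Country_Region"], row["iso2"]) for row in rows])
# [('Italy', 'IT'), ('Spain', 'ES'), ('Atlantis', None)]
```

Rows that come back with `None` are the ones whose name no longer matches the table, so they can be flagged instead of silently dropped.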

waldyrious commented 4 years ago

FYI, that table seems to only use national-level (ISO 3166-1) codes, so the subnational regions appear to share the same iso2/iso3 codes (ISO 3166-1 alpha-2 and ISO 3166-1 alpha-3, to be exact), even though they are differentiable by ISO 3166-2. For example, the various regions of China all use CN/CHN, but, say, Hubei does have a unique code, CN-HB. So we'd need to augment that table with the subdivision data (e.g. here's one).
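To illustrate the augmentation, here is a small sketch that keeps a separate subdivision table next to the country table and checks that no code is used twice. Both tables and the `region_code` / `check_unique` helpers are hypothetical; real tables would be generated from ISO 3166-1 and ISO 3166-2 data:

```python
# Hypothetical tables; real ones would be generated from ISO 3166-1 / 3166-2 data.
COUNTRY_CODES = {"China": "CN", "Italy": "IT"}
SUBDIVISION_CODES = {("China", "Hubei"): "CN-HB"}

def region_code(country, province=None):
    """Return the ISO 3166-2 code for a province, else the country's alpha-2 code."""
    if province is None:
        return COUNTRY_CODES[country]
    return SUBDIVISION_CODES[(country, province)]  # KeyError if not in the table

def check_unique(*tables):
    """Raise ValueError if any code is shared by two entries."""
    seen = {}
    for table in tables:
        for key, code in table.items():
            if code in seen:
                raise ValueError(f"{code} used by both {seen[code]!r} and {key!r}")
            seen[code] = key

check_unique(COUNTRY_CODES, SUBDIVISION_CODES)
print(region_code("China"), region_code("China", "Hubei"))
# CN CN-HB
```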

sjmackenzie commented 4 years ago

How about something a bit more dirty, but far simpler: https://aatishb.com/covidtrends/?locs=344+380+702+724+158+15613

UID: 344 hong kong 
UID: 380 italy
UID: 702 singapore
UID: 724 spain
UID: 158 taiwan
UID: 15613 hubei

This allows us to compare states/provinces/countries against other states/provinces/countries.

IMHO I don't really care about intelligible links, just a short (enough) link that works on every platform (no matter how buggy they are), enabling conversations on strategies. It would be good to have something that doesn't mess with the data too much.

waldyrious commented 4 years ago

I think it's quite doable to host a full look-up table in this repo, based on the ones linked above. It only needs to be done once, and it's not a lot of work either since most of the content can be generated automatically. I'm willing to contribute such a table if this is the approach that we decide to move ahead with.

On a separate note, the short URL (using a single locs parameter) would probably be more readable with commas instead of a plus sign, especially since some of the codes may have dashes in them:

https://aatishb.com/covidtrends/?locs=hk,it,sg,es,tw

(It also looks more like a list, which is another plus.)
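A possible way to read such a URL while still accepting the existing `location=` links, sketched with Python's `urllib.parse` (the `selected_locations` helper is a made-up name, and the upper-casing of codes is just one possible convention):

```python
from urllib.parse import urlsplit, parse_qs

def selected_locations(url):
    """Return (codes, names): codes from `locs`, full names from legacy `location`."""
    query = parse_qs(urlsplit(url).query)
    codes = [
        code.strip().upper()
        for value in query.get("locs", [])
        for code in value.split(",")
        if code.strip()
    ]
    names = query.get("location", [])  # old-style links keep working
    return codes, names

print(selected_locations("https://aatishb.com/covidtrends/?locs=hk,it,cn-hb"))
# (['HK', 'IT', 'CN-HB'], [])
print(selected_locations("https://aatishb.com/covidtrends/?location=Hong+Kong&location=Italy"))
# ([], ['Hong Kong', 'Italy'])
```

Splitting on commas leaves dashed codes such as `cn-hb` intact.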

sjmackenzie commented 4 years ago

This looks great to me, though it's probably best to test whichever separator character is chosen on Facebook, Twitter and the other big platforms. I assume you want ISO 3166-2?

aatishb commented 4 years ago

I'm in favor of shortening the lengthy links (in a way that doesn't break existing behavior). The location URL was a hastily added feature the first time around, so I agree it's a good idea to revisit and do it right.

My concern with a manual lookup table is that it will break if JHU changes how they name countries. Perhaps they are no longer doing this, but a while back the country names were changing regularly. Also, do they update their lookup table when they add a new country to the dataset? Or is it already comprehensive?

Is there some standard for what separator we should use in these kinds of URL strings (+ or ,)?

MrSpiffyClean commented 4 years ago

Since I recently dealt with the whole URL stuff, might as well give my two cents here. Quick summary:

From the top, the truncated links issue. Having only Facebook to test (as that was the only site specifically mentioned), I can indeed confirm and replicate partially what happens. Pasting a queried URL and then posting it yields the following results:

Trying out other regions (e.g. Canada), I end up finding that the main issue is that Facebook only detects the last location parameter present. I think this is an issue with Facebook's scraper, as checking the page with their tools shows that it just doesn't look for the exact address given. As far as I know, having multiple parameters with the same key isn't forbidden, and tools such as Python's urllib.parse and even URLSearchParams seem to have means of reporting every parameter. I might raise a bug with Facebook about this, given time. There's also another issue in that long links are shortened with an ellipsis, making it even harder for someone to check the actual URL.
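For reference, a quick check with Python's `urllib.parse` that every repeated key is reported, using the link from the original report:

```python
from urllib.parse import urlsplit, parse_qs

url = (
    "https://aatishb.com/covidtrends/"
    "?location=Hong+Kong&location=Italy&location=Singapore"
    "&location=Spain&location=Taiwan"
)

# parse_qs collects all values of a repeated key into one list, in order.
locations = parse_qs(urlsplit(url).query)["location"]
print(locations)
# ['Hong Kong', 'Italy', 'Singapore', 'Spain', 'Taiwan']
```

On the JS side, `URLSearchParams.getAll("location")` should give the equivalent list.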

I would also like to know what other (specific) sites are truncating links like this or if this is just a Facebook issue. Granted, Facebook is where most people end up sharing stuff, so this needs to be looked into regardless.

As for shorter links, I'm perfectly fine with all the suggestions given, being more partial to @waldyrious's latest idea. I don't enjoy having numeric country codes (as @sjmackenzie suggested) as they aren't as clear and make understanding the URL at a glance harder, although if the JHU data doesn't change the countries' names/UIDs, it would be the easiest to do.

From what I checked, there isn't a consistent standard regarding the separators (as long as reserved characters aren't used). I would support using regular commas, as plus signs are the URL encoding of spaces, though it's pretty much the same, I guess. On the current URL scheme, replacing the & with ; could work (as Facebook doesn't seem to complain), but it isn't very usual nowadays (Python seems to understand they are different parameters, but JS does not).
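To make the separator behaviour concrete, here is a small check with Python's `urllib.parse`. It assumes a recent Python, where `parse_qs` accepts a `separator` argument and splits only on & by default:

```python
from urllib.parse import parse_qs, urlencode

# A plus sign in a query string decodes to a space, so the codes arrive space-separated.
print(parse_qs("locs=hk+it+sg"))      # {'locs': ['hk it sg']}

# A comma survives decoding, whether it is sent literally or percent-encoded.
print(parse_qs("locs=hk,it,sg"))      # {'locs': ['hk,it,sg']}
print(parse_qs("locs=hk%2Cit%2Csg"))  # {'locs': ['hk,it,sg']}

# urlencode escapes commas unless they are declared safe.
print(urlencode({"locs": "hk,it,sg"}))            # locs=hk%2Cit%2Csg
print(urlencode({"locs": "hk,it,sg"}, safe=","))  # locs=hk,it,sg

# Semicolon-separated parameters need an explicit separator on recent Pythons.
print(parse_qs("location=HK;location=IT", separator=";"))  # {'location': ['HK', 'IT']}
```

Either separator can be parsed back reliably; the difference is mostly in how the link reads and whether a given site re-encodes it.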

From what I understand, this is mostly a maintenance issue: having to check whether JHU breaks whatever convention they set up (if we use our own lookup table, or they don't update theirs) and supporting older links.