Twin-Cities-Mutual-Aid / twin-cities-aid-distribution-locations

A webapp to coordinate aid and care in the Twin Cities.
44 stars 31 forks source link

Links in Google Sheet renders incorrectly on map #169

Closed emyl3 closed 4 years ago

emyl3 commented 4 years ago

The current implementation of creating <a> links on the map is causing an issue when rendering out Google sheet entries that have a clickable link in the cell.

An example is the "Notes" section of "The Sheridan Story".

Screen Shot 2020-06-13 at 8 48 51 PM

My bad for introducing this bug in PR 168

emyl3 commented 4 years ago

I am wondering if this is just limited to this particular entry for "The Sheridan Story" because all other inline links within Google Sheets just don't render at all since we aren't getting the HTML data from google sheets (i believe) 🤔

codarose commented 4 years ago

Update to this issue:

There is still an issue with the regex for links, because currently if a URL ends in a comma or other punctuation, i.e. "http://www.google.com," the comma gets sucked into the link and it fails.

Also, if someone writes "www.google.com" the link doesn't work, so we should probably inject "http://" into the href to make things less complex on the data entry side?

These are just two ideas, but if someone has a more robust and inclusive regex that they favor for URLs, if we could just make sure it is well-tested it that would be ideal!

emyl3 commented 4 years ago

@estherrose046 Thank you for pointing that out! 😸

Maybe the regex could avoid capturing punctuations (e.g. periods, commas, semicolons, etc..) after the final . of the url?

I was suppose to write some unit tests for the regex but haven't had the time to start it this week. I'll created an issue for unit testing the regex so that I don't forget about it or if someone has the capacity to get to it before I do -- they can totally do so. 😅

TimothyMDean commented 4 years ago

@emyl3 It looks like the Regex was taken from one specific answer to the SO question you linked to in the code. The comments on that answer specifically mention a problem with trailing punctuation marks, and the regex’s author doesn’t provide a good answer to that problem.

Getting a regex to match all URLs is a notoriously hard problem to get right, which is why there are so many different SO answers out there. But following the question you linked to in the code, I found that it referenced this question, and the highest-upvoted answer there was this one. This answer is provided using PHP syntax but could easily be converted to JavaScript.

Does it make sense to try this alternative regex and see if it works for more use cases? Do we have a set of links we should be testing against to make sure it works as it should?