MarshalX / atproto

The AT Protocol (🦋 Bluesky) SDK for Python 🐍
https://atproto.blue
MIT License
425 stars, 45 forks

Misdetection of URLs / links in the "auto_hyperlinks" example #141

Closed: fxcoudert closed this issue 10 months ago

fxcoudert commented 1 year ago

The regexp in extract_url_byte_positions at https://github.com/MarshalX/atproto/blob/main/examples/advanced_usage/auto_hyperlinks.py does not appear to detect all valid URLs. Take for example:

https://www.cell.com/matter/fulltext/S2590-2385(23)00409-5?rss=yes

This is misdetected: the detected URL stops just before the opening parenthesis, so everything from "(" onward is dropped.
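A minimal sketch that reproduces this kind of truncation. The pattern below is a hypothetical stand-in, not the actual regexp from auto_hyperlinks.py; it only assumes that parentheses are missing from the allowed character class, which is enough to show the effect.

```python
import re

# Hypothetical pattern (an assumption, NOT the regexp from auto_hyperlinks.py):
# the character class contains no parentheses.
NARROW_URL_RE = re.compile(r"https?://[A-Za-z0-9\-._~:/?#\[\]@!$&'*+,;=%]+")

text = "See https://www.cell.com/matter/fulltext/S2590-2385(23)00409-5?rss=yes"

match = NARROW_URL_RE.search(text)
detected = match.group(0)

# The match ends at the first character outside the class, i.e. the "(".
print(detected)  # https://www.cell.com/matter/fulltext/S2590-2385
```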

MarshalX commented 1 year ago

Could you please fix it?

fxcoudert commented 1 year ago

Not really. I've added \(\) to the allowed characters in my own use case, but I'm pretty sure the regexp is not conformant and will fail to catch other valid URLs. Probably better to use something designed and tested by someone else.
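For illustration, the workaround described here might look roughly like the sketch below. The pattern and the helper name find_url_byte_spans are hypothetical and are not taken from the example file; the sketch assumes that UTF-8 byte offsets are what the caller needs, and it matches on the encoded bytes so the offsets come out directly.

```python
import re

# Hypothetical pattern (an assumption, not the example's regexp): same idea
# as a narrow URL character class, but with "(" and ")" added.
URL_RE = re.compile(rb"https?://[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")


def find_url_byte_spans(text: str) -> list[tuple[str, int, int]]:
    """Return (url, byte_start, byte_end) for each match in the UTF-8 bytes."""
    encoded = text.encode("utf-8")
    return [
        (m.group(0).decode("utf-8"), m.start(), m.end())
        for m in URL_RE.finditer(encoded)
    ]


post = "🦋 paper: https://www.cell.com/matter/fulltext/S2590-2385(23)00409-5?rss=yes"
spans = find_url_byte_spans(post)
url, start, end = spans[0]
print(url, start, end)
```

Allowing parentheses has a cost: a URL wrapped in parentheses in running text, such as "(see https://example.com)", would now pick up the trailing ")". That trade-off is one reason a well-tested URL parser may be the safer choice, as suggested above.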

MarshalX commented 1 year ago

@Jxck-S fyi