glefait / socialregexes

Given urls, returns identified social accounts
MIT License

Add support for social content URLs #4

Open jayvdb opened 4 years ago

jayvdb commented 4 years ago

Currently the regexes only support links to social accounts, and they reject links to social content. For example, https://github.com/glefait is understood, but https://github.com/glefait/socialregexes is not. More relevant: https://twitter.com/jayvdb is ok, but https://twitter.com/jayvdb/status/1057567621531426818 is not.

The regexes often use $ to reject any extra parts after the account portion of the URL.

If the caller can control whether 'other junk' may appear after the user identifier, the social account can also be extracted from content URLs.

socialregexes would then append the $ programmatically to the regex depending on the caller's needs.

This new flag would default to disabled, so there is no change to the API responses unless the caller explicitly opts in to the looser rule.
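A minimal sketch of the idea, under stated assumptions: the pattern table, the `identify` function and the `allow_content` flag below are hypothetical names for illustration and are not the library's actual regexes or API. Instead of a bare $, the sketch appends a strict or loose suffix, so that in loose mode the extra part must start at a path boundary.

```python
import re

# Hypothetical, simplified patterns without a trailing anchor.
# These are NOT the regexes shipped by socialregexes.
PATTERNS = {
    "github": r"https?://(?:www\.)?github\.com/(?P<user>[A-Za-z0-9-]+)",
    "twitter": r"https?://(?:www\.)?twitter\.com/(?P<user>[A-Za-z0-9_]{1,15})",
}


def identify(url, allow_content=False):
    """Return (network, user) for a recognised URL, else None.

    allow_content is a hypothetical opt-in flag; it defaults to False,
    so the strict behaviour is unchanged unless the caller asks for it.
    """
    # Strict: only an optional trailing slash may follow the user.
    # Loose: any further path segments may follow the user.
    suffix = r"(?:/.*)?$" if allow_content else r"/?$"
    for network, pattern in PATTERNS.items():
        match = re.match(pattern + suffix, url)
        if match:
            return network, match.group("user")
    return None
```

With the default, identify("https://twitter.com/jayvdb/status/1057567621531426818") finds nothing; passing allow_content=True makes the same call yield the twitter account jayvdb.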

jayvdb commented 4 years ago

Another aspect worth considering is capturing and returning the "other junk".

This could be achieved by using a special tuple class which only iterates over two items, but includes a hidden attribute for "other junk".

Or, a new function could be used to support the new return type which also provides the "other junk".
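The special tuple idea could look roughly like the sketch below. The class name `AccountMatch` and the attribute name `rest` are assumptions made for illustration; nothing here is existing socialregexes code.

```python
class AccountMatch(tuple):
    """A 2-tuple (network, user) that also carries the unmatched remainder.

    Hypothetical sketch: the remainder is stored as an attribute rather
    than as a third item, so code that unpacks two items keeps working.
    """

    def __new__(cls, network, user, rest=""):
        self = super().__new__(cls, (network, user))
        # A tuple subclass without __slots__ has an instance __dict__,
        # so an extra attribute can be attached to it.
        self.rest = rest
        return self


match = AccountMatch("twitter", "jayvdb", "/status/1057567621531426818")
network, user = match  # unpacking still sees exactly two items
```

Existing callers that unpack or compare the result as a pair are unaffected, while new callers can read match.rest to get the "other junk".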

glefait commented 4 years ago

It makes sense to be able to retrieve a user_id from a URL even if the URL describes something related to the user and not specifically the user profile. However, I think it's harder to avoid false positives in that case, so I would prefer to clearly separate those functions.

jayvdb commented 4 years ago

Ya, very likely true. I'll get a PR together.