coleifer / micawber

a small library for extracting rich content from urls
http://micawber.readthedocs.org/
MIT License
635 stars 91 forks

Can't get Facebook embeds meta other than Facebook page. #101

Closed anjali-92 closed 3 years ago

anjali-92 commented 3 years ago

The order in which the schema patterns are compared is incorrect. Test URL: https://www.facebook.com/wedmegood/videos/796018387640585/

https://www.facebook.com/[^\/\s\?&]+? ..... Found a match....
https://www.facebook.com/[^\/\s\?&]+?/videos/[^\/\s\?&]+? ....(This is the correct pattern match)
https://www.facebook.com/video.php\?id=[^\/\s\?&]+?
https://www.facebook.com/video.php\?v=[^\/\s\?&]+?
https://www.facebook.com/[^\/\s\?&]+?/posts/[^\/\s\?&]+?
https://www.facebook.com/[^\/\s\?&]+?/activity/[^\/\s\?&]+?
https://www.facebook.com/[^\/\s\?&]+?/photos/[^\/\s\?&]+?
https://www.facebook.com/photo.php\?fbid=[^\/\s\?&]+?
https://www.facebook.com/photos/[^\/\s\?&]+?
https://www.facebook.com/permalink.php\?story_fbid=[^\/\s\?&]+?
https://www.facebook.com/media/set\?set=[^\/\s\?&]+?
https://www.facebook.com/questions/[^\/\s\?&]+?
https://www.facebook.com/notes/[^\/\s\?&]+?/[^\/\s\?&]+?/[^\/\s\?&]+?

coleifer commented 3 years ago

I'm not clear on your proposed fix, which is just reversing the list of providers. If there is a bug in the translation of the oembed pattern to micawber regex, then let's fix it properly. Otherwise, I'd suggest opening a pull-request on oembed to fix or reorder the facebook pattern list.

anjali-92 commented 3 years ago

I don't see any issue with oembed's providers.json.

Facebook is the only provider with multiple endpoints, which may be why this issue surfaced.

[Image: Screenshot 2021-05-26 at 5 47 00 PM]

At line 154 of providers.py, all the registered patterns get reversed, so the schema of the last endpoint, which is the wildcard for Facebook pages, is the first one to match. I have added the pattern match sequence in the first comment.
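
To illustrate the ordering problem, here is a minimal sketch using three of the patterns from the first comment. It assumes the lookup returns the first pattern for which `re.match` succeeds; `first_match` is a hypothetical helper written for this example, not micawber's API. Because `re.match` anchors only at the start of the string, the page wildcard also matches the video URL.

```python
import re

# Patterns copied from the first comment in this issue.
PAGE = r'https://www.facebook.com/[^\/\s\?&]+?'
VIDEO = r'https://www.facebook.com/[^\/\s\?&]+?/videos/[^\/\s\?&]+?'
POST = r'https://www.facebook.com/[^\/\s\?&]+?/posts/[^\/\s\?&]+?'

URL = 'https://www.facebook.com/wedmegood/videos/796018387640585/'


def first_match(patterns, url):
    """Hypothetical helper: return the first pattern that matches url.

    re.match anchors at the start only, so a short wildcard pattern
    matches any longer URL that begins the same way.
    """
    for pattern in patterns:
        if re.match(pattern, url):
            return pattern
    return None


# Order reported in this issue: the page wildcard is tried first and wins.
wildcard_first = [PAGE, VIDEO, POST]
# Alternative order: specific patterns are tried before the wildcard.
specific_first = [VIDEO, POST, PAGE]

print(first_match(wildcard_first, URL) == PAGE)
print(first_match(specific_first, URL) == VIDEO)
```

Both comparisons hold, so under this assumption the result depends only on the order in which the patterns are tried, not on the patterns themselves.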

anjali-92 commented 3 years ago

@coleifer Thank you :)