coffeebank / coffee-maubot

Matrix bot plugins for Maubot 🍢 Add link previews, choose, and other tools to your Matrix chat ☕
https://coffeebank.github.io/coffee-maubot

urlpreview: add html_custom_headers #25

Closed by rom4nik 10 months ago

rom4nik commented 11 months ago

Adds config option to set custom headers for HTML parser.

If no headers are set ({}), the default aiohttp headers are used; in my setup, for example, the user agent was Python/3.11 aiohttp/3.8.5.

In my case, I need to use { Accept-Encoding: 'deflate, gzip, br, zstd', User-Agent: 'WhatsApp/2' } to get og:* tags instead of a 403 response and a captcha on links to https://allegro.pl. The user agent is borrowed from Signal, where previews are generated on the client side and work as intended: https://github.com/signalapp/Signal-Android/issues/9958
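For readers skimming the thread, here is a minimal sketch of the fallback behaviour described above. It is not the plugin's actual code: resolve_headers and fetch_html are hypothetical names, and the sketch only assumes that the config value arrives as a plain mapping. An empty mapping leaves aiohttp's own default headers in place; anything else is passed through to the request.

```python
import asyncio
from typing import Mapping, Optional


def resolve_headers(custom: Optional[Mapping[str, str]]) -> Optional[dict]:
    """Hypothetical helper: return the headers to send, or None to keep client defaults."""
    if not custom:
        # {} or None: let aiohttp fill in its default headers (including User-Agent).
        return None
    # Copy into a plain dict of strings so YAML scalars such as numbers do not leak through.
    return {str(name): str(value) for name, value in custom.items()}


async def fetch_html(url: str, custom: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical fetch: download a page, applying custom headers only if configured."""
    # Imported here so resolve_headers can be used and tested without aiohttp installed.
    import aiohttp

    async with aiohttp.ClientSession() as session:
        async with session.get(url, headers=resolve_headers(custom)) as resp:
            resp.raise_for_status()
            return await resp.text()


if __name__ == "__main__":
    # Values taken from the comment above; the URL is only an example target.
    headers = {
        "Accept-Encoding": "deflate, gzip, br, zstd",
        "User-Agent": "WhatsApp/2",
    }
    html = asyncio.run(fetch_html("https://allegro.pl", headers))
    print(len(html))
```

Returning None rather than an empty dict keeps the "not configured" case explicit, and the same helper could be called from any other request site that should honour the setting.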

matrix_get_image and check_image_content_type still use the default headers. That's OK for the few websites I've checked, but do you think those functions should use the custom headers too, the same as the HTML parser?

coffeebank commented 10 months ago

Thanks for the PR! Works great, and a much needed enhancement - LGTM.

Custom headers for matrix_get_image and check_image_content_type would be nice for consistency; I'll add them for v0.3.4.