bellingcat / auto-archiver

Automatically archive links to videos, images, and social media content from Google Sheets (and more).
https://pypi.org/project/auto-archiver/
MIT License

Facebook image archiving #41

Closed djhmateer closed 1 year ago

djhmateer commented 2 years ago

Archiving of image(s) on Facebook is not supported yet and would be very useful.

Placeholder issue to collect ideas on how it could potentially be done.

Background

https://github.com/djhmateer/auto-archiver#archive-logic has a list of what works and what doesn't. Facebook video works using yt-dlp.
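For context, a minimal sketch of downloading a video with the yt-dlp Python API. This is not the code from the fork; the function name `archive_video`, the output template, and the `ydl_factory` parameter (there only so the function can be exercised without network access) are assumptions for illustration.

```python
import os


def archive_video(url, out_dir, ydl_factory=None):
    """Download the video at `url` into `out_dir` and return the local file path.

    `ydl_factory` is a hypothetical hook for testing; by default the real
    yt_dlp.YoutubeDL class is used (third-party: pip install yt-dlp).
    """
    if ydl_factory is None:
        import yt_dlp  # imported lazily so the module loads without yt-dlp installed
        ydl_factory = yt_dlp.YoutubeDL

    options = {
        # Name the file after the video id, e.g. <out_dir>/<id>.<ext>
        "outtmpl": os.path.join(out_dir, "%(id)s.%(ext)s"),
        "quiet": True,
        "noplaylist": True,
    }
    with ydl_factory(options) as ydl:
        info = ydl.extract_info(url, download=True)
        return ydl.prepare_filename(info)
```

Calling `archive_video(video_url, "downloads")` with a public Facebook video URL would be the intended use; whether a given video downloads depends on yt-dlp's current Facebook extractor and on the video's visibility.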

In the fork above, to get a Facebook screenshot I am using automation to click on the accept cookies page, as we don't want the cookie popup in the screenshot.
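A rough sketch of that cookie-click step with Selenium, not the fork's actual code. The button labels in `COOKIE_BUTTON_LABELS` are guesses (Facebook changes its consent dialog often), and the function names and fixed sleep are assumptions for illustration. The driver is passed in, so any Selenium 4 WebDriver should work; the string `"xpath"` is the value of `By.XPATH`.

```python
import time

# Hypothetical labels for the consent button; adjust to whatever Facebook
# currently shows. These are assumptions, not values taken from the fork.
COOKIE_BUTTON_LABELS = ("Allow all cookies", "Accept All")


def build_cookie_xpath(labels=COOKIE_BUTTON_LABELS):
    """Build one XPath matching any element whose aria-label or text equals a label."""
    conditions = " or ".join(
        f'@aria-label="{label}" or normalize-space(text())="{label}"'
        for label in labels
    )
    return f"//*[{conditions}]"


def screenshot_without_cookie_popup(driver, url, out_path, wait_seconds=3.0):
    """Open `url`, click the cookie button if one is found, then save a screenshot.

    Returns True if a cookie button was clicked, False otherwise.
    """
    driver.get(url)
    time.sleep(wait_seconds)  # crude wait for the consent dialog to render

    buttons = driver.find_elements("xpath", build_cookie_xpath())
    clicked = False
    if buttons:
        buttons[0].click()
        clicked = True
        time.sleep(wait_seconds)  # let the dialog disappear before capturing

    driver.save_screenshot(out_path)
    return clicked
```

With a real browser this might be called as `screenshot_without_cookie_popup(webdriver.Firefox(), post_url, "post.png")` after `from selenium import webdriver`. A fixed sleep is the simplest option here; an explicit wait on the button would be more robust.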

To get a Facebook post link

"Each Facebook post has a timestamp on the top (it may be something like Just now, 3 mins or Yesterday). This timestamp contains the link to your post. So, to copy it, simply hover your mouse over the timestamp, right click, then copy link address"

Example

As an example of Facebook images which we would like to archive:

https://www.facebook.com/chelseymateerbeautician/posts/pfbid0mhimrwfeBpWKwBUFna28Q3RfaEK8HETcEpk1QXoEeFXHVwaa7oxLxKTHbBqu5nPpl
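An archiver would first need to recognise links like the one above. A small sketch, assuming post permalinks have the shape `/<page>/posts/<post id>` as in this example; the function name `parse_facebook_post_url` is hypothetical, and other Facebook URL forms (photo links, `permalink.php`, and so on) are deliberately not handled.

```python
from urllib.parse import urlparse


def parse_facebook_post_url(url):
    """Return {'page': ..., 'post_id': ...} for a /<page>/posts/<id> link, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Accept facebook.com and its subdomains (www., m., ...), nothing else.
    if host != "facebook.com" and not host.endswith(".facebook.com"):
        return None

    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) == 3 and parts[1] == "posts":
        return {"page": parts[0], "post_id": parts[2]}
    return None
```

For the example link, this returns the page name `chelseymateerbeautician` and the `pfbid...` identifier; any URL that does not match the assumed shape returns `None`.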

https://gist.github.com/pcardune/1332911 - potentially this may help.

https://github.com/bellingcat/auto-archiver/issues/26 - @msramalho talked about the potential of https://archive.ph/

msramalho commented 2 years ago

Hi Dave, thanks for opening this discussion.

As discussed in #26, we should not use archive.ph since it has captchas and will stop working rather quickly.

My fear is that automation with Selenium will lead to the same result, as Facebook is quite aggressive about that. Nonetheless, that's the only option still on the table (unless some potentially hacky library is around). If you want to give it a go, you can try using the webdriver and clicks to get to the content, but this would need some experiments to see how quickly Facebook detects the automated behaviour and blocks the page. Tbh, this has been the reason we have not tried automating Facebook posts :/

djhmateer commented 1 year ago

I've made a fb archiver on my fork of this codebase: https://github.com/djhmateer/auto-archiver/blob/main/auto_archive_fb.py

It has been running well in production (with caveats!). It's pretty specialised and has to run on its own server.

Maybe close this as an issue, as it is working well enough for me.

msramalho commented 1 year ago

Closing this issue for now, since the caveats with Facebook archiving usually require passive accounts and proxies. The wacz_archiver_enricher can be used in combination with Facebook login credentials to achieve this result.