Closed djhmateer closed 1 year ago
Hi Dave, thanks for opening this discussion.
As discussed in #26, we should not use archive.ph: it serves captchas, so anything built on it would stop working rather quickly.
My fear is that automating with Selenium will end the same way, since Facebook is quite aggressive about detecting automation. Nonetheless, it is the only option still on the table (unless some potentially hacky library is around). If you want to give it a go, you can try driving the page with the webdriver and clicking through to the content, but it will take some experimenting to see how quickly Facebook detects the automated behaviour and blocks the page. Tbh, this is the reason we have not tried automating Facebook posts :/
I've made a Facebook archiver on my fork of this codebase: https://github.com/djhmateer/auto-archiver/blob/main/auto_archive_fb.py
It has been running well in production (with caveats!). It's pretty specialised and has to run on its own server.
Maybe close this as an issue, as it is working well enough for me.
Closing this issue for now, since working around the caveats of Facebook archiving usually requires passive accounts and proxies. The wacz_archiver_enricher can be used in combination with Facebook login credentials to achieve this result.
Archiving of image(s) on Facebook is not supported yet and would be very useful.
Placeholder issue for collecting ideas on how it could be done.
Background
https://github.com/djhmateer/auto-archiver#archive-logic has a list of what works and what doesn't. Facebook video works using youtube_dlp.
In the fork above, to get a Facebook screenshot I am using automation to click through the accept-cookies page, as we don't want the cookie popup in the screenshot.
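A minimal sketch of that cookie-click step, for illustration only and not the code from the fork. It assumes a Selenium 4 style driver (`get`, `find_elements`, `save_screenshot`). The XPath selectors and the "Allow all cookies" label are hypothetical; Facebook's real markup varies by locale and changes often, so they would need checking against the live page.

```python
import time

# Hypothetical selectors: the button text and markup are assumptions,
# not taken from Facebook's actual page.
COOKIE_BUTTON_XPATHS = [
    "//button[contains(., 'Allow all cookies')]",
    "//div[@role='button'][contains(., 'Allow all cookies')]",
]


def dismiss_cookie_banner(driver, xpaths=COOKIE_BUTTON_XPATHS, pause=1.0):
    """Click the first visible cookie-consent button. Return True if one was clicked."""
    for xpath in xpaths:
        # "xpath" is the string value of selenium's By.XPATH
        for element in driver.find_elements("xpath", xpath):
            if element.is_displayed():
                element.click()
                time.sleep(pause)  # give the overlay time to disappear
                return True
    return False


def screenshot_without_banner(driver, url, out_path,
                              xpaths=COOKIE_BUTTON_XPATHS, pause=1.0):
    """Load the page, try to dismiss the cookie popup, then save a screenshot."""
    driver.get(url)
    dismiss_cookie_banner(driver, xpaths, pause)
    return driver.save_screenshot(out_path)
```

With Selenium installed, this could be called as `screenshot_without_banner(webdriver.Firefox(), url, "post.png")`. The functions only rely on the driver's methods, so they can be exercised with a stub driver before pointing them at Facebook.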
To get a Facebook post link
"Each Facebook post has a timestamp on the top (it may be something like Just now, 3 mins or Yesterday). This timestamp contains the link to your post. So, to copy it, simply hover your mouse over the timestamp, right click, then copy link address"
Example
As an example of Facebook images which we would like to archive:
https://www.facebook.com/chelseymateerbeautician/posts/pfbid0mhimrwfeBpWKwBUFna28Q3RfaEK8HETcEpk1QXoEeFXHVwaa7oxLxKTHbBqu5nPpl
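As a rough sketch, a post link like the one above could be split into its page name and post ID before being handed to an archiver. The function name is hypothetical, and the only URL shape it handles is `/<page>/posts/<post_id>`, which is assumed from this single example; other Facebook link formats would need their own handling.

```python
from urllib.parse import urlparse


def parse_facebook_post_url(url):
    """Return (page, post_id) for URLs shaped like /<page>/posts/<post_id>, else None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    # Accept facebook.com and its subdomains (www., m., ...) only
    if host != "facebook.com" and not host.endswith(".facebook.com"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) >= 3 and parts[1] == "posts":
        return parts[0], parts[2]
    return None
```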
https://gist.github.com/pcardune/1332911 - potentially this may help.
https://github.com/bellingcat/auto-archiver/issues/26 - @msramalho talked about the potential of https://archive.ph/