Open djhmateer opened 8 months ago
Looking forward to that PR. We can indeed add an option to run a specific archiver via a residential IP proxy.
Taking another look at this, can you clarify whether you're doing any extra downloads/requests or simply parsing data from inside the wacz?
Hi Miguel
From:
Probably best to follow along on the link above.
Apart from the /photo special case, I get the root page, then parse it for resources, getting the fb_id and set_id. Then jump down to
which does another request (and another wacz download), then returns the next fb_id back to the main function above.
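A minimal sketch of the flow described above, not the actual code in wacz_enricher.py. It assumes the IDs travel in the `fbid` and `set` query parameters of a photo URL. `extract_photo_ids`, `walk_photo_set` and the `fetch_next_fb_id` callback are hypothetical names, with the callback standing in for the step that does another request (and another wacz download).

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import parse_qs, urlparse


def extract_photo_ids(url: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (fb_id, set_id) from a photo URL; a missing part is None."""
    query = parse_qs(urlparse(url).query)
    # Assumption: the IDs are carried in the "fbid" and "set" query parameters.
    fb_id = query.get("fbid", [None])[0]
    set_id = query.get("set", [None])[0]
    return fb_id, set_id


def walk_photo_set(
    start_url: str,
    fetch_next_fb_id: Callable[[str, Optional[str]], Optional[str]],
    max_photos: int = 50,
) -> List[str]:
    """Collect fb_ids by repeatedly asking the callback for the next one.

    Stops when the callback returns None, when an fb_id repeats
    (the set has wrapped around), or when max_photos is reached.
    """
    fb_id, set_id = extract_photo_ids(start_url)
    seen: List[str] = []
    while fb_id is not None and fb_id not in seen and len(seen) < max_photos:
        seen.append(fb_id)
        # Hypothetical callback: one extra request per photo.
        fb_id = fetch_next_fb_id(fb_id, set_id)
    return seen
```

The repeat check and the `max_photos` cap are there so a set that loops back to its first photo cannot keep the crawl running forever.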
Regards Dave
I've got a Facebook archiver working by using the
wacz_enricher.py
https://github.com/djhmateer/auto-archiver/blob/v6-test/src/auto_archiver/enrichers/wacz_enricher.py#L159
Am using a stored profile to get images which require you to be logged in.
Am running this archiver from a residential IP, because FB blocks the requests if it is run from a cloud.
This archiver is run in addition to the main archiver (which runs on a cloud).
It may be that this can be much simpler if I can run everything sequentially (and not on 2 servers). I need to wait for more bandwidth on the residential network, then I can potentially do a PR.
Also, I've found I need to keep testing the profile, as it needs to be logged in again after a few weeks.
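One rough way to automate that profile test, as a sketch only: after loading a page that needs a login, look at the URL the browser finally landed on. The assumption here (worth checking against real responses) is that a logged-out session gets redirected to a path starting with `/login`. `looks_logged_out` and `login_prefixes` are hypothetical names, not part of auto-archiver.

```python
from typing import Tuple
from urllib.parse import urlparse


def looks_logged_out(final_url: str, login_prefixes: Tuple[str, ...] = ("/login",)) -> bool:
    """Heuristic: True if the final URL's path starts with a login prefix."""
    path = urlparse(final_url).path
    # Assumption: an expired session is redirected to a login page.
    return any(path.startswith(prefix) for prefix in login_prefixes)
```

If this returns True for a page that used to load fine, the stored profile probably needs to be logged in again before the next archiving run.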