WikiTeam / wikiteam

Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to the tiniest wikis. As of 2024, WikiTeam has preserved more than 600,000 wikis.
https://github.com/WikiTeam
GNU General Public License v3.0

Use `session.get` instead of `requests.get` in `getXMLHeader` #438

Closed: Pokechu22 closed this 2 years ago

Pokechu22 commented 2 years ago

`session.get` sends our configured User-Agent, while `requests.get` sends the library's default one. This is needed for `python2 -u dumpgenerator.py --xml --xmlrevisions --images https://fidopedia.fido.de/`, because that site rejects the default requests User-Agent.
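A minimal sketch of the difference, for illustration only. It prepares requests without sending them, so no network access is involved. The User-Agent string and the URL are hypothetical and are not the values dumpgenerator.py uses.

```python
import requests

# Hypothetical User-Agent string, for illustration only.
CUSTOM_UA = "ExampleArchiver/0.1"

# A session configured once, in the way a tool might set up its own session.
session = requests.Session()
session.headers.update({"User-Agent": CUSTOM_UA})

# Build the request without sending it (hypothetical URL).
req = requests.Request("GET", "https://wiki.example.org/index.php")

# What session.get(...) would send: the session's headers are merged in.
via_session = session.prepare_request(req)

# What requests.get(...) would send: the module-level helper uses a fresh,
# throwaway Session, so only the library defaults apply.
via_module = requests.Session().prepare_request(req)

print(via_session.headers["User-Agent"])  # the configured value
print(via_module.headers["User-Agent"])   # the python-requests default
```

The second value is whatever `requests.utils.default_user_agent()` returns for the installed version, which is the header a site may choose to block.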

(That site also requires other stuff; see this branch (perma), though that's not fully complete.)

nemobis commented 2 years ago

This relies on `generateXMLDump()` and `getXMLHeader()` actually passing the `session` variable; otherwise it will fail. Maybe we should handle the default value `None` here?
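One way to handle the `None` default is to fail early with a clear message rather than hit an `AttributeError` on `None.get(...)`. The sketch below is hypothetical: `get_xml_header_sketch` and `RecordingSession` are illustrative stand-ins, not the real `getXMLHeader()` or `requests.Session`, and the query parameters are assumptions.

```python
class RecordingSession(object):
    """Illustrative stand-in for a session; records calls instead of doing I/O."""

    def __init__(self, user_agent):
        self.headers = {"User-Agent": user_agent}
        self.calls = []

    def get(self, url, **kwargs):
        # Remember what was requested and return a dummy body.
        self.calls.append(("GET", url, kwargs))
        return "<mediawiki>stub</mediawiki>"


def get_xml_header_sketch(api_url, session=None):
    """Hypothetical, simplified stand-in for getXMLHeader()."""
    if session is None:
        # Fail loudly so a caller that forgot to pass the session is easy to spot.
        raise ValueError("a configured session is required")
    # Assumed parameters, for illustration only.
    return session.get(api_url, params={"action": "query", "export": 1})


# Usage: callers pass the shared, configured session explicitly.
shared = RecordingSession("ExampleArchiver/0.1")
body = get_xml_header_sketch("https://wiki.example.org/api.php", session=shared)
```

An alternative would be to build a fallback session when `None` is passed, but that fallback would need the same User-Agent configuration to avoid reintroducing the original problem.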

Pokechu22 commented 2 years ago

I'm not entirely sure how the defaults are handled here. `getXMLHeader` calls `getXMLPage`, which calls `getXMLPageCore`, which directly calls `session.post`. I'm not really sure why the argument is even optional.