WikiTeam / wikiteam

Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to tiniest wikis. As of 2023, WikiTeam has preserved more than 350,000 wikis.
https://github.com/WikiTeam
GNU General Public License v3.0

Use `session.get` instead of `requests.get` in `getXMLHeader` #438

Closed: Pokechu22 closed this 1 year ago

Pokechu22 commented 1 year ago

`session.get` uses our configured User-Agent, while `requests.get` uses the library default. This is needed for `python2 -u dumpgenerator.py --xml --xmlrevisions --images https://fidopedia.fido.de/`, as that site rejects the default requests User-Agent.
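To illustrate the difference, here is a minimal sketch that inspects the header each path would send, without making any network request. The User-Agent string, the helper name `effective_user_agent`, and the example URL are assumptions made for illustration; they are not taken from dumpgenerator.py.

```python
import requests

# Hypothetical User-Agent string, for illustration only.
CUSTOM_UA = "ExampleArchiver/1.0"

# A session configured once; every request sent through it inherits the header.
session = requests.Session()
session.headers.update({"User-Agent": CUSTOM_UA})


def effective_user_agent(sess, url):
    """Return the User-Agent the session would send, without network I/O."""
    prepared = sess.prepare_request(requests.Request("GET", url))
    return prepared.headers["User-Agent"]


# Through the configured session: the custom header is used.
via_session = effective_user_agent(session, "https://example.org/")

# requests.get() builds a fresh Session for each call, so it only carries
# the library default ("python-requests/<version>").
via_module = effective_user_agent(requests.Session(), "https://example.org/")

print(via_session)
print(via_module)
```

A site that filters on the default `python-requests/...` string would therefore accept the first request and reject the second.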

(That site also requires other stuff; see this branch (perma), though that's not fully complete.)

nemobis commented 1 year ago

This relies on `generateXMLDump()` and `getXMLHeader()` actually passing the `session` variable; otherwise it will fail. Maybe we should handle the default value `None` here?

Pokechu22 commented 1 year ago

I'm not entirely sure how the defaults are handled here. `getXMLHeader` calls `getXMLPage`, which calls `getXMLPageCore`, which directly calls `session.post`. I'm not really sure why the argument is even optional.
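One way the `None` default raised above could be handled is sketched below. This is not the project's code: the function names, the `FakeSession` stand-in, the `session_factory` parameter, and the request parameters are all hypothetical, and the call chain is shortened to two levels so the sketch runs without network access.

```python
class FakeSession:
    """Stand-in for requests.Session so the sketch needs no network access."""

    def __init__(self, user_agent="default-agent"):
        self.headers = {"User-Agent": user_agent}
        self.calls = []

    def post(self, url, data=None):
        # Record the call instead of sending anything.
        self.calls.append(("POST", url, data))
        return "<mediawiki>...</mediawiki>"


def get_xml_page_core(url, params, session=None):
    # Innermost call: uses session.post directly, as described in the thread.
    # Failing loudly here beats an AttributeError on None.post.
    if session is None:
        raise ValueError("a session is required")
    return session.post(url, data=params)


def get_xml_header(url, session=None, session_factory=FakeSession):
    # Tolerate the default None by creating a session on demand.
    if session is None:
        session = session_factory()
    # Hypothetical request parameters, for illustration only.
    return get_xml_page_core(url, {"title": "Main_Page"}, session=session)


configured = FakeSession(user_agent="ExampleArchiver/1.0")
header_with_session = get_xml_header("https://example.org/api.php", session=configured)
header_without_session = get_xml_header("https://example.org/api.php")
```

The alternative is to make `session` a required argument at every level, which would surface a missing session at the call site rather than hiding it behind a fallback that lacks the configured User-Agent.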