fake-name / xA-Scraper


Fixed Patreon page iteration #111

Open Helios-vmg opened 2 years ago

Helios-vmg commented 2 years ago

This fixes the page iteration on Patreon. The previous mechanism was (for whatever reason) skipping posts seemingly at random. This one:

  1. Stores the request string in a variable. Note that the page[cursor] parameter has been obviated for the first request.
  2. After making the request, pulls the next request string from the data returned by the API, which advances it to the next page.

I don't normally work in Python, so I didn't want to make more changes than were necessary to fix the bug. It may be a good idea to use the presence of the 'next' link to decide whether to continue looping, instead of using had_post. Also, the code that strips 'www.patreon.com/api' is a bit fragile right now; a regex might be a more robust way to do this.
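
A minimal sketch of the two suggestions above (looping on the 'next' link and stripping the prefix with a regex), not the code in this PR. It assumes the response is a dict with an optional links.next URL; the names strip_api_prefix, next_request, iter_pages, and the fetch callable are hypothetical and stand in for whatever the scraper actually uses to make a request.

```python
import re
from typing import Callable, Iterator, Optional

# Assumed form of the prefix to remove; the scheme and "www." are optional.
_API_PREFIX = re.compile(r"^(?:https?://)?(?:www\.)?patreon\.com/api")


def strip_api_prefix(url: str) -> str:
    """Return the request string with the Patreon API host prefix removed."""
    return _API_PREFIX.sub("", url, count=1)


def next_request(page: dict) -> Optional[str]:
    """Return the request string for the following page, or None if there is none."""
    # Assumption: the 'next' link lives under a top-level "links" object.
    next_url = (page.get("links") or {}).get("next")
    return strip_api_prefix(next_url) if next_url else None


def iter_pages(fetch: Callable[[str], dict], first_request: str) -> Iterator[dict]:
    """Yield each page, continuing only while the response carries a 'next' link."""
    request: Optional[str] = first_request
    seen = set()  # guards against a response that points back at an earlier page
    while request is not None and request not in seen:
        seen.add(request)
        page = fetch(request)  # hypothetical: performs the API call, returns parsed JSON
        yield page
        request = next_request(page)
```

With this shape the loop ends when the API stops returning a 'next' link, so a page that happens to contain no new posts does not end the iteration early.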

fake-name commented 2 years ago

> Stores the request string in a variable. Note that the page[cursor] parameter has been obviated for the first request.

I can't remember the reason for things being structured the way they are, but I'd bet the "next" parameter is something that post-dates the initial implementation. I'd like to think that if I had seen something like that, I'd have used it, rather than the cursor-based mess that's currently there.

Anyways, awesome! I'm glad this is working for someone else! Let me know if you want to make the change discussed above, or I can pull it and do it myself.

Helios-vmg commented 2 years ago

> I'd bet the "next" parameter is something that post-dates the initial implementation.

I figured that was the case.