Adidea / Gallery-Tools

Tools for scraping image galleries - specifically Furaffinity at this time
GNU General Public License v3.0

New favorites paging format breaks scraper #8

Open · Adidea opened this issue 6 years ago

Adidea commented 6 years ago

Problem

The favorites page no longer uses linear page numbers. Instead, it uses a longer series of numbers that appears to have some chronological meaning, combined with prev/next in the URL, which ultimately isn't useful for scraping. It's really unintuitive all around, but I'm guessing there was some DB restructuring that offered performance benefits or something.

Observations

As far as I can tell, each page has an id, but it only works as an offset together with prev or next. A page cannot be accessed directly by its own id, only by the id of an adjacent page plus next/prev. This means 2 different ids can be used to access the same page, depending on whether next or prev is used. The first page can only be obtained by not specifying an id, or by using the id of the 2nd page with prev (as provided in the prev button).

To navigate pages, the scraper will just have to grab the URL from the next button instead of incrementing the page number in the URL.
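A minimal sketch of "follow the next button instead of incrementing a page number", in Python. It assumes the next button is a plain `<a>` link whose text is "Next" and that hrefs look like `/favorites/<user>/<id>/next`; both are assumptions about the markup, not something confirmed here. `NextLinkFinder`, `iter_pages` and the injected `fetch` callable are hypothetical names for illustration, not part of Gallery-Tools.

```python
from html.parser import HTMLParser
from typing import Callable, Iterator, Optional
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Records the href of the first <a> element whose visible text is "Next".

    Assumption: the next button is a plain link labelled "Next". The real
    markup may differ (a form button, a CSS class), so the matching rule
    would need adjusting to whatever the page actually uses.
    """

    def __init__(self) -> None:
        super().__init__()
        self.next_href: Optional[str] = None
        self._open_href: Optional[str] = None  # href of the <a> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self.next_href is None:
            self._open_href = dict(attrs).get("href")

    def handle_data(self, data):
        if self._open_href is not None and data.strip().lower() == "next":
            self.next_href = self._open_href

    def handle_endtag(self, tag):
        if tag == "a":
            self._open_href = None


def iter_pages(
    start_url: str,
    fetch: Callable[[str], str],
    max_pages: int = 1000,
) -> Iterator[tuple[str, str]]:
    """Yield (url, html) for each page by following next links.

    `fetch` is injected so the crawl logic can be tested offline. Visited
    URLs are tracked so a page that links back to itself cannot loop forever.
    """
    url: Optional[str] = start_url
    visited: set[str] = set()
    while url is not None and url not in visited and len(visited) < max_pages:
        visited.add(url)
        html = fetch(url)
        yield url, html
        finder = NextLinkFinder()
        finder.feed(html)
        # Relative hrefs such as "/favorites/user/111/next" are resolved
        # against the page they were found on; no next link means last page.
        url = urljoin(url, finder.next_href) if finder.next_href else None
```

In real use `fetch` could be something like `lambda u: urllib.request.urlopen(u).read().decode()`. Keeping it injectable means the paging logic doesn't care whether a page was reached by a page number or by an opaque id.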

Solutions

Specifying a page range is rather unintuitive at this point. I guess the page range could be opened up to 2 text inputs, each requiring a page id with prev or next appended to the end. That suffix would dictate which navigation button the scraper uses to determine the page to stop on.
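One way the "page id plus prev/next" endpoint could be parsed and checked, sketched in Python. It assumes favorites URLs end in `/<id>/next` or `/<id>/prev`, and follows the observation that a page's next button carries `<its id>/next` while its prev button carries `<its id>/prev`. `Endpoint`, `parse_endpoint` and `stop_decision` are hypothetical names. A `next` endpoint names the page the scraper fetches with that URL, so it stops after scraping it; a `prev` endpoint names the page *before* the one whose prev button shows it, so seeing it means the stop page was already scraped.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Accepts "123456next", "123456/prev", " 42 / PREV " (assumed input format).
_ENDPOINT = re.compile(r"^\s*(\d+)\s*/?\s*(next|prev)\s*$", re.IGNORECASE)


@dataclass(frozen=True)
class Endpoint:
    page_id: str
    direction: str  # "next" or "prev"

    @property
    def path_suffix(self) -> str:
        # Assumed URL shape: .../favorites/<user>/<page_id>/<direction>
        return f"/{self.page_id}/{self.direction}"


def parse_endpoint(text: str) -> Endpoint:
    m = _ENDPOINT.match(text)
    if m is None:
        raise ValueError(f"expected '<digits>next' or '<digits>prev', got {text!r}")
    return Endpoint(m.group(1), m.group(2).lower())


def stop_decision(endpoint: Endpoint, fetched_url: str, prev_href: Optional[str]) -> str:
    """Return "continue", "stop_after" (keep this page, then stop) or
    "stop_before" (discard this page; the stop page was the previous one)."""
    if endpoint.direction == "next":
        # The stop page is the one requested via "<id>/next".
        if fetched_url.rstrip("/").endswith(endpoint.path_suffix):
            return "stop_after"
    elif prev_href is not None and prev_href.rstrip("/").endswith(endpoint.path_suffix):
        # "<id>/prev" points back at the previous page, which was the stop page.
        return "stop_before"
    return "continue"
```

The one-page lookahead for `prev` endpoints is the price of ids only working as offsets. A scraper that only follows next buttons always knows the URL it fetched, but only learns a page's `prev` identity from the page after it.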

Alternatively, the page range could be done away with in favor of 2 options: scrape the entire gallery, or scrape from the start to the current page. Personally, I've only used the number range to do the latter. This also simplifies another issue: favorites is the only page with the new id system, so otherwise I'd have to adapt the interface to cater to both. As it is, I'll already have to split the logic up specifically for favorites... and as messy as this code base is, that'll only make it messier ;-;. This is almost enough to get me to finish that rewrite. :T

Adidea commented 6 years ago

I wish I could figure out the logic behind how these page ids work. I noticed that after a new favorite is added, all the ids get scrambled when advancing from the start. But if you go back to a saved id, it's the exact page as you left it, using the same offsets/ids as before the new favorite. Navigating back to the first page and then to the next one puts you back on the new set of ids, with each page's favorites shifted toward the next page, as you would expect when a new favorite is added.

The one nifty thing that might come of this: if you were browsing someone's favorites and decided to come back to where you left off another time, the pages wouldn't shift with new favorites.

Adidea commented 6 years ago

I think I have an idea for defining the scraper range that should be fairly versatile. A drop-down list would be used to select 3 modes:

Adidea commented 6 years ago

Hm.... I can't decide how I want to tie the 2 different paging systems together in the scraping options. 😕

Adidea commented 6 years ago

Alright, I think for now I'm just going to use https://www.furaffinity.net/controls/favorites/ in place of the regular favorites gallery. I may have a way of dealing with the new favorites page once I add a feature that can identify a part of the gallery based on the submissions found in it. This would be used to define the page range, since paging itself is unpredictable. It would also be useful for picking back up from where the last scrape left off.
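A rough sketch of locating a position in the gallery by its submissions rather than by page ids, in Python. It assumes submissions are identified by integer ids, that pages are consumed lazily in gallery order, and that the previous scrape's ids are available as a set. `collect_until_known` is a hypothetical helper, not an existing Gallery-Tools function.

```python
from typing import Iterable, List, Optional, Set


def collect_until_known(
    pages: Iterable[List[int]],
    known_ids: Set[int],
    max_pages: Optional[int] = None,
) -> List[int]:
    """Collect submission ids in gallery order, stopping at the first id that
    a previous scrape already saw.

    `pages` should be lazy (e.g. a generator that fetches one page at a time)
    so that pages past the anchor are never requested.
    """
    collected: List[int] = []
    for page_number, submission_ids in enumerate(pages, start=1):
        for sub_id in submission_ids:
            if sub_id in known_ids:
                return collected  # reached the point where the last scrape began
            collected.append(sub_id)
        if max_pages is not None and page_number >= max_pages:
            break
    return collected
```

Using several known ids instead of a single anchor makes this more tolerant of a favorite being removed between scrapes. Because only submission ids are compared, it doesn't matter how the pages themselves shift or get renumbered.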