Smart123s / ItchClaim

Automatically claim free games from itch.io
https://itchclaim.tmbpeter.com
MIT License

Missing active sales #13

Open theitchtime opened 1 month ago

theitchtime commented 1 month ago

I noticed a whole bunch of sales that are not in the active sales list:

https://itch.io/s/86526/bundle-sale
https://itch.io/s/88459/year-of-the-dragon
https://itch.io/s/102043/free-to-get
https://itch.io/s/108977/buttworm-ate-the-price-d
https://itch.io/s/110941/free-gift
https://itch.io/s/115108/everything-is-free
https://itch.io/s/116899/sale-symbio-games
https://itch.io/s/117435/support-me-
https://itch.io/s/122261/aquapop-sale
https://itch.io/s/122579/spring-summer-sale
https://itch.io/s/124458/a-very-very-very-goooooooooooooood-sale
https://itch.io/s/125390/summer-sale
https://itch.io/s/127091/summer-sale
https://itch.io/s/127385/claudia-for-free-summer-2024
https://itch.io/s/128902/vacation-sale-100
https://itch.io/s/129069/king-boo-sale
https://itch.io/s/129092/for-free

Note that 129092 seems to be a short (1-day?) sale. I also wish there were an easier way to scrape all these unknown finds (and whatever else I missed).

But I love your program! Incredibly helpful.

EDIT: I've been randomly clicking a bunch of sales and scanning for 100% tags, which is how I found most of these new ones.

EDIT2:
https://itch.io/s/59746/free
https://itch.io/s/107895/old-game-sale
https://itch.io/s/115229/2024-free
https://itch.io/s/115331/play-now
https://itch.io/s/116859/curtain-play-it
https://itch.io/s/122432/oceanox-development-sale
https://itch.io/s/122615/gone-out-of-business-sale
https://itch.io/s/127260/bedrotting
https://itch.io/s/127938/amareica-going-for-free-until-end-of-year

EDIT3: Another thing I noticed is that sales sometimes show up on the feeds. https://itch.io/games/newest/on-sale.xml?page=1

https://itch.io/feed/sales.xml The new sales feed was blank for 2024-07-19, but 59028 appeared for 2024-07-20.

But I've yet to figure out a master list for all those missing entries listed above.

Smart123s commented 1 month ago

Wow, that's a huge list, thanks! I've manually refreshed those sales in the list. Sorry for the late response, I didn't have much time on hand. I'll look into why those weren't cached as soon as I can. Also, thanks for the hint on how you found them. I check sale pages such as https://itch.io/games/on-sale and even https://itch.io/tools/on-sale. I'll check whether those XML feeds have any information beyond what those pages show. Have you found sales by any other methods? Anything would be useful.

theitchtime commented 1 month ago

I haven't figured out a reliable method yet.

I've just been manually going through the sale listings, which I think your scanner already handles pretty reliably:

https://itch.io/games/newest/on-sale?page=1
https://itch.io/tools/newest/on-sale?page=1
https://itch.io/game-assets/newest/on-sale?page=1
https://itch.io/comics/newest/on-sale?page=1
https://itch.io/books/newest/on-sale?page=1
https://itch.io/physical-games/newest/on-sale?page=1
https://itch.io/soundtracks/newest/on-sale?page=1
https://itch.io/game-mods/newest/on-sale?page=1
https://itch.io/misc/newest/on-sale?page=1
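
The listing URLs above differ only in the category slug, so they can be generated rather than typed out. Here is a minimal Python sketch, assuming the `newest/on-sale?page=N` pattern holds for every category listed. `CATEGORIES` and `on_sale_listing_url` are names made up for illustration and are not part of ItchClaim.

```python
# Category slugs copied from the URLs listed above.
CATEGORIES = [
    "games", "tools", "game-assets", "comics", "books",
    "physical-games", "soundtracks", "game-mods", "misc",
]


def on_sale_listing_url(category: str, page: int = 1) -> str:
    """Build the newest-on-sale listing URL for one category and page.

    The URL pattern is inferred from the links in this thread (assumption).
    """
    return f"https://itch.io/{category}/newest/on-sale?page={page}"


# One first-page URL per category, in the same order as the list above.
urls = [on_sale_listing_url(category) for category in CATEGORIES]
for url in urls:
    print(url)
```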

For the rest, I've been entering random sale ID numbers and then looking for 100% off tags. Surprisingly, I picked up a lot of "hidden" things this way, but I had to do it manually over maybe 100-ish pages.

I noticed that the BRUD sale just started, but it's not showing on the sales.xml feed. So meh. http://itchclaim.tmbpeter.com/data/2594408.json

I also want to report that some items without a "claim" page are still claimable. There are usually no download links, but claiming them is good for future-proofing. https://flagimtoshi.itch.io/claudia-plus https://kool-games-dev-jarmele.itch.io/phineas-and-ferb-original (enough others)

Thank you for adding all those links btw! This tool is exceptionally amazing. :)

Smart123s commented 1 month ago

The script actually scrapes the exact same URLs, just in JSON format:
https://github.com/Smart123s/ItchClaim/blob/be8880fcb7a4f729ff65ecba1c0fb736169dadc2/ItchClaim/DiskManager.py#L164
https://github.com/Smart123s/ItchClaim/blob/be8880fcb7a4f729ff65ecba1c0fb736169dadc2/ItchClaim/__main__.py#L87-L90

And as you may have noticed, the sale URLs have an incremental ID, so the script also goes through those. It starts where it last left off and keeps going until it gets 404 errors.
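
The incremental-ID scan can be pictured roughly as follows. This is a simplified sketch, not the code from the repository: `scan_sales` and `sale_exists` are hypothetical names, and `sale_exists` stands in for an HTTP request to `https://itch.io/s/<id>` that returns `False` on a 404. The sketch stops at the first missing ID, which is an assumption; the real script may tolerate more than one.

```python
from typing import Callable, List, Tuple


def scan_sales(start_id: int,
               sale_exists: Callable[[int], bool]) -> Tuple[List[int], int]:
    """Walk sale IDs upward from start_id until the first missing one.

    Returns the IDs that were found and the ID to resume from next time.
    """
    found: List[int] = []
    sale_id = start_id
    while sale_exists(sale_id):
        found.append(sale_id)
        sale_id += 1
    # sale_id now points at the first ID that returned a 404.
    return found, sale_id


# Simulated site where sales 100..102 exist and 103 has not been created yet.
existing = {100, 101, 102}
found, resume_at = scan_sales(100, lambda sale_id: sale_id in existing)
print(found, resume_at)  # [100, 101, 102] 103
```

Storing `resume_at` between runs is what lets the next run start where the last one left off.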

The https://itch.io/games/newest/on-sale.xml?page=1 URL seems to return the same content as the one without the .xml extension. And https://itch.io/feed/sales.xml doesn't seem to be too useful. It has only one item in it right now: https://web.archive.org/web/20240723205609/https://itch.io/feed/sales.xml

Smart123s commented 1 month ago

If you notice any more missing sales, please let me know here. And if you could, I would greatly appreciate a link to where you have found it. (Maybe with an archive.org link too, in case the list's content changes?)

theitchtime commented 1 month ago

See area below for Wendigo 100%
http://web.archive.org/web/20240724204926/https://itch.io/s/87970/find-yourself
==>
https://itch.io/s/129248/wendigo-sale

And this one by chance. https://itch.io/s/121220/

Smart123s commented 1 month ago

Thanks! I think I have found the root cause of the problem.

It seems that during job https://github.com/Smart123s/ItchClaim/actions/runs/9258985901/job/25470049749 (archived logs), itch.io was down for 5+ hours, and the script kept stepping through sale IDs even though the requests were failing. When itch.io came back online, the script got a 404 error, which is a valid stop condition, so it stopped and recorded 129906 as the last checked sale. I have started a recheck of all sales after 123964, which is where the buggy run started. I've also pushed an update (https://github.com/Smart123s/ItchClaim/commit/24ba668db6d471e2e28b677348614e998e57ef6e) so that if a network error happens again, the running refresh aborts immediately.
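
To illustrate the idea behind the fix (this is not the code from the commit), here is a rough sketch in which a network error aborts the refresh and leaves the resume pointer on the ID that failed, while only a real 404 counts as the end of the list. `refresh_sales` and `fetch_status` are hypothetical names, and treating any status other than 200 or 404 as an outage is an assumption.

```python
from typing import Callable, List, Tuple


def refresh_sales(start_id: int,
                  fetch_status: Callable[[int], int]) -> Tuple[List[int], int, str]:
    """Scan sale IDs from start_id; return (cached_ids, resume_at, reason).

    fetch_status is a stand-in for an HTTP request: it returns a status
    code or raises ConnectionError when the site cannot be reached.
    """
    cached: List[int] = []
    sale_id = start_id
    while True:
        try:
            status = fetch_status(sale_id)
        except ConnectionError:
            # Abort immediately; the failed ID is retried on the next run.
            return cached, sale_id, "aborted"
        if status == 404:
            # Valid stop condition: this sale does not exist yet.
            return cached, sale_id, "end"
        if status != 200:
            # Assumption: any other status is handled like an outage.
            return cached, sale_id, "aborted"
        cached.append(sale_id)
        sale_id += 1


def flaky(sale_id: int) -> int:
    """Simulated site: sales 1..5 exist, but the request for 3 fails."""
    if sale_id == 3:
        raise ConnectionError("itch.io unreachable")
    return 200 if sale_id <= 5 else 404


def healthy(sale_id: int) -> int:
    """Simulated site after the outage: sales 1..5 exist."""
    return 200 if sale_id <= 5 else 404


print(refresh_sales(1, flaky))    # ([1, 2], 3, 'aborted')
print(refresh_sales(3, healthy))  # ([3, 4, 5], 6, 'end')
```

Because the aborted run resumes at 3 rather than skipping ahead, no IDs are silently passed over during an outage.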

Smart123s commented 1 month ago

https://itch.io/s/121220/ didn't get cached because it doesn't have a price. I'll be curious to see what happens to this game on Friday, when the sale ends.