Open turnerjoy opened 3 years ago
Yes, I noticed the same thing with the D&D/Magazines folder: instead of going to Home/Books/D&D/Magazines, it goes to Home/Magazines. So I guess I just have to skip these folders to avoid an error, if I can get the skip folder option to work.
Having the same problems. A possible quick fix, off the top of my head, is to check whether the current page's URL is a substring of the destination page's URL. If the current URL is https://thetrove.is/Books/SomeGame/ and the destination page is https://thetrove.is/Books/SomeGame/Subfolder/, the check returns true and the program continues; if the link sends the program to https://thetrove.is/Books/SomeOtherGame/, it returns false and the link is skipped. The main issue is that the links don't send you to the page directly: the link may say https://thetrove.is/Books/SomeGame/SomeOtherGame/, but that page redirects you to https://thetrove.is/Books/SomeOtherGame/. If I knew enough about web scraping to solve that, I wouldn't be here.
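A minimal sketch of that check in Python, for illustration only (the downloader itself may be written differently, and `is_inside` is a hypothetical name). It compares paths as a prefix with a trailing slash rather than as a raw substring, so that a folder such as SomeGameTwo is not mistaken for a child of SomeGame. It would need to be given the final URL after any redirect, not the URL shown in the link.

```python
from urllib.parse import unquote, urlsplit


def is_inside(current_url: str, destination_url: str) -> bool:
    """Return True if destination_url is at or below the folder of current_url."""
    cur = urlsplit(current_url)
    dst = urlsplit(destination_url)

    # Different scheme or host: never part of the same folder tree.
    if (cur.scheme, cur.netloc.lower()) != (dst.scheme, dst.netloc.lower()):
        return False

    # Compare decoded paths, forcing a trailing slash on the folder so that
    # "/Books/SomeGame/" does not match "/Books/SomeGameTwo/".
    cur_path = unquote(cur.path)
    if not cur_path.endswith("/"):
        cur_path += "/"
    return unquote(dst.path).startswith(cur_path)
```

With the URLs from the comment above, `is_inside("https://thetrove.is/Books/SomeGame/", "https://thetrove.is/Books/SomeGame/Subfolder/")` is true, while the same call with `https://thetrove.is/Books/SomeOtherGame/` as the destination is false.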
Hello @turnerjoy, @beowulf88 and @NexusEye.
Sorry for the delay, I had internet issues here this week. Hopefully this is fixed in the new release: https://github.com/felipegiacomozzi/the-trove-downloader/releases
The problem wasn't really that the redirect URL was on a different path; there is actually an intermediate redirect page that is loaded first:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>TheTrove</title>
<META HTTP-EQUIV="REFRESH" CONTENT="1; URL=https://thetrove.is/Books/Archipelago/">
</head>
<body>
</body>
</html>
You can't really see it in the browser because the redirect is really quick. That is why the program crashed: it couldn't interpret this redirect page. I added support for it, and now the program extracts the URL and requests it as the next page.
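For readers who want to see the idea, here is a rough Python sketch of pulling the target out of a redirect page like the one above. It is not the code from the release; `MetaRefreshParser` and `extract_meta_refresh` are hypothetical names, and it only uses the standard library.

```python
import re
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class MetaRefreshParser(HTMLParser):
    """Remember the URL of the first <meta http-equiv="refresh"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.target: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta" or self.target is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "1; URL=https://...".
        match = re.search(r"url\s*=\s*['\"]?([^'\";]+)",
                          attr.get("content", ""), re.IGNORECASE)
        if match:
            self.target = match.group(1).strip()


def extract_meta_refresh(html: str, base_url: str) -> Optional[str]:
    """Return the absolute redirect target, or None if the page has none."""
    parser = MetaRefreshParser()
    parser.feed(html)
    if parser.target is None:
        return None
    # Resolve relative targets against the page that contained the tag.
    return urljoin(base_url, parser.target)
```

A caller would fetch a page, pass its HTML to `extract_meta_refresh`, and request the returned URL when it is not `None`; otherwise the page is treated as a normal listing.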
Thanks for the reports, guys. They really helped me find this problem.
I've run into a bit of a problem with this: the program keeps redirecting indefinitely. I've attached the log file, the-trove-downloader.log.
There is a lot of cyclic redirection, so I had to remove the redirect navigation: https://github.com/felipegiacomozzi/the-trove-downloader/releases/tag/1.0.15
In this release it will ignore redirects.
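For comparison, an alternative to ignoring redirects entirely would be to follow them while remembering which URLs were already visited. The Python sketch below is only an illustration of that idea and is not what release 1.0.15 does; `resolve_redirects`, `next_url` and `max_hops` are hypothetical names.

```python
from typing import Callable, Optional


def resolve_redirects(start_url: str,
                      next_url: Callable[[str], Optional[str]],
                      max_hops: int = 10) -> Optional[str]:
    """Follow a redirect chain and return the final URL.

    next_url(url) is assumed to return the redirect target of url,
    or None when url is a normal page. Returns None when the chain
    loops back on itself or is longer than max_hops.
    """
    seen = {start_url}
    current = start_url
    for _ in range(max_hops):
        target = next_url(current)
        if target is None:
            return current      # reached a page that does not redirect
        if target in seen:
            return None         # cycle detected, give up on this link
        seen.add(target)
        current = target
    return None                 # chain too long, treat it as a loop
```

Here a `None` result means "skip this link", which keeps the crawl from running forever on cyclic redirects while still following the well-behaved ones.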
It looks like they are now using some kind of linking or "commands", probably to reduce the amount of duplication.
For example, in this folder:
https://thetrove.is/Books//%20Collections//Collaborative%20%26%20Peer%20%26%20Gm-less%20%26%20Shifting%20GM/
The Archipelago folder link appears to go to:
https://thetrove.is/Books//%20Collections//Collaborative%20%26%20Peer%20%26%20Gm-less%20%26%20Shifting%20GM/Archipelago/
But it actually goes to https://thetrove.is/Books/Archipelago/
There also seem to be some straight-up bad links, like:
in https://thetrove.is/Tabletop%20Games/BoardGames/Azhanti%20High%20Lightning/
AzhantiHighLightning.jpg goes to a 404
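A dead link like this does not have to stop a crawl. As a rough Python sketch (not taken from the downloader; `try_download` and its `opener` parameter are hypothetical), a 404 can be turned into a "skip this file" result while other HTTP errors are still raised:

```python
import urllib.error
import urllib.request
from typing import Callable, Optional


def try_download(url: str,
                 opener: Callable = urllib.request.urlopen) -> Optional[bytes]:
    """Return the body at url, or None if the server answers 404."""
    try:
        with opener(url) as response:
            return response.read()
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return None   # broken link: let the caller log it and move on
        raise             # anything else is still reported to the caller
```

The `opener` argument only exists so the function can be exercised without network access; by default it uses `urllib.request.urlopen`.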