arzkar / ao3-cli

A CLI to download from archiveofourown.org using their built-in download option.
Apache License 2.0

Problems with fics with / in title #2

Closed Kyther closed 1 year ago

Kyther commented 2 years ago

I used this on a file with a list of URLs; it threw a "FileNotFoundError [Errno 2] No such file or directory" whenever it reached one which had a / in the fic name, because it kept the / in the name which meant adding a subdirectory to the file path (I'm running Linux). Running it directly on the URL itself didn't help.

It probably should be coded to ignore all forward slashes in the file names when saving…

Also, I observed that it would hit a point where it would not get a proper response from AO3, and would merrily skip fic after fic after fic with "Fanfiction not found". When I aborted, removed all the successful links from the list, and restarted it, it usually would successfully grab the same fics it had just said weren't there. It would probably be helpful if there were something built in to handle the vagaries of AO3 responses, so that one did not have to monitor the downloading so closely to avoid missing them.

arzkar commented 2 years ago

I used this on a file with a list of URLs; it threw a "FileNotFoundError [Errno 2] No such file or directory" whenever it reached one which had a / in the fic name, because it kept the / in the name which meant adding a subdirectory to the file path (I'm running Linux)

I tested this just now and it seems to work fine. Can you give me an example of the command, with the full file path, that gave you the error?

When I aborted, removed all the successful links from the list, and restarted it, it usually would successfully grab the same fics it had just said weren't there.

When you restart, it should skip fics which have already been downloaded. Isn't it doing that? It should work unless you rename the files manually.

Kyther commented 2 years ago

The command I used: ao3_cli -i stranger_things.txt -o ./StrangerThings/ -f PDF

I had a subdirectory called StrangerThings underneath the one I was in that held all the fanfics to download to. It worked beautifully except for those fics with / in the name.

I also tried ao3_cli -u "https://archiveofourown.org/works/35885605" -f PDF. That was the first one I ran across that had that issue. It also failed with the same error.

When you restart, it should skip fics which have already been downloaded. Isn't it doing that? It should work unless you rename the files manually.

What I mean is, the first time it went to download it, it said "Processing URL", and then underneath that, "Fanfiction not found". Thing was, the fanfic did exist at that link. It would just say "Fanfiction not found" over and over and over for a whole bunch of fics in a row. So those fics were NOT in the download folder already. When I restarted it, those fics successfully downloaded. (Usually. Sometimes I've had to wait a while before trying again because it would have the same error a bunch of times in a row. It's taken a whole lot more monitoring than I expected because of this.)

I've also noticed that sometimes when that error (the "Fanfiction not found") hits, the fic just before getting that message does not download properly, and will end up as a 12 byte pdf file, which I have to remove and re-download. Fortunately it's easy to spot those by size and due to the thumbnail not rendering. (But again, have to pay a lot of attention to that, because it will think it downloaded successfully when it did not.)
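The size check described above could be automated. The following is a minimal sketch, not part of ao3-cli: the function names `looks_like_pdf` and `find_bad_pdfs` and the 1024-byte threshold are assumptions made for illustration. It relies only on the fact that a valid PDF begins with the bytes `%PDF-`.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"  # every valid PDF file starts with these bytes


def looks_like_pdf(path, min_size=1024):
    """Return True if the file is at least min_size bytes and has a PDF header."""
    p = Path(path)
    # A 12-byte file fails the size check immediately.
    if not p.is_file() or p.stat().st_size < min_size:
        return False
    with p.open("rb") as fh:
        return fh.read(len(PDF_MAGIC)) == PDF_MAGIC


def find_bad_pdfs(folder, min_size=1024):
    """List the *.pdf files in folder that are too small or lack a PDF header."""
    return sorted(
        p for p in Path(folder).glob("*.pdf") if not looks_like_pdf(p, min_size)
    )
```

Running `find_bad_pdfs("./StrangerThings/")` after a batch would list the files worth deleting and re-downloading, instead of spotting them by eye.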

Kyther commented 2 years ago

I suspect the issue with the "Fanfiction not found" is due to AO3 giving the "Retry later" message. Does your script have the maximum request frequency required built-in so that this message is triggered less often? I believe it's about once per five seconds…
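A minimum gap between requests could be enforced with a small throttle like the sketch below. This is an illustration only, not ao3-cli's code: the `Throttle` class is hypothetical, and the five-second default is the unverified figure mentioned above, not a documented AO3 limit.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, None before the first

    def wait(self):
        """Sleep until min_interval has passed since the last call; return the delay."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay
```

A download loop would call `throttle.wait()` once before each request, so the first request goes out immediately and later ones are spaced out.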

arzkar commented 2 years ago

The command I used: ao3_cli -i stranger_things.txt -o ./StrangerThings/ -f PDF

OK, I have fixed it by replacing all occurrences of / with a space, since Linux doesn't allow / in a filename AFAIK. You can test whether it's working as intended by installing the dev version:
pip install git+https://github.com/arzkar/ao3-cli@main
Let me know if there are any other errors.
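For readers curious what such a replacement might look like, here is a minimal sketch. It is not the actual ao3-cli change: the helper name `safe_filename` is hypothetical, and collapsing the leftover whitespace is an extra step assumed here for tidiness.

```python
import os


def safe_filename(title, replacement=" "):
    """Replace path separators in a title so it stays a single file name."""
    # "/" separates directories on Linux and macOS; os.sep and os.altsep
    # cover whatever the current platform uses in addition.
    separators = {"/", os.sep}
    if os.altsep:
        separators.add(os.altsep)
    for sep in separators:
        title = title.replace(sep, replacement)
    # Collapse any runs of whitespace left behind by the replacement.
    return " ".join(title.split())
```

With this, a title such as `Steve/Eddie fic` becomes `Steve Eddie fic`, so joining it to the output directory no longer creates an unintended subdirectory.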

I suspect the issue with the "Fanfiction not found" is due to AO3 giving the "Retry later" message. Does your script have the maximum request frequency required built-in so that this message is triggered less often? I believe it's about once per five seconds…

Possibly. Can you share the fic URL for which you are getting this error? I will try to find a fix.

Kyther commented 2 years ago

Thanks, will use that one.

Possibly. Can you share the fic URL for which you are getting this error? I will try to find a fix.

This isn't a situation where a specific URL triggers it at all. It gets triggered after downloading anywhere from 30-50 fics in a row, after which every URL for a good long while will give "Fanfiction not found". Eventually it'll escape that and get back to working (whenever my IP gets out of AO3 jail) but until then, it doesn't matter which URL I try; they'll all fail. (If I attempt to browse to one in my browser at that moment, it'll give me the "Retry later" message, too.)

Another AO3-related script had this about it: "Try to keep your ao3 browsing to a minimum while the script is running. It won't break anything, but it may cause you to hit ao3's limit on how many hits to the site you are allowed within a certain time frame. This limit is per user, or per IP if you are not logged in. If this happens, the script will pause for 5 minutes to let the limit reset, and you may see a "Retry later" message when you try to open an ao3 page during that time. Don't be alarmed by this, just wait it out."

You can find more information on this error in the issue for that script here: https://github.com/radiolarian/AO3Scraper/issues/24 They note it's a 429 error. You'll notice that script (from which I copied the info above) says it is designed to pause for five minutes to wait out the error. Is that something that could be included in your script?

This script works great for one or two files here and there, but for a large list of them (I'm attempting to grab 20,000+ fics for a friend, the ids of which I grabbed using that AO3Scraper above), it's going to run into AO3's limits on usage over and over.
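The pause-and-retry behaviour requested above could be sketched as follows. This is an assumption-laden illustration using only the Python standard library, not how ao3-cli or AO3Scraper implements it: `fetch_with_retry` and its parameters are hypothetical, the 300-second pause mirrors the five minutes quoted above, and whether AO3 sends a `Retry-After` header is not confirmed in this thread.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, opener=urllib.request.urlopen, sleep=time.sleep,
                     pause_seconds=300, max_attempts=5):
    """Fetch url, pausing and retrying whenever the server answers HTTP 429."""
    for attempt in range(1, max_attempts + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Anything other than 429, or a 429 on the last attempt, is re-raised
            # so the caller can report a real failure.
            if err.code != 429 or attempt == max_attempts:
                raise
            # Use Retry-After if it is a plain number of seconds (assumption:
            # the server may not send it), otherwise use the fixed pause.
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.isdigit():
                delay = int(retry_after)
            else:
                delay = pause_seconds
            sleep(delay)
```

The point of the design is that a rate-limited response is treated as "wait and try again" and not as "Fanfiction not found", so a long list can run unattended.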

arzkar commented 2 years ago

"Try to keep your ao3 browsing to a minimum while the script is running. It won't break anything, but it may cause you to hit ao3's limit on how many hits to the site you are allowed within a certain time frame. This limit is per user, or per IP if you are not logged in. If this happens, the script will pause for 5 minutes to let the limit reset, and you may see a "Retry later" message when you try to open an ao3 page during that time. Don't be alarmed by this, just wait it out."

I see. I will try to implement something like this when I get some free time. I usually use the CLI to download 10-15 fics in one go, so I haven't encountered this kind of error. It shouldn't be hard to implement. I will leave this issue open so that I can post updates on any progress I make on the fix.

arzkar commented 1 year ago

Closing this issue, as the original issue has been fixed.

I created a new one to keep track of this feature request. Unfortunately, I haven't had enough free time to look into it yet. Too busy with IRL stuff.