Skallwar / suckit

Suck the InTernet
Apache License 2.0

Not a directory (os error 20) - Error while cloning wiki.raregamingdump.ca #238

Open · 123jimenez99 opened 2 months ago

123jimenez99 commented 2 months ago

Hey everyone, I encountered an issue while attempting to clone the wiki.raregamingdump.ca website for archiving purposes. Here's the error message I got:

```
2024-05-24 22:07:54.660954601 +00:00: [ERROR] Couldn't create wiki.raregamingdump.ca/index.php/other.7z: Is a directory (os error 21)
thread '<unnamed>' panicked at src/logger.rs:42:9:
Couldn't create wiki.raregamingdump.ca/index.php/other.7z: Is a directory (os error 21)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at src/scraper.rs:182:57:
called `Result::unwrap()` on an `Err` value: PoisonError { .. }
thread '<unnamed>' panicked at src/scraper.rs:182:57:
called `Result::unwrap()` on an `Err` value: PoisonError { .. }
thread '<unnamed>' panicked at src/scraper.rs:182:57:
called `Result::unwrap()` on an `Err` value: PoisonError { .. }
thread '<unnamed>' panicked at src/scraper.rs:182:57:
called `Result::unwrap()` on an `Err` value: PoisonError { .. }
thread '<unnamed>' panicked at src/scraper.rs:182:57:
called `Result::unwrap()` on an `Err` value: PoisonError { .. }
thread '<unnamed>' panicked at src/scraper.rs:182:57:
called `Result::unwrap()` on an `Err` value: PoisonError { .. }
```

This is the command I used: `suckit https://wiki.raregamingdump.ca -v -t5 -c -j 64`

Any insights or suggestions on how to resolve this would be greatly appreciated.

Cheers!

Skallwar commented 2 months ago

I think the issue occurs when you first download a.com/b/c.html and then try to download a.com/b: a.com/b has already been created as a folder, so suckit is now trying to write a file with the same name. I think there is an issue with the function that creates a path from a given URL.

Here, IIRC.
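To make the collision concrete, here is a minimal sketch, not suckit's actual code; the `save` helper and the `index.html` fallback are hypothetical:

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

// Hypothetical, simplified URL-to-path mapping: the URL path is reused
// directly as a filesystem path.
fn save(path: &str, body: &[u8]) -> std::io::Result<()> {
    if let Some(parent) = Path::new(path).parent() {
        // Downloading a.com/b/c.html creates the folder a.com/b.
        fs::create_dir_all(parent)?;
    }
    // Fails if `path` itself already exists as a directory.
    fs::File::create(path)?.write_all(body)
}

fn main() -> std::io::Result<()> {
    // 1) a.com/b/c.html is fetched first: the folder a.com/b now exists.
    save("a.com/b/c.html", b"<html>c</html>")?;

    // 2) a.com/b is fetched next: File::create("a.com/b") hits the folder
    //    and fails with "Is a directory (os error 21)".
    if let Err(e) = save("a.com/b", b"<html>b</html>") {
        eprintln!("collision: {e}");
    }

    // One possible fix: when the target already exists as a directory,
    // store the page inside it under a fallback name such as index.html.
    let target = Path::new("a.com/b");
    let fallback = if target.is_dir() {
        target.join("index.html")
    } else {
        target.to_path_buf()
    };
    fs::write(&fallback, b"<html>b</html>")
}
```

The `index.html` fallback is only one option; suffixing the conflicting file name would also work, as long as the mapping stays deterministic so a re-run resolves the same URLs to the same paths.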

I'm quite busy at the moment, so I would appreciate any help on this.
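Side note on the repeated PoisonError panics: they are just fallout from the first one. In Rust, when a thread panics while holding a `std::sync::Mutex`, the mutex is marked as poisoned, and every later `lock().unwrap()` on it panics too, which is what the repeated scraper.rs:182 lines show. A standalone demonstration (again, not suckit's code):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let shared = Arc::new(Mutex::new(0));

    // One worker panics while holding the lock (like the first panic in
    // the log), which poisons the mutex for everyone else.
    let m = Arc::clone(&shared);
    let _ = thread::spawn(move || {
        let _guard = m.lock().unwrap();
        panic!("simulated I/O failure while holding the lock");
    })
    .join();

    // Any later lock().unwrap() would now panic with PoisonError,
    // exactly like the repeated scraper.rs panics in the log.
    match shared.lock() {
        Ok(_) => println!("lock ok"),
        Err(e) => println!("poisoned: {e}"),
    }
}
```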

123jimenez99 commented 2 months ago

I'm afraid I can't help in that regard as I have no programming knowledge. In the end I managed to create a complete archive using the Browsertrix-Crawler Docker container. In any case, thanks for your support and I hope the best for your project!

Skallwar commented 2 months ago

I will leave it open as the issue is still there. Thanks for the feedback :)