PeskyPotato / archive-chan

Download threads from 4chan including media
MIT License
24 stars, 6 forks

{BUG} g ..... or any board name is invalid #6

Closed: baraa272 closed this issue 3 years ago

baraa272 commented 3 years ago

[screenshot] As you can see in the screenshot, that's the issue. Also, sometimes it gives me a "reply" is undefined error, and the script won't save the HTML because of it. Could you please look into this?

cardoso-neto commented 3 years ago

You were using the wrong format, @baraa272. You shouldn't pass the whole boards.4chan.org/g/ URL when trying to download all of a board's threads. Just use the string between the slashes, e.g., b or pol.

The last command you issued is the only correct one. python archiver.py g -p -v will save every active thread. That "Invalid request: g" is just a spurious message. If you wait a bit, it'll start downloading the threads.


```
$ python archiver.py g -p -v --use_db
Invalid request: g
Downloading thread: 79587404
Downloading thread: 79583810
Downloading thread: 76759434
Downloading thread: 79586994
Downloading image: https://i.4cdn.org/g/1610118225680.jpg g.jpg
Downloading post: 79587404 posted on 01/08/21(Fri)10:03:45
Downloading image: https://i.4cdn.org/g/1610095625707.jpg 21-11-2020.1.jpg
Downloading reply: 79587552 replied on 01/08/21(Fri)10:14:56
Downloading reply: 79587570 replied on 01/08/21(Fri)10:16:10
Downloading reply: 79587592 replied on 01/08/21(Fri)10:17:35
```
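Since the archiver expects a bare board name, a wrapper script could normalize whatever the user types before calling it. This is a minimal sketch under that assumption; `normalize_board` is a hypothetical helper, not part of archive-chan. It accepts `g`, `/g/`, or a board URL on either the 4chan or 4channel domain, and rejects thread URLs.

```python
import re

# Hypothetical helper (not part of archive-chan): matches a board URL,
# with or without scheme and trailing slash, and captures the board name.
_BOARD_URL = re.compile(
    r"^(?:https?://)?boards\.4chan(?:nel)?\.org/([a-z0-9]+)/?$", re.IGNORECASE
)


def normalize_board(arg: str) -> str:
    """Return the bare board name ("g", "pol", ...) for a board name or board URL."""
    arg = arg.strip()
    match = _BOARD_URL.match(arg)
    if match:
        return match.group(1).lower()
    # Otherwise accept "g" or "/g/" as-is, minus the slashes.
    name = arg.strip("/")
    if re.fullmatch(r"[a-z0-9]+", name, re.IGNORECASE):
        return name.lower()
    raise ValueError(f"not a board name or board URL: {arg!r}")
```

For example, `normalize_board("boards.4chan.org/g/")` returns `"g"`, which is the form `python archiver.py g -p -v` expects.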
cardoso-neto commented 3 years ago

OK, I think I found the source of the confusion: [screenshot of the code]

We need to remove that else statement on line 134. It is attached to the preceding if, so whenever that if evaluates to false, the else branch runs, which is not the desired behavior.
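The control-flow problem can be reproduced in a few lines. This is a sketch only: line 134 itself isn't shown here, so the predicates and messages below are hypothetical stand-ins. In the buggy version the `else` pairs with the thread check alone, so a valid board name first triggers "Invalid request" and is then downloaded anyway, matching the log above. The fixed version uses one if/elif/else chain, which is an alternative to simply deleting the `else` as proposed.

```python
import re


# Hypothetical predicates; archive-chan's real checks may differ.
def is_thread_url(arg: str) -> bool:
    return re.search(r"/thread/\d+", arg) is not None


def is_board_name(arg: str) -> bool:
    return re.fullmatch(r"[a-z0-9]+", arg) is not None


def handle_buggy(arg: str) -> list:
    messages = []
    if is_thread_url(arg):
        messages.append(f"Downloading thread: {arg}")
    else:
        # This else belongs only to the thread check above, so it fires
        # for every non-thread argument, valid board names included.
        messages.append(f"Invalid request: {arg}")
    if is_board_name(arg):
        messages.append(f"Downloading board: {arg}")
    return messages


def handle_fixed(arg: str) -> list:
    # A single chain: "Invalid request" only when no branch matched.
    if is_thread_url(arg):
        return [f"Downloading thread: {arg}"]
    if is_board_name(arg):
        return [f"Downloading board: {arg}"]
    return [f"Invalid request: {arg}"]
```

`handle_buggy("g")` yields both the spurious message and the download, while `handle_fixed("g")` yields only the download.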

But it has nothing to do with the "bug" you mentioned; it's just a spurious print statement introduced by the latest pull request.

cardoso-neto commented 3 years ago

@LameLemon Since you're going to edit this part of the code, maybe you could find some inspiration in a commit of mine that dealt with this: https://github.com/cardoso-neto/archive-chan/commit/f318a58c84d423124059db2d51e75b7aed3c5c18

My idea was to move the board-URL extraction into the 4chan API class, since that logic is specific to 4chan.
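Here is a minimal sketch of that idea, assuming a hypothetical `FourChanAPI` class (not the actual class in archive-chan or in the linked commit). It relies on 4chan's public read-only JSON API, where `https://a.4cdn.org/<board>/threads.json` lists a board's live threads as a list of pages, each holding `{"no": ...}` entries. Parsing is kept separate from fetching so it can be checked offline.

```python
import json
import urllib.request


class FourChanAPI:
    """Hypothetical sketch: board-level URL extraction owned by the 4chan API class."""

    API_ROOT = "https://a.4cdn.org"
    BOARDS_ROOT = "https://boards.4chan.org"

    def __init__(self, board: str):
        self.board = board

    def threads_endpoint(self) -> str:
        return f"{self.API_ROOT}/{self.board}/threads.json"

    def thread_urls_from_payload(self, pages: list) -> list:
        # threads.json: [{"page": 1, "threads": [{"no": 123, ...}, ...]}, ...]
        return [
            f"{self.BOARDS_ROOT}/{self.board}/thread/{thread['no']}"
            for page in pages
            for thread in page["threads"]
        ]

    def fetch_thread_urls(self) -> list:
        # One network request; the 4chan API rules ask clients not to
        # make more than one request per second.
        with urllib.request.urlopen(self.threads_endpoint()) as response:
            pages = json.load(response)
        return self.thread_urls_from_payload(pages)
```

Keeping the 4chan-specific URL shapes inside one class means the rest of the archiver only ever sees a list of thread URLs, whatever the board argument looked like.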

baraa272 commented 3 years ago

Can't wait for the new commit to be pushed. Thanks for the effort, man, and thanks to the author for this hassle-free tool. I have a question, though: how do I open the chan.db file and view the database?

baraa272 commented 3 years ago

[screenshot] Well, I downloaded your fork and tried it after running pip install superjson, but I still can't download boards; I get the error message shown in the screenshot. Again, thanks for all the effort <3

PeskyPotato commented 3 years ago

@baraa272

> how to open chan.db file and view the DB???

The database is an SQLite 3 database. You can use a tool such as sqlitebrowser, which gives you a GUI to interact with the database; there's also the sqlite3 CLI tool.
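For a quick look from Python instead, the standard-library `sqlite3` module is enough. The sketch below assumes nothing about archive-chan's schema (which isn't described here); it just lists each table with its row count. `describe_connection` and `describe_database` are hypothetical helper names. From the command line, `sqlite3 chan.db` followed by `.tables` gives a similar overview.

```python
import sqlite3
from contextlib import closing


def describe_connection(conn: sqlite3.Connection) -> dict:
    """Map every table in the database to its row count."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    counts = {}
    for name in tables:
        # Identifiers can't be bound as parameters, so quote the name manually.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


def describe_database(path: str) -> dict:
    # closing() really closes the connection; sqlite3's own context
    # manager only commits or rolls back.
    with closing(sqlite3.connect(path)) as conn:
        return describe_connection(conn)
```

Calling `describe_database("chan.db")` from the archive directory returns a dict such as `{table_name: row_count}`. Note that `sqlite3.connect` creates an empty file if the path doesn't exist, so check the path first.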

@cardoso-neto I'll work on moving over the board extraction today, thanks a lot!

PeskyPotato commented 3 years ago

This issue has been addressed in efd3f22abfa22fc944dfd944c01a485ecfb6884f.

The codebase is a little cleaner now.

: )

baraa272 commented 3 years ago

Can't wait to use it. Also, thanks for all the effort.

cardoso-neto commented 3 years ago

@baraa272 My fork is not really intended for the general public to use yet. However, the master branch is working, and you just have to pass it the flag --new_logic. I've been refactoring the code non-stop, as it's still alpha software, so things will change quickly and without warning.

Sneak peek:

```
$ python archiver.py logs/list.txt --verbose --preserve_media --path test --new_logic

Load from 'test/w/2131136/thread.json' ...
    Complete! Elapse 0.001323 sec.

Load from 'test/w/2180395/thread.json' ...
    Complete! Elapse 0.001179 sec.

Load from 'test/w/2180136/thread.json' ...
    Complete! Elapse 0.004443 sec.
All available media has been downloaded.
All available media has been downloaded.
All available media has been downloaded.
Time elapsed: 6.3776s
```

logs/list.txt:

```
https://boards.4channel.org/w/thread/2131136/miss-kobayashis-dragon-maid
https://boards.4channel.org/w/thread/2180395/ector-thread-requests-sharing-no-anime-girl-as-op
https://boards.4channel.org/w/thread/2180136/new-desktop-thread
```
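The list file is just one thread URL per line. As a rough sketch, such a file could be turned into (board, thread number) pairs and mapped onto the on-disk layout visible in the log, e.g. `test/w/2131136/thread.json` for `--path test`. `parse_thread_list` and `thread_json_path` are hypothetical names, not functions from the fork, whose actual parsing may differ.

```python
import re
from pathlib import PurePosixPath

# Board and thread number from URLs like
# https://boards.4channel.org/w/thread/2131136/some-slug
THREAD_URL = re.compile(
    r"^https?://boards\.4chan(?:nel)?\.org/(?P<board>[a-z0-9]+)/thread/(?P<no>\d+)"
)


def parse_thread_list(text: str) -> list:
    """Return (board, thread_no) pairs from a list file, skipping blank lines."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = THREAD_URL.match(line)
        if match is None:
            raise ValueError(f"not a 4chan thread URL: {line!r}")
        pairs.append((match["board"], int(match["no"])))
    return pairs


def thread_json_path(root: str, board: str, thread_no: int) -> str:
    # Mirrors the <path>/<board>/<thread>/thread.json layout from the log.
    return str(PurePosixPath(root, board, str(thread_no), "thread.json"))
```

The slug after the thread number is ignored, since only the board and number identify a thread.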