eight04 / ComicCrawler

An image crawler written in Python.

[suggestion] zip & delete original file #23

Closed: kuanyui closed this issue 7 years ago

eight04 commented 8 years ago

I think using a 3rd-party program is much better than writing it in Python ourselves. It is possible to call 7-Zip or another archiver with the runafterdownload setting.
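As a rough sketch, a small wrapper script could be the target of runafterdownload. This assumes the downloaded folder path is appended to the command as described in the README; the script name and the 7-Zip path below are only placeholders:

# zip_after.py -- hypothetical post-download hook.
# setting.ini would point at it, e.g.: runafterdownload = python zip_after.py
import subprocess
import shutil
import sys

folder = sys.argv[1]  # folder path appended by ComicCrawler (assumption)
# Archive the folder with 7-Zip; adjust the executable path for your system.
subprocess.run(["C:\\7-zip\\7z.exe", "a", folder + ".zip", folder], check=True)
shutil.rmtree(folder)  # delete the original images after zipping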

kuanyui commented 8 years ago

This feature is very important for me, because zip files can be read by nearly all comic readers (e.g. Comix), which can then jump between chapter/book/episode files easily and conveniently. Zip also saves disk space and makes copying more efficient.

On Linux I can simply call zip directly, but I have no programming experience on Windows and don't know how to do the same there. Sorry that I can't help with that.

starobots commented 7 years ago

Just write a *.bat; it's easy to zip the files that way:

for /d %%X in (*) do "D:\7-zip\7z.exe" a "%%X.zip" "%%X\"

kuanyui commented 7 years ago

How about https://docs.python.org/3/library/zipfile.html, for the sake of cross-platform support?
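A minimal, stdlib-only sketch of zipping a folder with it (the function name is just illustrative):

import os
import zipfile

def zip_folder(folder):
    """Pack every file under `folder` into `folder`.zip, storing relative paths."""
    with zipfile.ZipFile(folder + ".zip", "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(folder):
            for name in files:
                path = os.path.join(root, name)
                zf.write(path, os.path.relpath(path, folder))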

eight04 commented 7 years ago

I can write a .py script to compress a folder if you need one.

Actually, I had wondered whether we could make Comic Crawler support archives completely, that is, read/write images from/to archives instead of the file system. But it is not a necessary feature for me.

kuanyui commented 7 years ago

I can write a .py script to compress a folder if you need one.

Don't you want to integrate the auto-compress & delete-original-file features into ComicCrawler? After parsing and getting the book/chapter list, it could check whether the directory + image files OR the zipped file already exist, by filename.

Actually, I had wondered whether we could make Comic Crawler support archives completely, that is, read/write images from/to archives instead of the file system. But it is not a necessary feature for me.

I also feel that sounds like an unnecessary feature.

eight04 commented 7 years ago

Don't you want to integrate the auto-compress & delete-original-file features into ComicCrawler?

While this function can be done in one line, I won't try to re-implement it.

Also, the duplicate check depends not only on the file name but also on the checksum.
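For reference, the "one line" mentioned above could be something like the standard library's shutil.make_archive; this is only an illustration, not necessarily how it would be wired in:

import shutil
chapter_dir = "some_chapter"                          # placeholder folder name
shutil.make_archive(chapter_dir, "zip", chapter_dir)  # produces some_chapter.zip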

kuanyui commented 7 years ago

Don't you want to integrate the auto-compress & delete-original-file features into ComicCrawler?

While this function can be done in one line, I won't try to re-implement it.

You mean running a script manually to zip? I know that; it's what I currently do after downloading everything with ComicCrawler (e.g. for x in *; do zip -r "${x}.zip" "$x"; done). But that is still too annoying.

In my use case I need only the resulting zip files, and I delete all the original directories & files to save space and speed up later transfers. However, deleting the original directories & image files means ComicCrawler no longer knows what has already been downloaded.

Also, the duplicate check depends not only on the file name but also on the checksum.

How about a mechanism like the following pseudo code?

# Pseudo code; zip_file() is a hypothetical helper that archives a directory
import os
import shutil

# After getting the book/chapter titles & download links
if os.path.exists(chapter_name + ".zip"):
    pass    # Assume the chapter has already been fully downloaded, so just skip it.
else:
    # ComicCrawler's current (download + check filename + checksum) mechanism
    # ......
    # After the directory is fully downloaded
    if mod.config.get("zip_after_download"):
        zip_file(chapter_dir)
        if mod.config.get("delete_original_after_zip"):
            shutil.rmtree(chapter_dir)

eight04 commented 7 years ago

Do you use the CLI instead of the GUI? When using the GUI, Comic Crawler saves the mission state in a JSON file (~/comiccrawler/pool/), so there is no need to analyze the file tree.

kuanyui commented 7 years ago

Oops...Yes, I run ComicCrawler on a NAS via ssh.

eight04 commented 7 years ago

I don't use a NAS. Can a NAS run the GUI version?

kuanyui commented 7 years ago

Yes (it's just a VM running Ubuntu), but that makes it hard to access via SSH.

Hmm... I think I should implement this feature myself by making a wrapper around ComicCrawler. I'm planning to turn ComicCrawler into a simple private web service that is conveniently accessible from my Android devices.
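A bare-bones sketch of what such a wrapper could look like, assuming the comiccrawler CLI's download command works as documented (the port, endpoint, and destination directory are made up):

# Hypothetical minimal web front-end: POST a gallery URL, shell out to the CLI.
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        url = self.rfile.read(length).decode().strip()
        # Assumes `comiccrawler download URL --dest DIR` is available on PATH
        subprocess.Popen(["comiccrawler", "download", url, "--dest", "/srv/comics"])
        self.send_response(202)
        self.end_headers()

HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()

It could then be driven from a phone or another machine with something like curl -d "https://example.com/some-gallery" http://nas:8080/ (URL is illustrative only).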

eight04 commented 7 years ago

I know that some download managers can launch an RPC server, like aria2 does. It shouldn't be too hard to wrap comiccrawler.download_manager the way comiccrawler.gui does.

Or maybe it is easier to just make the CLI launch mission_manager and use the data it saves.