Nandaka / PixivUtil2

Download images from Pixiv and more!
http://nandaka.devnull.zone/
BSD 2-Clause "Simplified" License
2.4k stars 254 forks source link

Any Idea of making this multithread? #685

Open 2403772980ygy opened 4 years ago

2403772980ygy commented 4 years ago

It is not hard to guess that the program gets all data and download links in a init process, why not put them in a queue and download them multithreaded?

Nandaka commented 4 years ago
  1. Current UI doesn't support it.
  2. Pixiv might ban you due to unnatural access, see #504
  3. I'm too lazy to do it.
bluerthanever commented 4 years ago

Haha, word, man!

2403772980ygy commented 4 years ago

OK.

  1. the UI is not that hard to modify, actually......
  2. you can control the thread count, also, your internet explorer(whatever it is) download multithreaded.
  3. LOL, I'll do some experiments on my machine and maybe can help.

Anyways, maybe I should do it myself by changing the current download function into a queue push, and add another part to download.

2403772980ygy commented 4 years ago

Also bear with me, for I'm new to this project.

jwshields commented 4 years ago

Thinking out loud here, I was just browsing the issues.. Outside of what Nandaka mentioned- The largest possible problem that I can see is to do with the database. SQLite does not do well with concurrency, and threading/multiprocessing with SQLite can be a recipe for disaster. Some sort of system may need to be implemented to queue & dequeue statements/transactions in a separate thread/process, but then you get into issues with the GIL & sharing objects between the separate Python threads/processes... But if that were to be implemented, it would likely add a decent amount of complexity around how the download processes interact with the DB. I'm trying to think of some ideas real quick, but I can't come up with a simple solution around it.

Nandaka commented 4 years ago

SQLite does not do well with concurrency, and threading/multiprocessing with SQLite can be a recipe for disaster.

I think you need to use lock, it will impact the performance of course. For UI, I think need to use curses, but I'm not sure the support on windows before Win10

2403772980ygy commented 4 years ago

I didn't think of SQLite problems, the lock should only deal with operations outside of downloading and should not do with downloading because it is the main thing we need to accelerate using multithreading, maybe this can help with SQLite problems.

As for curses, I completely agree, but I haven't come up with a decent design yet.

My main concern is that the downloading need some rework to integrate the new UI and multithreading.

Nandaka commented 4 years ago

if you are using curses, the default print will be hidden, right? I have made some changes on the latest commit to allow you to pass a notifier from the caller, so it can handle the messages and show it to the ui. I think multi threading is more useful for batch downloader (option b)

bbappserver commented 4 years ago

Duplicate of https://github.com/Nandaka/PixivUtil2/issues/504

You could stay with the current printline mode, just use the python progress library to indicate how many posts of possible posts have downloaded with a textual progress bar instead of logging each post in detail serially.

This would make things harder to debug though, so I would leave in the existing process post which ties logging to the actual download procedure.

PixivUtil performs no processing that would benefit from multithreading (see also python GIL) except for making parallel requests, so it would suffice to 1)Pull the downloading procedure into the caller, make multiple async 2)Let the caller show progress 3)Perform process post on each result in the caller after the download completes.

But TBH once you no longer have a download backlog, the savings of this are pretty negligable.

2403772980ygy commented 4 years ago

if we let the caller update the process bar and We only need to download one image this looks kinda awkward, I suggest using the downloaded/total size to show the process in the bar.

Nandaka commented 4 years ago

@bbappserver I'm more thinking of the context running multiple jobs in parallel (e.g. downloading mix of by member id or by tags), so inside the thread it just call the appropriate method.

https://github.com/Nandaka/PixivUtil2/blob/master/PixivBatchHandler.py#L157 https://github.com/Nandaka/PixivUtil2/blob/master/PixivBatchHandler.py#L82 https://github.com/Nandaka/PixivUtil2/blob/master/PixivBatchHandler.py#L106

Currently, the UI printout and logging is kind of tied together, using curses won't mess with it assuming the thread id is included in the logging.

As for the notifier, it is up to the dev to include how much details they want to include, currently it just pass start and stop message.