Open 2403772980ygy opened 4 years ago
Haha, word, man!
OK.
Anyways, maybe I should do it myself by changing the current download function into a queue push, and add another part to download.
Also bear with me, for I'm new to this project.
Thinking out loud here, I was just browsing the issues.. Outside of what Nandaka mentioned- The largest possible problem that I can see is to do with the database. SQLite does not do well with concurrency, and threading/multiprocessing with SQLite can be a recipe for disaster. Some sort of system may need to be implemented to queue & dequeue statements/transactions in a separate thread/process, but then you get into issues with the GIL & sharing objects between the separate Python threads/processes... But if that were to be implemented, it would likely add a decent amount of complexity around how the download processes interact with the DB. I'm trying to think of some ideas real quick, but I can't come up with a simple solution around it.
SQLite does not do well with concurrency, and threading/multiprocessing with SQLite can be a recipe for disaster.
I think you need to use lock, it will impact the performance of course.
For UI, I think need to use curses
, but I'm not sure the support on windows before Win10
I didn't think of SQLite problems, the lock should only deal with operations outside of downloading and should not do with downloading because it is the main thing we need to accelerate using multithreading, maybe this can help with SQLite problems.
As for curses, I completely agree, but I haven't come up with a decent design yet.
My main concern is that the downloading need some rework to integrate the new UI and multithreading.
if you are using curses
, the default print will be hidden, right? I have made some changes on the latest commit to allow you to pass a notifier from the caller, so it can handle the messages and show it to the ui. I think multi threading is more useful for batch downloader (option b
)
Duplicate of https://github.com/Nandaka/PixivUtil2/issues/504
You could stay with the current printline mode, just use the python progress
library to indicate how many posts of possible posts have downloaded with a textual progress bar instead of logging each post in detail serially.
This would make things harder to debug though, so I would leave in the existing process post
which ties logging to the actual download procedure.
PixivUtil performs no processing that would benefit from multithreading (see also python GIL) except for making parallel requests, so it would suffice to 1)Pull the downloading procedure into the caller, make multiple async 2)Let the caller show progress 3)Perform process post on each result in the caller after the download completes.
But TBH once you no longer have a download backlog, the savings of this are pretty negligable.
if we let the caller update the process bar and We only need to download one image this looks kinda awkward, I suggest using the downloaded/total size to show the process in the bar.
@bbappserver I'm more thinking of the context running multiple jobs in parallel (e.g. downloading mix of by member id or by tags), so inside the thread it just call the appropriate method.
https://github.com/Nandaka/PixivUtil2/blob/master/PixivBatchHandler.py#L157 https://github.com/Nandaka/PixivUtil2/blob/master/PixivBatchHandler.py#L82 https://github.com/Nandaka/PixivUtil2/blob/master/PixivBatchHandler.py#L106
Currently, the UI printout and logging is kind of tied together, using curses
won't mess with it assuming the thread id is included in the logging.
As for the notifier, it is up to the dev to include how much details they want to include, currently it just pass start and stop message.
It is not hard to guess that the program gets all data and download links in a init process, why not put them in a queue and download them multithreaded?