Closed Wictim closed 4 years ago
First of all, I'm sorry for the .zip bomb (the mediafire link); I wanted to slap myself when I realized that.
OK, I bet most of this is useless for the master branch. I also forgot to mention that there are more cons: file names are fixed to one format, and the parallel download is Linux-only for now. (If anybody is willing to write a script that does the same thing on Windows, I think there's a chance it could work there too.)
Like I said before, "it could be faster", and now it is. I finally seriously sped up the check for already downloaded files. Before, that check took a long time, because it had to download the info for each image and actually make sure the file was there. Now it deduces from the order of the files what is downloaded and what is missing. If you can live with the fact that the last manga gets checked every time you download an entire gallery, then I'd say it works just like an SQL database, except for the part where it stores data.
Bugs are to be expected of course, since the order and the downloads don't always follow patterns that can easily be read, and I'm still learning Python alongside the code I work with. For now it is what it is.
Here is the updated source code: http://download1594.mediafire.com/uid4ppmarwvg/l6dl7xcs86pm9k2/parallelPixivUtil2%282%29.zip
If you wish to contact me directly, about some stuff, you can write me an email (wictimcz@gmail.com) or add me on skype (wictim7 - I should be named as "Yokoca" at this moment, so look for a pink avatar. no-homo lol)
Hmm, looks like it's tailored to a very specific use :smile:
The dictionary list cannot be used because the filename format is not always the same, so I still need to use the DB. Maybe I can find a way to get lock-free writing.
As for writing everything at once, that may be OK if you have fast internet, but on a slow connection the application will look like it's stalling... Maybe I can add an option to make the buffer configurable.
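One way to approach the DB writing problem mentioned above is to let a single thread own the database and have the download workers hand rows over through a queue. This is only a rough Python 3 sketch of that idea (a single-writer queue, not truly lock-free); the `image` table, its columns, and the function names are hypothetical and not taken from PixivUtil2.

```python
import queue
import sqlite3
import threading

STOP = object()  # sentinel that tells the writer thread to finish


def db_writer(db_path, rows, result):
    # The connection is created and used only inside this thread,
    # so the download workers never touch the database directly.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS image "
                 "(image_id INTEGER PRIMARY KEY, path TEXT)")  # hypothetical schema
    while True:
        row = rows.get()
        if row is STOP:
            break
        conn.execute("INSERT OR REPLACE INTO image VALUES (?, ?)", row)
    conn.commit()
    result.append(conn.execute("SELECT COUNT(*) FROM image").fetchone()[0])
    conn.close()


def download_worker(rows, image_id):
    # Stand-in for a real download; it only reports what it "saved".
    rows.put((image_id, "Download/%d.jpg" % image_id))


rows = queue.Queue()
result = []
writer = threading.Thread(target=db_writer, args=(":memory:", rows, result))
writer.start()

workers = [threading.Thread(target=download_worker, args=(rows, i)) for i in range(5)]
for w in workers:
    w.start()
for w in workers:
    w.join()

rows.put(STOP)  # all workers are done, let the writer commit and exit
writer.join()
```

The workers only ever block on the queue, so a slow commit does not stall the downloads.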
Wouldn't redis work? I use it for a real-time imageboard so it has to complete many write and append operations at once.
Hmm. The downloads are invalid. I am a little bit interested in the parallel download thingy.... I would love to see how it's done. But not sure if it's quite different from the current version, after all this issue was a few years ago....
The UI will be impacted, because currently it only prints the output to the console without special handling. Another issue is rate limiting; without it you risk getting banned from the pixiv server for making many 'unusual' connections.
No plan to support this myself, but I'm open to pull requests.
I couldn't make a pull request since the code is basically totally different from the original. OK, I'll try to be brief this time, unlike in the comment section.
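To illustrate the rate-limiting concern above, here is a minimal Python 3 sketch of parallel downloads that are spaced out by a minimum interval. `MinInterval`, `download_all`, and `fake_fetch` are made-up names for this example, and the interval value is arbitrary; nothing here reflects what pixiv actually tolerates.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class MinInterval:
    """Let callers through no more often than once per `interval` seconds."""

    def __init__(self, interval):
        self.interval = interval
        self.lock = threading.Lock()
        self.next_slot = time.monotonic()

    def wait(self):
        # Reserve the next free time slot under the lock, then sleep outside it.
        with self.lock:
            now = time.monotonic()
            slot = max(now, self.next_slot)
            self.next_slot = slot + self.interval
        time.sleep(max(0.0, slot - now))


def download_all(image_ids, fetch, workers=4, interval=0.5):
    limiter = MinInterval(interval)

    def task(image_id):
        limiter.wait()  # every request waits for its slot first
        return fetch(image_id)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, image_ids))


# Stand-in for a real download function; it only records when it was called.
starts = []


def fake_fetch(image_id):
    starts.append(time.monotonic())
    return image_id * 2


results = download_all([1, 2, 3], fake_fetch, workers=3, interval=0.05)
```

Even with three workers, the three calls start roughly 0.05 s apart instead of all at once.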
PixivUtil2 modifications
Let me start with what I erased:
What I modified:
I set file format to:
// why into single file? clearly personal reason
filenameformat = Download\%member_id% - %urlFilename% - %title%
filenamemangaformat = MangaDownload\%member_id% - %urlFilename% - %title%
// useful for filtering out files later
data is written to the file just once, not in a loop
save.write(res.read())
printing out info
total_time = (datetime.datetime.now() - start_time).total_seconds()
print 'downloading: {0: >10} - {1: >10}({2: >3}) in {3: <10}s ({4: >15}) {5: >10} Bytes : {6} {7}'.format(member_id, image_id, index, total_time, PixivHelper.speedInStr(file_size, total_time), file_size, message,message2)
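For comparison, the single-write change and the configurable buffer suggested earlier in the thread could both sit behind one helper. This is just a Python 3 sketch; `save_response` and its `buffer_size` parameter are hypothetical names, and in-memory streams stand in for the HTTP response and the target file.

```python
import io
import shutil


def save_response(res, save, buffer_size=None):
    """Copy a file-like response into an open file.

    buffer_size=None reads the whole body in one call (the single-write
    variant); a positive value copies in chunks of that many bytes.
    """
    if buffer_size is None:
        save.write(res.read())
    else:
        shutil.copyfileobj(res, save, buffer_size)


# In-memory stand-ins for the HTTP response and the file on disk.
payload = b"x" * 10000
whole, chunked = io.BytesIO(), io.BytesIO()
save_response(io.BytesIO(payload), whole)
save_response(io.BytesIO(payload), chunked, buffer_size=4096)
```

Both variants end up with the same bytes; the chunked one just gives the program a chance to report progress between chunks on a slow connection.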
Now what I added:
# goes before the loop over images to process in process_member
# I named it a list even though it's a dictionary (absolute path -> size in bytes)
global img_list
img_list = {}
download_dir = os.path.join(config.rootDirectory, "Download")
for name in os.listdir(download_dir):
    path = os.path.join(download_dir, name)
    # only regular files that belong to this member
    if name.startswith(str(member_id) + " ") and os.path.isfile(path):
        img_list[os.path.abspath(path)] = os.path.getsize(path)
# you do the same for manga
What's the benefit:
# checking if the file is present
if filename in img_list and img_list[filename] > 0:
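Put together, the directory scan and the dictionary lookup above could look roughly like this as a self-contained Python 3 sketch. `build_size_index` and `already_downloaded` are names invented for this example, and a temporary directory stands in for the real Download folder.

```python
import os
import tempfile


def build_size_index(directory, member_id):
    """Map absolute path -> size in bytes for one member's files."""
    prefix = str(member_id) + " "
    index = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if name.startswith(prefix) and os.path.isfile(path):
            index[os.path.abspath(path)] = os.path.getsize(path)
    return index


def already_downloaded(index, filename):
    """True if the file was listed for this member and is not empty."""
    return index.get(os.path.abspath(filename), 0) > 0


with tempfile.TemporaryDirectory() as root:
    done = os.path.join(root, "123 - a.jpg - title")
    empty = os.path.join(root, "123 - b.jpg - title")
    other = os.path.join(root, "1234 - c.jpg - title")  # different member
    for path, data in ((done, b"data"), (empty, b""), (other, b"data")):
        with open(path, "wb") as f:
            f.write(data)

    index = build_size_index(root, 123)
    results = [already_downloaded(index, p) for p in (done, empty, other)]
```

The directory is listed once per member, so each later check is a dictionary lookup instead of a disk or network round trip. Zero-byte files count as missing, which matches the `> 0` test.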
What is the con:
source code with scripts
[entire source code with shell scripts for Linux to make it run in parallel] http://download1074.mediafire.com/ypg9tqb1ntng/fiov368m2dek9us/parallelPixivUtil2.zip
additions
' the way to use it:
' - open up pictures to another tabs (ctrl + left click) from here
' http://www.pixiv.net/ranking_area.php?type=detail&no=6
' or another search that which will then get you
' pictures which have recommendations on the right side of the page
' - make sure that the clipboard is clear,
' or at least there's something i don't mind which will be joined with the new grabbed IDs. (eg 1 space)
' - switch to tab with the picture i opened first
' - run a loop of this script as many times as the count of tabs i opened
'
' then? save them OR put them into Pixiv Downloader
' the rest is obvious
SET !EXTRACT_TEST_POPUP NO
TAG POS=1 TYPE=A ATTR=TXT:Works EXTRACT=HTM
TAG POS=1 TYPE=DIV ATTR=CLASS:_layout-thumbnail&&TXT: EXTRACT=HTM
TAG POS=2 TYPE=DIV ATTR=CLASS:_layout-thumbnail&&TXT: EXTRACT=HTM
TAG POS=3 TYPE=DIV ATTR=CLASS:_layout-thumbnail&&TXT: EXTRACT=HTM
SET !VAR1 {{!CLIPBOARD}}
ADD !VAR1 " "
ADD !VAR1 EVAL(" \"{{!EXTRACT}}\".match(/id=\"?[0-9]+/g).toString().match(/[0-9]+/g).toString().replace(/,/g,' '); ")
SET !CLIPBOARD {{!VAR1}}
TAB CLOSE
It is still not finished. As I said elsewhere, it could be faster, because I didn't have time to add the check for already existing files to other parts of the code!
Cheers