Wictim commented 9 years ago

I couldn't make a pull request since the code is basically totally different from the original. OK, I'll try to be brief this time, unlike in the comment section.

PixivUtil2 modifications

Let me start with what I erased first:

Interface -> to make it launchable from command line
Loop for writing into file -> makes file handling easier
Unnecessary prints -> hard to read/follow what's actually going on
Loggers -> useful when you are catching bugs
SQL database -> got locked up once there were more streams to interact with it

What I modified:

I set file format to:

// why into single file? clearly personal reason
filenameformat = Download\%member_id% - %urlFilename% - %title%
filenamemangaformat = MangaDownload\%member_id% - %urlFilename% - %title%
// useful for filtering out files later
data are written into file just once, not in loop

save.write(res.read())
printing out info

total_time = (datetime.datetime.now() - start_time).total_seconds()
print 'downloading: {0: >10} - {1: >10}({2: >3}) in {3: <10}s ({4: >15}) {5: >10} Bytes : {6} {7}'.format(member_id, image_id, index, total_time, PixivHelper.speedInStr(file_size, total_time), file_size, message, message2)

Now what I added:

A dictionary whose keys are file paths and whose values are file sizes. In short: you give it a file path and you get the size back. If the key is not in there, the file doesn't exist; size = 0 means the file is empty; size > 0 is the actual size, obviously.

# goes before the loop over images to process in process_member
# I named it as a list even though it's a dictionary
global img_list
img_list = {}
for a in os.listdir(config.rootDirectory+"/Download/"):
    if a.startswith(str(member_id)+" ") and os.path.exists(config.rootDirectory+"/Download/"+a) and os.path.isfile(config.rootDirectory+"/Download/"+a):
        img_list.update({os.path.abspath(config.rootDirectory+"/Download/"+a):os.path.getsize(config.rootDirectory+"/Download/"+a)})
# you do the same for manga

What's the benefit:

Fewer HDD accesses, and for directories with hundreds of thousands of files it makes a difference.

# checking if the file is present
if filename in img_list and img_list[filename] > 0:
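The index-then-lookup idea can be sketched roughly as follows. This is a minimal Python 3 sketch, not the code from the zip: `build_size_index` and `already_downloaded` are hypothetical names, and it uses `os.scandir` instead of the `os.listdir` plus `os.path` calls, on the assumption that a single directory scan is the goal.

```python
import os


def build_size_index(directory, member_id):
    """Map absolute file path -> size in bytes for one member's files.

    Hypothetical helper: assumes file names start with "<member_id> ",
    as in the filename format described in this post.
    """
    prefix = "{0} ".format(member_id)
    index = {}
    # One directory scan instead of exists/isfile/getsize calls per file.
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.name.startswith(prefix) and entry.is_file():
                index[os.path.abspath(entry.path)] = entry.stat().st_size
    return index


def already_downloaded(index, filename):
    """True only if the file is in the index and is not empty."""
    return index.get(os.path.abspath(filename), 0) > 0
```

A missing key and a zero-byte file both count as "not downloaded" here, which matches the three cases listed for the dictionary. The index goes stale as soon as new files are written, so a real version would probably need to update it after each download.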

What is the con:

more memory usage -> nothing critical tho
source code with scripts

[entire source code with shell scripts for Linux to run it in parallel] http://download1074.mediafire.com/ypg9tqb1ntng/fiov368m2dek9us/parallelPixivUtil2.zip

additions

imacros script
firefox: https://addons.mozilla.org/en-us/firefox/addon/imacros-for-firefox/
chrome: (never tried the script on chrome) https://chrome.google.com/webstore/detail/imacros-for-chrome/cplklnmnlbnpmjogncfgfijoopmnlemp?hl=en

' the way to use it:
' - open up pictures in other tabs (ctrl + left click) from here
'   http://www.pixiv.net/ranking_area.php?type=detail&no=6
'   or from another search which will get you
'   pictures which have recommendations on the right side of the page
' - make sure that the clipboard is clear,
'   or at least that it holds something I don't mind being joined with the newly grabbed IDs (e.g. 1 space)
' - switch to the tab with the picture I opened first
' - run a loop of this script as many times as the count of tabs I opened
'
' then? save them OR put them into Pixiv Downloader
' the rest is obvious

SET !EXTRACT_TEST_POPUP NO
TAG POS=1 TYPE=A ATTR=TXT:Works EXTRACT=HTM
TAG POS=1 TYPE=DIV ATTR=CLASS:_layout-thumbnail&&TXT: EXTRACT=HTM
TAG POS=2 TYPE=DIV ATTR=CLASS:_layout-thumbnail&&TXT: EXTRACT=HTM
TAG POS=3 TYPE=DIV ATTR=CLASS:_layout-thumbnail&&TXT: EXTRACT=HTM
SET !VAR1 {{!CLIPBOARD}}
ADD !VAR1 " "
ADD !VAR1 EVAL(" \"{{!EXTRACT}}\".match(/id=\"?[0-9]+/g).toString().match(/[0-9]+/g).toString().replace(/,/g,' '); ")
SET !CLIPBOARD {{!VAR1}}
TAB CLOSE
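For anyone who wants the ID-grabbing step outside of iMacros, the `EVAL` expression in the macro boils down to "find every run of digits that follows `id=` and join them with spaces". A rough Python sketch of just that step (`extract_ids` is a hypothetical name, and the HTML in the usage line is made up for illustration):

```python
import re


def extract_ids(html):
    """Return the numeric IDs found after `id=` in an HTML snippet, space-separated."""
    # Same pattern as the macro: `id=`, an optional double quote, then digits.
    ids = re.findall(r'id="?([0-9]+)', html)
    return " ".join(ids)


print(extract_ids('<a href="member_illust.php?mode=medium&illust_id=456">thumb</a>'))
```

Unlike the JavaScript in the macro, this returns an empty string when nothing matches instead of failing.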

script for parallel launch is in the zip file I posted, just make sure you rewrite the paths.
It is still not finished. As I said elsewhere, it could be faster, because I didn't have time to add the check for already existing files to other parts of the code!

Cheers

Wictim commented 9 years ago

First thing, I'm sorry for the .zip bomb (the mediafire link), wanted to slap myself when I realized that.

OK, I bet that most of this is probably useless for the master branch. I also forgot to say that there are more cons, like file names being fixed to one format, and that this parallel download is Linux-only for now. (That said, if anybody is willing to write a Windows script that does the same thing, then I think there's a chance it could work out.)

And like I said before, "it could be faster", and now it is. I finally boosted the shit out of checking for already downloaded files. Before, the check took a long time, because it had to download the info and actually make sure that the file was there. Now it deduces from the order of the files what is downloaded and what is missing. If you can live with the fact that the last manga gets checked every time you download an entire gallery, then I could say it works just like the SQL database, except for the part where it stores data.

There are bugs to be expected ofc, since the order of files and downloads doesn't always follow a pattern that can easily be read, and I'm still learning Python alongside the code I work with. For now it is as it is.

Here are the updated source codes: http://download1594.mediafire.com/uid4ppmarwvg/l6dl7xcs86pm9k2/parallelPixivUtil2%282%29.zip

If you wish to contact me directly, about some stuff, you can write me an email (wictimcz@gmail.com) or add me on skype (wictim7 - I should be named as "Yokoca" at this moment, so look for a pink avatar. no-homo lol)

Nandaka commented 9 years ago

Hmm, looks like it's tailored to a very specific use :smile:

The dictionary list cannot be used, as the filename format is not always the same, so I still need to use the DB. Maybe I can find a way to get lock-free writing.

As for writing at once, that may be OK if you have fast internet, but on a slow connection the application will look like it's stalling... Maybe I can add an option to make the buffer configurable.
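A configurable buffer could cover both cases. The following is only a sketch under assumptions, not PixivUtil2 code: `save_stream` and its `chunk_size` parameter are hypothetical, and `response` stands for any object with a `read()` method, such as the `res` in `save.write(res.read())`.

```python
def save_stream(response, out_file, chunk_size=None):
    """Copy a readable stream to a writable one and return the bytes written.

    chunk_size=None reads everything in one call (the write-once approach);
    a positive chunk_size copies in pieces so progress can be reported.
    """
    if chunk_size is None:
        data = response.read()
        out_file.write(data)
        return len(data)

    written = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            # An empty read means the stream is exhausted.
            break
        out_file.write(chunk)
        written += len(chunk)
    return written
```

With `chunk_size=None` the whole file is held in memory before it is written, which is where the apparent stalling on slow connections comes from; with a chunk size set, a progress line could be printed inside the loop.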

vampiricwulf commented 6 years ago

Wouldn't redis work? I use it for a real-time imageboard so it has to complete many write and append operations at once.

bluerthanever commented 4 years ago

Hmm. The downloads are invalid. I am a little bit interested in the parallel download thingy.... I would love to see how it's done. But not sure if it's quite different from the current version, after all this issue was a few years ago....

Nandaka commented 4 years ago

The UI will be impacted, because currently it only prints the output to the console without any special handling. Another issue is rate limiting; without it you risk getting banned from the pixiv server for making many 'unusual' connections.
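As an illustration of the rate-limiting concern, here is a minimal sketch of a limiter shared between download threads. Everything in it is an assumption: `RateLimiter` is a hypothetical class, it only helps when the parallel downloads run as threads inside one process (separate processes launched from a shell script would need some shared mechanism instead), and no safe request rate for pixiv is known from this thread.

```python
import threading
import time


class RateLimiter:
    """Allow at most one request start per `min_interval` seconds across threads."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        """Block until this caller's slot arrives, then return the seconds slept."""
        with self._lock:
            now = self._clock()
            start = max(now, self._next_allowed)
            # Reserve the slot before sleeping so other threads queue behind it.
            self._next_allowed = start + self.min_interval
        delay = start - now
        if delay > 0:
            self._sleep(delay)
        return delay
```

Each worker would call `limiter.wait()` right before opening a connection. The `clock` and `sleep` parameters exist only so the behaviour can be checked without real waiting.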

Nandaka commented 4 years ago

I have no plans to support this myself, but I'm open to pull requests.

Nandaka / PixivUtil2

Tips and useful tools from parallel version of PixivUtil2 edited by Wic #75
