Nandaka / PixivUtil2

Download images from Pixiv and more!
http://nandaka.devnull.zone/
BSD 2-Clause "Simplified" License

Tips and useful tools from parallel version of PixivUtil2 edited by Wic #75

Closed Wictim closed 4 years ago

Wictim commented 9 years ago

I couldn't make a pull request since the code is basically totally different from the original. OK, I'll try to be brief this time, unlike in the comment section.

PixivUtil2 modifications

Let me start with what I erased first:

What I modified:

Now what I added:

# Goes before the loop over images to process in process_member.
# I named it img_list even though it's a dictionary (absolute path -> size in bytes).
import os

global img_list
img_list = {}
download_dir = os.path.join(config.rootDirectory, "Download")
for name in os.listdir(download_dir):
    path = os.path.join(download_dir, name)
    # Keep only regular files whose names start with "<member_id> ".
    if name.startswith(str(member_id) + " ") and os.path.isfile(path):
        img_list[os.path.abspath(path)] = os.path.getsize(path)
# You do the same for manga.

What's the benefit:

# Check whether the file is already present (and not empty).
if filename in img_list and img_list[filename] > 0:
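Put together, the index and the lookup above can be sketched as two small functions. This is only an illustration of the idea, not the code from the zip: `build_file_index` and `already_downloaded` are hypothetical names, and the `"<member_id> "` file name prefix is taken from the snippet in the post.

```python
import os


def build_file_index(download_dir, member_id):
    """Map absolute path -> size in bytes for files named '<member_id> ...'.

    Hypothetical helper: scans the directory once so that later existence
    checks are plain dictionary lookups instead of file system calls.
    """
    prefix = f"{member_id} "
    index = {}
    for name in os.listdir(download_dir):
        path = os.path.join(download_dir, name)
        if name.startswith(prefix) and os.path.isfile(path):
            index[os.path.abspath(path)] = os.path.getsize(path)
    return index


def already_downloaded(index, filename):
    """Return True if filename is in the index with a non-zero size."""
    return index.get(os.path.abspath(filename), 0) > 0
```

A zero-byte file counts as missing here, matching the `> 0` condition in the original check.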

What is the con:

[Entire source code, with shell scripts for Linux to make it run in parallel] http://download1074.mediafire.com/ypg9tqb1ntng/fiov368m2dek9us/parallelPixivUtil2.zip

additions

' How to use it:
' - open the pictures in new tabs (ctrl + left click) from here
'   http://www.pixiv.net/ranking_area.php?type=detail&no=6
'   or from another search that gets you pictures
'   which have recommendations on the right side of the page
' - make sure that the clipboard is clear,
'   or at least holds something I don't mind having joined with the newly grabbed IDs (e.g. 1 space)
' - switch to the tab with the picture I opened first
' - run a loop of this script as many times as the number of tabs I opened
'
' Then? Save them OR put them into Pixiv Downloader.
' The rest is obvious.

SET !EXTRACT_TEST_POPUP NO
TAG POS=1 TYPE=A ATTR=TXT:Works EXTRACT=HTM
TAG POS=1 TYPE=DIV ATTR=CLASS:_layout-thumbnail&&TXT: EXTRACT=HTM
TAG POS=2 TYPE=DIV ATTR=CLASS:_layout-thumbnail&&TXT: EXTRACT=HTM
TAG POS=3 TYPE=DIV ATTR=CLASS:_layout-thumbnail&&TXT: EXTRACT=HTM
SET !VAR1 {{!CLIPBOARD}}
ADD !VAR1 " "
ADD !VAR1 EVAL(" \"{{!EXTRACT}}\".match(/id=\"?[0-9]+/g).toString().match(/[0-9]+/g).toString().replace(/,/g,' '); ")
SET !CLIPBOARD {{!VAR1}}
TAB CLOSE

Cheers

Wictim commented 9 years ago

First thing, I'm sorry for the .zip bomb (the mediafire link); I wanted to slap myself when I realized that.

OK, I bet most of this is probably useless for the master branch. I also forgot to say that there are more cons, such as file names being forced into one format, and that this parallel download is Linux-only for now. (If anybody is willing to write a script for Windows that does the same thing, I think there's a chance it could work there too.)

And like I said before, "it could be faster", and now it is. I finally boosted the shit out of checking for already downloaded files. Before, that check took a long time, because it had to download the info for each file and actually make sure the file was there. Now it deduces from the order of the files what is downloaded and what is missing. If you can live with the fact that the last manga gets checked again every time you download an entire gallery, then I'd say it works just like an SQL database, except for the part where it stores data.
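The post does not show how the order-based check is implemented, so the following is only a guess at the general idea. It assumes the remote listing is ordered newest first and that everything after the first already-present ID was downloaded in an earlier run; `missing_ids` is a hypothetical name.

```python
from itertools import takewhile


def missing_ids(listing_order, local_ids):
    """Return the leading IDs of listing_order that are not in local_ids.

    Assumption (not confirmed by the post): listing_order is newest first,
    so the scan can stop at the first ID that is already on disk.
    """
    return list(takewhile(lambda image_id: image_id not in local_ids, listing_order))


# Usage: only the two newest works need to be fetched.
new_ids = missing_ids([9, 8, 7, 6, 5], {7, 6, 5})
```

The trade-off is the one the post hints at: the check is fast because it never inspects older entries, so a gap further down the list would go unnoticed.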

There are bugs to be expected, of course, since the order and the downloads don't always follow patterns that can easily be read, and I'm still learning Python alongside the code I work with. For now it is what it is.

Here are the updated source codes: http://download1594.mediafire.com/uid4ppmarwvg/l6dl7xcs86pm9k2/parallelPixivUtil2%282%29.zip

If you wish to contact me directly, about some stuff, you can write me an email (wictimcz@gmail.com) or add me on skype (wictim7 - I should be named as "Yokoca" at this moment, so look for a pink avatar. no-homo lol)

Nandaka commented 9 years ago

Hmm, looks like it's tailored to a very specific use :smile:

The dictionary list cannot be used because the filename format is not always the same, so I still need to use the DB. Maybe I can find a way to get lock-free writing.
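One common way to keep parallel downloaders from fighting over the database is to funnel all writes through a single writer thread. The sketch below uses Python's standard `sqlite3` and `queue` modules; it sidesteps lock contention rather than being lock-free in the strict sense, and the `downloaded` table and the function names are made up for illustration, not taken from PixivUtil2.

```python
import queue
import sqlite3
import threading

_STOP = object()  # sentinel telling the writer thread to finish


def writer_loop(db_path, jobs):
    """Single thread that owns the connection and performs every write."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS downloaded (image_id INTEGER PRIMARY KEY, path TEXT)"
    )
    while True:
        item = jobs.get()
        if item is _STOP:
            break
        conn.execute("INSERT OR REPLACE INTO downloaded VALUES (?, ?)", item)
        conn.commit()
    conn.close()


def record_downloads(db_path, records, workers=4):
    """Simulate several download workers reporting (image_id, path) records."""
    jobs = queue.Queue()
    writer = threading.Thread(target=writer_loop, args=(db_path, jobs))
    writer.start()

    def produce(chunk):
        # A real worker would download here and then enqueue its result.
        for record in chunk:
            jobs.put(record)

    threads = [
        threading.Thread(target=produce, args=(records[i::workers],))
        for i in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    jobs.put(_STOP)
    writer.join()
```

Because only the writer thread ever touches the connection, the workers never block on the database itself, only briefly on the queue.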

As for writing everything at once, that may be OK if you have fast internet, but on a slow connection the application will look like it is stalling... Maybe I can add an option to make the buffer configurable.
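A configurable buffer could look roughly like this: the data is copied in chunks of `buffer_size` bytes and a callback fires after each chunk, so a slow download still shows progress. `save_stream` and its parameters are hypothetical, and file-like objects stand in for the network response.

```python
import io


def save_stream(source, target, buffer_size=8192, on_progress=None):
    """Copy source to target in buffer_size chunks; return total bytes written."""
    written = 0
    while True:
        chunk = source.read(buffer_size)
        if not chunk:
            break
        target.write(chunk)
        written += len(chunk)
        if on_progress is not None:
            on_progress(written)
    return written


# Usage: 20000 bytes copied with an 8192-byte buffer report progress three times.
progress = []
total = save_stream(io.BytesIO(b"x" * 20000), io.BytesIO(), 8192, progress.append)
```

A larger buffer means fewer writes but longer silences between progress updates, which is the trade-off described above.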

vampiricwulf commented 6 years ago

Wouldn't redis work? I use it for a real-time imageboard so it has to complete many write and append operations at once.

bluerthanever commented 4 years ago

Hmm. The download links are invalid. I am a little bit interested in the parallel download thingy... I would love to see how it's done. But I'm not sure whether it's quite different from the current version; after all, this issue is a few years old...

Nandaka commented 4 years ago

The UI will be impacted, because currently it only prints the output to the console without any special handling. Another concern is rate limiting; without it you risk getting banned from the pixiv server for making too many 'unusual' connections.
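For anyone attempting a pull request, the rate-limiting concern could be handled along these lines: a thread pool runs the downloads, and a shared limiter spaces out when requests may start. This is a minimal sketch with hypothetical names (`MinIntervalLimiter`, `download_all`); `fetch` stands in for the real download call, and the interval value is arbitrary, not a known safe limit for pixiv.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class MinIntervalLimiter:
    """Hand out start slots at least min_interval seconds apart."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_start = 0.0

    def wait(self):
        # Reserve the next free slot under the lock, then sleep outside it.
        with self._lock:
            now = time.monotonic()
            start = max(now, self._next_start)
            self._next_start = start + self.min_interval
        delay = start - now
        if delay > 0:
            time.sleep(delay)


def download_all(items, fetch, max_workers=4, min_interval=0.5):
    """Run fetch(item) in parallel, with request starts spaced by the limiter."""
    limiter = MinIntervalLimiter(min_interval)

    def task(item):
        limiter.wait()
        return fetch(item)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as items.
        return list(pool.map(task, items))
```

Collecting results in the main thread, as `download_all` does, would also give one place to print output, which may help with the console concern.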

Nandaka commented 4 years ago

I have no plans to support this myself, but I'm open to pull requests.