KurtBestor / Hitomi-Downloader

:cake: Desktop utility to download images/videos/music/text from various websites, and more.

Feature request: twitter search without "filter" parameter #2687

Open nisehime opened 4 years ago

nisehime commented 4 years ago

There's an option in twMediaDownloader that lets you search the timeline without the "filter:~" parameter. As it states, it's "slow, but high accuracy". It's slow because it returns all the tweets, including text-only ones, which have to be ignored.

When I was checking a user, I compared what twMediaDownloader downloaded a year ago with what Hitomi downloaded now, and there were 28 missing images. I tried to redownload with twMedia to check, and those images were missing too. I checked manually in the search, and those pics didn't appear there either.

So I tried disabling "filter" in twMedia and downloading the range with the missing images, and it worked. Those pics also appeared in the manual search when I didn't specify "filter".

But as it says, it was also much slower. Unlike twMedia, Hitomi works in parallel (afaik), so maybe it will be faster in Hitomi, but that needs testing to be sure. I guess it'd be better to make this optional, like in twMedia.

nisehime commented 4 years ago

Also, what have you done to the update feature? Now I have to remove the whole artist folder to force Hitomi into the "reading" state. Is it a bug, or is that intended?

KurtBestor commented 4 years ago

The Twitter API has a very low rate limit. It's very slow to read the whole timeline every time, so I changed it to read recent tweets only. See: https://github.com/KurtBestor/Hitomi-Downloader/issues/2303 But I realized that there are several problems with that, so I changed it to read older tweets as well. It should be fine now. Try this: https://github.com/KurtBestor/Hitomi-Downloader/releases/tag/Technical-Preview

Can you provide the links for the missing tweets?

nisehime commented 4 years ago

> Can you provide the links for the missing tweets?

NSFW:

links
https://twitter.com/koragen1925/status/880736402597593088
https://twitter.com/koragen1925/status/1021718897542627328
https://twitter.com/koragen1925/status/1026044628015841280
https://twitter.com/koragen1925/status/1027515585494536192
https://twitter.com/koragen1925/status/1027890331012878336
https://twitter.com/koragen1925/status/1032619855134908417
https://twitter.com/koragen1925/status/1039081204279009280
https://twitter.com/koragen1925/status/1041266733498023936
https://twitter.com/koragen1925/status/1047090851568963584
https://twitter.com/koragen1925/status/1050011104280109056
https://twitter.com/koragen1925/status/1058673440058568705
https://twitter.com/koragen1925/status/1059051581507686401
https://twitter.com/koragen1925/status/1088035836367523840
https://twitter.com/koragen1925/status/1097822012364414977
https://twitter.com/koragen1925/status/1105073348675923969
https://twitter.com/koragen1925/status/1109419876794327040

If you search for "from:koragen1925 max_id:1119448677712404480 filter:media", you won't see these tweets in the timeline. This is why neither Hitomi nor twMedia downloads them. But if you remove the filter ("from:koragen1925 max_id:1119448677712404480") and scroll through everything, those tweets appear too.
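To make the difference between the two queries concrete, here is a minimal sketch of how such a query string could be assembled with the filter made optional. The function name and the `media_only` parameter are hypothetical and are not taken from Hitomi Downloader or twMediaDownloader; only the operators `from:`, `max_id:` and `filter:media` come from the queries quoted above.

```python
from typing import Optional


def build_search_query(user: str, max_id: Optional[int] = None,
                       media_only: bool = True) -> str:
    """Assemble a search query string (hypothetical helper, for illustration)."""
    parts = [f"from:{user}"]
    if max_id is not None:
        parts.append(f"max_id:{max_id}")
    if media_only:
        # Faster, but some media tweets may not be returned by the search.
        parts.append("filter:media")
    return " ".join(parts)


with_filter = build_search_query("koragen1925", 1119448677712404480)
without_filter = build_search_query("koragen1925", 1119448677712404480,
                                    media_only=False)
print(with_filter)
print(without_filter)
```

With `media_only=False` the search returns text-only tweets as well, so the caller has to discard those itself.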

> So I've changed it to read recent tweets only.

Wasn't that what I requested earlier too? That "update" thing.

> Try this

I removed some old files from the artist's folder and restarted. Now Hitomi says it downloaded those removed files, but in fact they don't appear in the folder.

Wasn't it fine before? If the list item was Invalid or Incomplete, then restart completely; if it's Completed, then grab only recent tweets.

KurtBestor commented 4 years ago

It detects files that the user has removed and doesn't download them again. Please see: https://github.com/KurtBestor/Hitomi-Downloader/issues/519

nisehime commented 4 years ago

> Please see: #519

I never noticed there was such functionality before.

I honestly don't understand how this works now. Anyway, my issue still remains. I want to force Hitomi into the "reading" state, i.e. grab all the links again and start downloading. I don't need to download ALL images again; I need to read the whole timeline and download only the missing ones. Before, I could mark an entry as incomplete and restart. Now I have to remove the artist's folder to make Hitomi read the whole timeline, which means redownloading everything.

nisehime commented 4 years ago

Don't ignore, please :(

KurtBestor commented 4 years ago

It's always "reading", but it skips already downloaded files.

And the missing ones, well, I think it's enough now. Even without "filter", there are still missing files.

nisehime commented 4 years ago

> And the missing ones, well, I think it's enough now. Even without "filter", there are still missing files.

I didn't say it would find all the media, but it should still have higher accuracy, so why not? Have you tested it? I wanted to do it too. Again, if there are problems with searching this way, you can just make it optional in the settings.

> It always "reading" but it skips already downloaded files.

I want to restart the entry completely and redownload everything except the files that already exist in the artist's folder. I don't see any way to do it besides removing this folder, but I don't want to remove it because then it will start downloading all the files again. I thought this is how it worked in previous versions when the entry was Incomplete or Invalid.

It shouldn't skip anything while "reading" or skip anything previously downloaded; it should "read" all the links, start downloading, and only skip files which already exist in the folder.
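The behaviour requested here, skipping only what is already on disk, could look roughly like the sketch below. It is an illustration, not Hitomi Downloader code: `files_to_download` is a hypothetical helper, and it assumes that every link maps to a known target filename.

```python
import os
import tempfile


def files_to_download(filenames, folder):
    """Return the filenames that are not yet present in `folder`."""
    existing = set(os.listdir(folder)) if os.path.isdir(folder) else set()
    return [name for name in filenames if name not in existing]


# Demo: the folder already holds a.jpg, so only b.jpg is still missing.
with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, "a.jpg"), "w").close()
    missing = files_to_download(["a.jpg", "b.jpg"], folder)

print(missing)
```

Because the check looks only at the folder contents, a file that was deleted by hand would be fetched again on the next full read.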

KurtBestor commented 4 years ago

I wanna make the program simple, easy to use without additional knowledge. This feature is too minor and most of the users wouldn't get it. It takes too much time.

I don't understand why you want to read already downloaded tweets.

nisehime commented 4 years ago

> It takes too much time.

All I'm asking is to remove the "filter" from the search query (or add a conditional for a settings option). Hitomi would then read the search timeline as usual, ignoring the tweets that don't contain any image/video or twitpic URLs. I may be wrong, of course, but I think it should take you about 10-15 minutes.
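As a rough sketch of that idea, the unfiltered results could be filtered on the client side. The tweet layout below (a dict with optional `media` and `urls` keys) is an assumption made for illustration; it is not the real Twitter API response format or Hitomi Downloader's internal representation.

```python
def has_media(tweet: dict) -> bool:
    """True if the tweet carries attached media or links to twitpic."""
    if tweet.get("media"):
        return True
    return any("twitpic.com" in url for url in tweet.get("urls", []))


def media_tweets(tweets):
    """Drop text-only tweets from an unfiltered search result."""
    return [tweet for tweet in tweets if has_media(tweet)]


tweets = [
    {"id": 1, "text": "hello"},                            # text only
    {"id": 2, "media": ["photo.jpg"]},                     # attached image
    {"id": 3, "urls": ["http://twitpic.com/abc"]},         # twitpic link
    {"id": 4, "media": [], "urls": ["https://example.com"]},
]
kept = [tweet["id"] for tweet in media_tweets(tweets)]
print(kept)
```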

Or do you mean that it increases the "reading" time? Can you at least make this a command-line argument, please? I really want to see for myself how much time it would take and how many images it would find compared to normal mode; maybe it really isn't worth it. Sorry for being selfish, but I'm curious.

> I don't understand why you want to read already downloaded tweets.

Various reasons. For example, when Hitomi couldn't get all the links during reading, like when the old API stopped working. Or if you hypothetically add the requested feature and I want to test it: I can't restart to reread the timeline, because Hitomi will read only recent tweets. Or simply when I removed some files from the folder and now want them back.

KurtBestor commented 4 years ago

I've tested for user koragen1925, and the results are as follows:

IMHO, It's not worth it.

nisehime commented 4 years ago

Sigh.

4515 imgs, 4534 imgs

Does it include videos?

Currently, my folder for koragen1925 has 4430 imgs (4712 with mp4). Why do you have more?

KurtBestor commented 4 years ago

imgs + vids

nisehime commented 4 years ago

Can you do something about the update and restart behaviour then? Most of the Twitter items I have now restart completely (I mean they read the whole timeline) even though they're completed.

I had the same bug in previous versions: when I restarted Twitter items after a week, some of them randomly read everything again instead of updating (reading only new tweets since the last download). I figured out that removing them from the list and adding them again fixes it, but the next week it happened again with random entries.

Then, on v3.2, updating worked well, but it worked as an "update" even when the item was incomplete or invalid. But now (since your fix https://github.com/KurtBestor/Hitomi-Downloader/issues/2687#issuecomment-721672738) it doesn't work again: most of the Twitter items don't update but restart.

I think what I asked you should work fine, I don't understand why you're refusing.

  1. If it is marked as complete, then it works as an update like on v3.2, i.e. reads the timeline only up to the last downloaded tweet.
  2. If it's Incomplete or Invalid, read the whole timeline and download everything.
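The two rules above can be sketched as a single function. This is only an illustration of the requested behaviour: the status strings, the `last_id` parameter, and the assumption that the timeline is a list of tweet IDs ordered newest first are all hypothetical.

```python
def tweets_to_read(timeline, status, last_id=None):
    """Pick which tweet IDs to read; `timeline` is ordered newest first."""
    if status == "completed" and last_id is not None:
        # Rule 1: update only, stop at the last downloaded tweet.
        new_ids = []
        for tweet_id in timeline:
            if tweet_id <= last_id:
                break
            new_ids.append(tweet_id)
        return new_ids
    # Rule 2: incomplete or invalid, read the whole timeline.
    return list(timeline)


timeline = [50, 40, 30, 20, 10]
update = tweets_to_read(timeline, "completed", last_id=30)
full = tweets_to_read(timeline, "incomplete", last_id=30)
print(update)
print(full)
```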

How does that conflict with issues you referenced?

KurtBestor commented 4 years ago

It's a bug. If a task is marked as "incomplete", it uses the list of successfully downloaded files when it restarts, so that it skips files that were already downloaded. But if you mark a "completed" task as "incomplete", the file list is already empty, so it doesn't skip any files. This is because a successfully completed task automatically clears the file list to reduce memory consumption.

Even though it's not intended, I decided to accept the "bug" as a feature. You can read the entire timeline & ignore already downloaded files by the following methods:
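A toy model of the behaviour described above, to show why the skip list is empty after completion. The `Task` class and its attributes are hypothetical and are not taken from the Hitomi Downloader source.

```python
class Task:
    """Toy download task that remembers which URLs it already fetched."""

    def __init__(self, urls):
        self.urls = list(urls)
        self.downloaded = set()   # skip list used on restart
        self.status = "incomplete"

    def run(self, failing=()):
        """Fetch everything not in the skip list; return what was fetched."""
        fetched = [u for u in self.urls
                   if u not in self.downloaded and u not in failing]
        self.downloaded.update(fetched)
        if self.downloaded >= set(self.urls):
            self.status = "completed"
            self.downloaded.clear()   # list is dropped to save memory
        else:
            self.status = "incomplete"
        return fetched


task = Task(["a", "b", "c"])
first = task.run(failing={"c"})   # "c" fails, task stays incomplete
second = task.run()               # restart skips "a" and "b"
task.status = "incomplete"        # user marks the completed task incomplete
third = task.run()                # skip list is empty, nothing is skipped
print(first, second, third)
```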

If you want to read the timeline without "filter:media", use --experimental.

Try this: https://github.com/KurtBestor/Hitomi-Downloader/releases/tag/Technical-Preview

nisehime commented 4 years ago

Thank you for the experimental feature, I'll test it a bit later.

About restarting: now, if I mark an item as incomplete, it works like I asked. But I still have tasks which do a full read while being marked as completed. They don't download old images, though.

Removing an item from the list and adding it again fixes this, but I wonder whether it will still be fine next week. Currently it looks like it works as on v3.1.

Could you use the update code from 3.2 while keeping the behaviour for incomplete tasks?

nisehime commented 3 years ago

On the current TP, updating is barely working. Most of my Twitter items go through a full read.