Open CuddleBear92 opened 6 years ago
This will still need to be per file, as we link file to episode. Due to the lack of info on release groups, it will need to be an extension of the Unrecognized Files Utility. It can be used to help manually link and AVDump files, though.
Yeah, it would need to be per file. I don't see that much of an issue in the end if it's made to run slowly over a longer period of time. The biggest issue would be stressing their servers so much that they block us. But even one file every hour or two will give you a series a day in the long run. Well, if it gets a match on the first screenshot.
It would be most useful for files you know nothing or next to nothing about. Or, heck, lazy users who don't want to figure it out.
Should it automatically add an empty series to the database when a match is made? It would make sense to do that too, as it would cut the wait before everything is in place when the user finally dumps and rechecks it. And if it did that, it could add a custom tag or a flag to the series that we can filter on.
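A minimal sketch of that kind of slow drip, assuming a hypothetical `ThrottledLookupQueue` helper (not part of Shoko) that releases at most one file per interval. The clock is injected so the interval can be tested without waiting.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ThrottledLookupQueue:
    """Hypothetical helper: hands out at most one file per interval."""
    interval_seconds: float
    clock: Callable[[], float]  # returns the current time in seconds
    pending: List[str] = field(default_factory=list)
    last_release: float = float("-inf")  # nothing released yet

    def add(self, path: str) -> None:
        self.pending.append(path)

    def next_due(self) -> Optional[str]:
        """Return the next file if the interval has elapsed, else None."""
        now = self.clock()
        if self.pending and now - self.last_release >= self.interval_seconds:
            self.last_release = now
            return self.pending.pop(0)
        return None
```

With `interval_seconds=3600`, a caller polling `next_due()` would send one screenshot lookup per hour, however many files are waiting.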
I'd say make it a setting
Edited the whole OP to fit the talk that was on the Discord server last night. Moving away from using their API and site, toward doing some work on the user servers and sending it to the webcache to compare.
I also have plans to detect anime series by reading video files. However, I can't decide how many thumbnails should be taken from one video so that a search yields accurate results within a reasonable time. Recently I've updated the database system, so it's much faster and less likely to overload now (note that the API limit still applies). Currently the whatanime.ga API returns the AniList ID and MAL ID in search results. But for an AniDB ID, a mapping of AniDB <-> MAL ID is needed.
I wouldn't trust the MAL ID, especially if we wanted to pull data, as the MAL API tends to be unreliable.
You could use AniList with MAL IDs, though.
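A rough sketch of the ID lookup being discussed, assuming a mapping table of (AniDB, MAL, AniList) rows is available from somewhere. The rows, numbers, and function names below are placeholders for illustration, not real IDs or an existing API. It prefers the AniList ID and only falls back to the MAL ID.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical mapping rows: (anidb_id, mal_id, anilist_id). Placeholder numbers.
Row = Tuple[int, Optional[int], Optional[int]]
ROWS = [(1, 101, 1001), (2, 102, None), (3, None, 1003)]


def build_index(rows: Iterable[Row]) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Build AniList -> AniDB and MAL -> AniDB lookup tables."""
    by_anilist: Dict[int, int] = {}
    by_mal: Dict[int, int] = {}
    for anidb_id, mal_id, anilist_id in rows:
        if anilist_id is not None:
            by_anilist[anilist_id] = anidb_id
        if mal_id is not None:
            by_mal[mal_id] = anidb_id
    return by_anilist, by_mal


def resolve_anidb_id(anilist_id: Optional[int], mal_id: Optional[int],
                     by_anilist: Dict[int, int],
                     by_mal: Dict[int, int]) -> Optional[int]:
    """Prefer the AniList ID; fall back to the MAL ID; None if unmapped."""
    if anilist_id in by_anilist:
        return by_anilist[anilist_id]
    return by_mal.get(mal_id)
```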
Hey @soruly, WhatAnime has helped me find a few series so thanks for creating it. :)
We're currently in the middle of doing some major changes to Shoko Server but I like the idea of using WhatAnime to help with our unrecognized process. Looking forward to that update to see what we can do with it. :)
One thing I'd like is to use whatanime.ga to pull the time at which a screenshot occurred. TvDB's thumbnails are garbage quality, but we can't just grab frames at random, or we'll get trash and/or spoiler shots.
If it's possible to give the server an episode and image, and have it return a time, then we could use it to cross reference TvDB for high quality thumbnails. I say episode and image to reduce the load on the server. We could easily thrash your site with our thousands of users calling it at the same time when our TvDB updates occur.
In the long run I'd prefer to make whatanime.ga distributed, so anyone with the skills would be able to set up an image database of their own.
That'd be cool, but doesn't that whole system take a lot of processing power? I'd think it would. If we cut out just the hashing and matching parts, and had a web cache with just storage, then we could put most of the strain on distributed clients with enough power. If you put together such a thing, we would gladly contribute. Some of our users can cover just about every anime ever, and our system requires a decent CPU, or hashing takes forever. I only have a basic idea of how it works, so I'm only guessing.
With my new improvements to searching, I think a decent quad-core CPU would be able to handle any search in 5 seconds. As for hashing video, a 24-minute video takes ~30 seconds to hash on a 4GHz quad-core machine.
My plan is to open and publish my hashes (maybe like this https://data.whatanime.ga/100240/ ) and users just need to download and import these pre-hashed files into their own database for local search.
Good Idea! :) "Distributed computing" like SETI
I've open-sourced the distributed indexing system. You can take a look: https://github.com/soruly/sola
If there is a way we could port the video indexing to .NET, it could be something to have in Shoko, probably as an opt-in feature.
whatanime.ga has moved to https://trace.moe https://www.patreon.com/posts/moving-to-new-22212117
Doing a new post of this because many of the old thoughts about a shared cache aren't really needed or wanted anymore.
The thinking now is that every unrecognized anime file should be looked up on trace.moe, grabbing all the links tied to the match there. They use anilist.co, and they also link to official sites, like the official Japanese site for the series and the Crunchyroll series page. These links should be compared with the links hosted on AniDB for matching. The episode number from trace.moe should probably be kept.
Since the plan is to support AniList more fully, grabbing that from the match would be wanted too.
Matches of 85% or higher could be linked automatically.
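The link comparison could be sketched like this. The normalization rules (ignore scheme, `www.`, and trailing slashes) and the function names are assumptions for illustration, and the URLs in the usage note are placeholders.

```python
from typing import Iterable, Set
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to host + path so trivial differences don't block a match."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def shared_links(trace_links: Iterable[str],
                 anidb_links: Iterable[str]) -> Set[str]:
    """Return the normalized links that appear on both sides."""
    return {normalize(u) for u in trace_links} & {normalize(u) for u in anidb_links}
```

For example, `shared_links(["https://www.example.com/show/"], ["http://example.com/show"])` treats both URLs as the same link, and a non-empty result would count as evidence that the two entries are the same series.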
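A small sketch of that cutoff, assuming each search result is a dict with a `similarity` value between 0 and 1. That field name and shape are an assumption here and should be checked against the actual API response.

```python
from typing import Dict, List, Optional


def pick_auto_match(results: List[Dict], threshold: float = 0.85) -> Optional[Dict]:
    """Return the best result if it clears the threshold, else None.

    Anything below the threshold is left for the user to link manually.
    """
    best = max(results, key=lambda r: r["similarity"], default=None)
    if best is not None and best["similarity"] >= threshold:
        return best
    return None
```

Making `threshold` a parameter would fit the earlier suggestion to put automatic behaviour behind a setting.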
Original Post:
https://trace.moe/ can be run locally on the webcache and the user servers. Thumbnails can be taken from 1-minute clips at set times in an episode, and their hash values generated and sent to the webcache (no image upload to the cache). These will be generated from files already known to AniDB, where we know what is what. When a user has an unknown file, their server uploads its hash values and lets the cache compare them against the known ones; the cache then returns the correct ID for easy matching on the user side.
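To make the compare step concrete, here is a rough sketch of what the cache side might do. It swaps in a generic 64-bit perceptual hash compared by Hamming distance, which is not how trace.moe itself hashes frames. The cache layout, the `max_distance` value, and the episode ID strings are all hypothetical.

```python
from typing import Dict, Optional


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def lookup(cache: Dict[int, str], query_hash: int,
           max_distance: int = 6) -> Optional[str]:
    """Return the episode ID of the closest known hash, or None if too far."""
    best_id: Optional[str] = None
    best_dist = max_distance + 1
    for known_hash, episode_id in cache.items():
        dist = hamming(known_hash, query_hash)
        if dist < best_dist:
            best_id, best_dist = episode_id, dist
    return best_id
```

Only integers travel to the cache in this sketch, which matches the "no image upload" idea. A linear scan like this would not scale to a large cache, so a real version would need an index.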
what it might need
Doing it all in the webcache instead of using the trace.moe API will give us more control in general and remove the limits it has because of its lack of support. It wouldn't be limited by content: no matter how old it is or whether it's 18+, it all comes from Shoko users and their files. Doing it with strict rules also means we don't have to use the whole file and keep hashes across the whole file like the site does. We can instead use set times with more refined rules, as we don't need to match across the whole episode. In the end this removes the need for actual images and shrinks the database as a whole (their DB is 30 GB for 673 million frames, and that's without the images themselves).
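As a back-of-the-envelope check on those numbers, assuming 1 GB means 10**9 bytes, the quoted database works out to roughly 45 bytes per frame. The `estimated_cache_bytes` helper and its inputs are hypothetical, just to show how a fixed number of sampled frames per file would scale.

```python
def bytes_per_frame(db_bytes: float, frame_count: float) -> float:
    """Average storage per indexed frame."""
    return db_bytes / frame_count


def estimated_cache_bytes(file_count: int, hashes_per_file: int,
                          per_hash_bytes: float) -> float:
    """Hypothetical size of a cache that stores a fixed number of hashes per file."""
    return file_count * hashes_per_file * per_hash_bytes


# Figures quoted in the post above: 30 GB for 673 million frames.
per_frame = bytes_per_frame(30e9, 673e6)
print(f"{per_frame:.1f} bytes per frame")
```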
Links: https://trace.moe/ https://github.com/soruly/whatanime.ga api: https://soruly.github.io/whatanime.ga/ https://www.patreon.com/soruly
EDIT: This all comes from the chat that took place in the future-requests channel on the Discord after I brought up using the API. Using the webcache is a better option for us, as it would not rely on anyone but the Shoko community and would be more open to more content.