ShokoAnime / ShokoServer

Repository for Shoko Server.
http://shokoanime.com/shoko-server/
MIT License
376 stars, 74 forks

[Feature Request] Hashing prioritisation #1058

Open: Reinachan opened this issue 1 year ago

Reinachan commented 1 year ago

The current behaviour seems random. However, people usually watch anime in sequential order, so it would make sense to prioritise hashing files sequentially, based on the episode number, when possible.

I suggest that for files with similar names where the only difference is a number, ShokoAnime should prioritise files with lower numbers over those with higher numbers. If it's unable to determine the episode number, it should fall back to the current behaviour.
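A minimal Python sketch of the proposed rule, assuming "similar names" means names that become identical once every run of digits is masked out. `prioritise` is a hypothetical helper, not part of Shoko: within each such group it sorts by the single number that varies, and it leaves the group in discovery order when no number, or more than one, varies.

```python
import re
from collections import defaultdict

_DIGITS = re.compile(r"\d+")


def prioritise(names: list[str]) -> list[str]:
    """Hypothetical helper: reorder names that differ only by one number.

    Groups that can't be resolved keep their original (discovery) order.
    """
    # Mask digit runs with NUL, which cannot appear in real file names,
    # so "Show - 01.mkv" and "Show - 02.mkv" share one template.
    groups: dict[str, list[int]] = defaultdict(list)
    for i, name in enumerate(names):
        groups[_DIGITS.sub("\0", name)].append(i)

    result = list(names)
    for indices in groups.values():
        if len(indices) < 2:
            continue
        numbers = [[int(n) for n in _DIGITS.findall(names[i])] for i in indices]
        # Numeric positions whose value actually changes within the group
        varying = [p for p in range(len(numbers[0]))
                   if len({nums[p] for nums in numbers}) > 1]
        if len(varying) != 1:
            continue  # episode number is ambiguous: keep current behaviour
        pos = varying[0]
        # Sort by the varying number; the index breaks ties stably
        ordered = [i for _, i in sorted(zip((nums[pos] for nums in numbers), indices))]
        # Put the sorted names back into the slots the group already used
        for slot, src in zip(indices, ordered):
            result[slot] = names[src]
    return result
```

For example, in "[Grp] Show - 02 [1080p].mkv" and "[Grp] Show - 01 [1080p].mkv" only the episode number varies (1080 is shared), so episode 01 moves first, while "S1 E2.mkv" and "S2 E1.mkv" vary in two places and keep their original order.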

revam commented 1 year ago

Context:

  1. Shoko doesn't care about filenames at all.
  2. The files are processed in the order they are discovered in (with some exceptions).
revam commented 1 year ago

I'm not against adding a bit more "predictability" to the process, but I also don't see the benefit of adding this behaviour. Others on the team might see it differently, though.

Reinachan commented 1 year ago

> Context:
>
>   1. Shoko doesn't care about filenames at all.
>   2. The files are processed in the order they are discovered in (with some exceptions).

That's what I assumed. I had the same issue with my fileserver when reconstructing chunked uploads, and ended up fetching the filenames first and then starting the reconstruction of the file.

I'd suggest doing something similar for Shoko: first grab the filenames, check for prioritisation, then run the hasher.
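The two-phase approach (names first, hashing second) might look like this. The sketch assumes a plain "natural" sort is an acceptable stand-in for the prioritisation check and uses SHA-1 from `hashlib` as a placeholder hasher; `natural_key`, `hash_file` and `hash_directory_in_order` are hypothetical names, not Shoko code.

```python
import hashlib
import os
import re
from pathlib import Path


def natural_key(name: str):
    # re.split with a capture group alternates text/digits, so odd
    # positions are digit runs: "Show - 10.mkv" -> ["show - ", 10, ".mkv"]
    return [int(t) if i % 2 else t.lower() for i, t in enumerate(re.split(r"(\d+)", name))]


def hash_file(path: Path, chunk_size: int = 1 << 20) -> str:
    # Placeholder hasher (SHA-1), not Shoko's real hashing pipeline
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hash_directory_in_order(directory: Path) -> list[tuple[str, str]]:
    # Phase 1: grab only the file names (cheap, no file contents read)
    names = [e.name for e in os.scandir(directory) if e.is_file()]
    # Phase 2: decide the processing order
    names.sort(key=natural_key)
    # Phase 3: run the hasher in that order
    return [(n, hash_file(directory / n)) for n in names]
```

Listing names is cheap compared with hashing, so deferring the expensive step until the order is known costs little beyond holding one directory's names in memory.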

bigretromike commented 1 year ago
>   2. The files are processed in the order they are discovered in (with some exceptions).

If that were true, wouldn't files from the same series be hashed in order, as they should be in the same directory? (Assuming they are.)

revam commented 1 year ago
>   2. The files are processed in the order they are discovered in (with some exceptions).

> If that were true, wouldn't files from the same series be hashed in order, as they should be in the same directory? (Assuming they are.)

Only if they are discovered in sequential order.

Reinachan commented 1 year ago

@bigretromike wrote:
> If that were true, wouldn't files from the same series be hashed in order, as they should be in the same directory? (Assuming they are.)

Assuming C# (or whichever library is used) relies on the same APIs under the hood as Rust does, then no, that's not the case.

> This function currently corresponds to the opendir function on Unix and the FindFirstFile function on Windows. Advancing the iterator currently corresponds to readdir on Unix and FindNextFile on Windows. [...]
>
> The order in which this iterator returns entries is platform and filesystem dependent.

(Source: Rust's std::fs::read_dir documentation.)

That said, this is only an issue when the server is receiving a directory, not when it receives individual files (for example, if you're downloading the episodes separately). I don't know whether those are distinguishable events for the server.
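Python's `os.scandir` is built on the same primitives (`opendir`/`readdir` on Unix, `FindFirstFileW`/`FindNextFileW` on Windows), and its documentation likewise says entries are yielded in arbitrary order. A sketch of the usual workaround, imposing an order explicitly instead of relying on discovery order; `list_dir_sorted` is a hypothetical helper.

```python
import os


def list_dir_sorted(path: str) -> list[str]:
    """Hypothetical helper: regular files in `path`, in a deterministic order."""
    # os.scandir yields entries in arbitrary, filesystem-dependent order
    with os.scandir(path) as it:
        names = [entry.name for entry in it if entry.is_file()]
    # Impose an explicit order instead of trusting discovery order.
    # Plain string sorting is deterministic but not episode-aware.
    return sorted(names)
```

Plain string ordering still puts "Ep 10.mkv" before "Ep 2.mkv", so an episode-aware sort key would be needed on top of this.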

Cazzar commented 1 year ago

Ultimately, I don't think this is viable without making file discovery painfully slow. There is also a difference between the full file-tree scan and the filesystem watcher. Once the commands are in the queue, they are typically processed in order of priority, then last updated, but that could change.

We don't currently do any sorting, because that would require loading the entire import folder tree into memory first. That quickly leads to poor performance in larger collections, and we already have a large memory footprint.
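The queue ordering described above (priority first, then last-updated time) can be sketched with a heap. This is a hypothetical model, not Shoko's command queue: `QueuedCommand`, `CommandQueue`, the assumption that a lower priority value runs sooner, and the insertion-order tie-breaker are all illustrative.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedCommand:
    priority: int          # assumption: lower value runs sooner
    last_updated: float    # e.g. a UNIX timestamp; older runs first
    seq: int               # tie-breaker: equal keys keep insertion order
    path: str = field(compare=False)


class CommandQueue:
    """Hypothetical priority-then-timestamp queue."""

    def __init__(self) -> None:
        self._heap: list[QueuedCommand] = []
        self._counter = itertools.count()

    def push(self, path: str, priority: int, last_updated: float) -> None:
        heapq.heappush(self._heap, QueuedCommand(priority, last_updated, next(self._counter), path))

    def pop(self) -> str:
        return heapq.heappop(self._heap).path

    def __len__(self) -> int:
        return len(self._heap)
```

Commands with equal priority come out by timestamp, which reflects when files were discovered rather than their episode number. Sorting only one directory's listing at a time, instead of the whole import folder tree, would bound the extra memory by the size of that directory.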

maxpiva commented 1 year ago

Do you mean for the initial import or a forced rescan?

Because after that, the system ingests files from filesystem watcher events.

When the filesystem watcher detects new files, the import order is usually the order in which you store, copy, or move your files there. We cannot sort something that is not in the directory yet. The only exception is when you move an entire directory from the same physical location, which is almost instantaneous; otherwise the system copies one file at a time, and every new file triggers the event and the import.

(Sent by email, quoting Nina Louise's message of Fri, 14 Apr 2023, 13:23.)

Reinachan commented 1 year ago

I primarily mean file watcher events triggered when you put a directory into the import folder. The way I have things set up, once an anime is fully downloaded, it'll hardlink the containing folder into the Shoko import folder.

I don't think this should be done on an initial import, nor on individual files placed into the import folder. Only when a directory with multiple anime in it is placed into the import folder. You could also make it optional.

Basically: on a filesystem event for a directory, read the entries in the directory, determine the sort order, and process them in that order.

As for memory footprint, I personally don't mind short spikes of increased memory. You can mark the setting as "potentially memory intensive during imports" if it turns out to be a problem.
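The directory-event flow described above (read the entries, determine the order, process them in that order) could be sketched as a watcher callback. Everything here is hypothetical: `on_created`, the `enqueue` callback and the `sort_directory_imports` option stand in for whatever Shoko's watcher and settings actually look like, and the natural sort is a stand-in for the episode-number check.

```python
import re
from pathlib import Path
from typing import Callable


def natural_key(name: str):
    # Odd positions of re.split with a capture group are the digit runs
    return [int(t) if i % 2 else t.lower() for i, t in enumerate(re.split(r"(\d+)", name))]


def on_created(path: Path, enqueue: Callable[[Path], None],
               sort_directory_imports: bool = True) -> None:
    """Hypothetical watcher callback; not Shoko's actual API."""
    if path.is_dir():
        # Read the whole listing first (the short memory spike)
        files = [p for p in path.rglob("*") if p.is_file()]
        if sort_directory_imports:
            # Sort by each path component so sub-folders stay grouped
            files.sort(key=lambda p: [natural_key(part) for part in p.relative_to(path).parts])
        for f in files:
            enqueue(f)
    else:
        # Individual files are queued as soon as they are seen
        enqueue(path)
```

As raised later in the thread, this only helps when the directory already contains all its files when the event fires (a hardlink, or a move within the same physical location); a copied directory fills up one file at a time.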

da3dsoul commented 1 year ago

That could be done, since detecting a directory is distinct from detecting a file.

maxpiva commented 1 year ago

It could handle directory events. Your use case expects the directory to appear instantly, with all its files, in the import location; that holds for your specific case of hardlinking, or of moving a directory within the same physical location, so it could be done. But if the user copies a directory into the import location, the files are copied one by one, and the import order will be the order in which the system copies them. If you're fine with that, I think it could be done.

(Sent by email, quoting Nina Louise's message of Sun, 16 Apr 2023, 12:38.)
