ShokoAnime / ShokoServer

Repository for Shoko Server.
https://shokoanime.com/
MIT License

Command Request and background workers refactor #433

Open maxpiva opened 8 years ago

maxpiva commented 8 years ago

Update CommandRequest from:

        public int CommandRequestID { get; private set; }
        public int Priority { get; set; }
        public int CommandType { get; set; }
        public string CommandID { get; set; }
        public string CommandDetails { get; set; }
        public DateTime DateTimeUpdated { get; set; }

We add the following:

    public string CommandDisplayName { get; set; } // Display name used in GUIs.
    public int MaxParallelTasks { get; set; } // Maximum number of parallel tasks this command supports.
    public string ParallelIdentifier { get; set; } // For example, when importing files this could be the drive name; commands with the same ParallelIdentifier cannot be parallelized.
    public string GroupName { get; set; } // For example: hasher, updater, images, anidb, etc. Different tasks get different names.
    public string BatchIdentifier { get; set; } // Unique identifier, using Guid.NewGuid(); items added in the same batch should share the same identifier.

The CommandRequest interface will only have the Process method.

A BaseCommandRequest class could be made with helper functions, such as populating the command request's initial values.

CommandRequest will be extended into the commands we have today as children of CommandRequest or BaseCommandRequest; the child classes can have any number of extra properties they need to function. We use JSON.NET to serialize the child CommandRequest and store it in CommandDetails. We do this in the repository's save function: we clear CommandDetails, then serialize the object into CommandDetails using JSON.NET serialization settings (TypeNameHandling = Auto), which serializes the object with the correct child type. On CommandRequest retrieval from the database, we deserialize and return the deserialized child object instead of the plain CommandRequest entity.
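For illustration only, a minimal sketch of that save/load round trip (the HashFileCommandRequest class and the serializer helper names are hypothetical, not Shoko's actual code; only CommandRequest, BaseCommandRequest, CommandDetails, and TypeNameHandling.Auto come from the proposal above):

    using Newtonsoft.Json;

    // Hypothetical child command with one extra property of its own.
    public class HashFileCommandRequest : BaseCommandRequest
    {
        public string FilePath { get; set; }
        public override void Process() { /* hash the file here */ }
    }

    public static class CommandRequestSerializer
    {
        private static readonly JsonSerializerSettings Settings = new JsonSerializerSettings
        {
            // Writes "$type" when the runtime type differs from the declared type,
            // so the child class survives the round trip.
            TypeNameHandling = TypeNameHandling.Auto
        };

        // Called from the repository's save: clear CommandDetails, then store the JSON in it.
        public static void FillDetails(CommandRequest cmd)
        {
            cmd.CommandDetails = null;
            cmd.CommandDetails = JsonConvert.SerializeObject(cmd, typeof(CommandRequest), Settings);
        }

        // Called on retrieval: returns the concrete child (e.g. HashFileCommandRequest)
        // instead of the plain CommandRequest entity read from the database.
        public static CommandRequest FromDetails(string commandDetails)
        {
            return JsonConvert.DeserializeObject<CommandRequest>(commandDetails, Settings);
        }
    }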

Any jmmserver worker should be split into two parts: the batch part (foreach, for, etc.) and the per-item part, with each loop item converted into a CommandRequest.

Of course, a thread dispatcher should be made, along with a good API to obtain information about the global status, group status, running requests, etc.
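A very rough sketch of a dispatcher honoring those fields (CommandDispatcher and its Dispatch method are made-up names; a real implementation would also have to serialize commands sharing a ParallelIdentifier and expose the status API):

    using System;
    using System.Collections.Concurrent;
    using System.Threading;
    using System.Threading.Tasks;

    public class CommandDispatcher
    {
        // One concurrency gate per GroupName (hasher, updater, images, anidb, ...).
        private readonly ConcurrentDictionary<string, SemaphoreSlim> _gates =
            new ConcurrentDictionary<string, SemaphoreSlim>();

        public Task Dispatch(CommandRequest cmd)
        {
            var gate = _gates.GetOrAdd(cmd.GroupName,
                _ => new SemaphoreSlim(Math.Max(1, cmd.MaxParallelTasks)));

            return Task.Run(async () =>
            {
                await gate.WaitAsync();
                try
                {
                    cmd.Process(); // the single method on the CommandRequest interface
                }
                finally
                {
                    gate.Release();
                }
            });
        }
    }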

Any thoughts?

bigretromike commented 8 years ago

The move from XML to JSON is a good thing. Also, as I understand it, you want to split this:

    for (int i = 0; i < 10; i++)
    {
        command[i].execute();
    }

into:

    command[0].execute();
    command[1].execute();
    ...
    command[9].execute();

?

Do we have errors because half-done tasks are run again after a restart? Maybe a better (less CPU/time-consuming) approach would be to have each command carry its own ID (PID) and keep, say, the last 1000 executed commands in the db, so when you run a half-done command you could check it against the list of done commands?

Also, won't the conversion fail to resolve the situation where a command has been converted into a list of commands and someone shuts down jmm in the middle? Or will you not save the mid-process state and just restart it from the beginning?

Do we know how big those iteration commands can get?

ElementalCrisis commented 7 years ago

Using this thread for the Command Worker refactor, as discussed on Discord.

The goal is to remove the three separate command workers and instead have a single queue that can run multiple queries at once.

Assigned @maxpiva as he'll be working on this.

t3hk0d3 commented 10 months ago

Any progress? Currently hashing is EXTREMELY slow because it is basically single-threaded. Same applies for linking files.

da3dsoul commented 10 months ago

Linking files is dependent on AniDB's rate limits, so it can't be a ton faster. It's faster by nature of some serious overall performance improvements. The queue refactor is going to take a while. Some things will happen first, with others coming later. I'm still not certain how I want to do the scheduling, whether to have chains of commands happen in a line for more user-friendly progress indication or to just throw it in a bucket and do things mostly in the order they are queued with some other rules (the way it is now). Hashing is likely one of the first things to be in the new queue. I'm not sure it will help much, though, as hashing is bottlenecked on CPU and IO.

da3dsoul commented 10 months ago

Oh apparently I never made an issue detailing my plans and thoughts regarding the queue rewrite. The plan is to use Quartz.net with a custom scheduler
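For reference, a stock Quartz.NET job and a one-off scheduling call look roughly like this (HashFileJob, the job data key, and the group name are made up here; this is the plain Quartz API, not the custom scheduler being planned):

    using System.Threading.Tasks;
    using Quartz;
    using Quartz.Impl;

    // Quartz runs Execute on a thread from its pool.
    public class HashFileJob : IJob
    {
        public Task Execute(IJobExecutionContext context)
        {
            var path = context.MergedJobDataMap.GetString("FilePath");
            // ...hash the file at 'path'...
            return Task.CompletedTask;
        }
    }

    public static class QueueExample
    {
        public static async Task Enqueue(string filePath)
        {
            var scheduler = await new StdSchedulerFactory().GetScheduler();
            await scheduler.Start();

            var job = JobBuilder.Create<HashFileJob>()
                .WithIdentity(filePath, "hasher") // job name + group
                .UsingJobData("FilePath", filePath)
                .Build();

            var trigger = TriggerBuilder.Create().StartNow().Build();
            await scheduler.ScheduleJob(job, trigger);
        }
    }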

da3dsoul commented 10 months ago

It's also much more of an active WIP now than back in 2018 lol. Progress was made only a few months ago; then I shifted to the filter rewrite.

t3hk0d3 commented 10 months ago

@da3dsoul

> Linking files is dependent on AniDB's rate limits, so it can't be a ton faster. It's faster by nature of some serious overall performance improvements.

For me personally, moving files from the import folder into the destination folder during linking takes 99% of the time.

> Hashing is likely one of the first things to be in the new queue. I'm not sure it will help much, though, as hashing is bottlenecked on CPU and IO.

With modern SSDs, the CPU is the biggest bottleneck. However, spreading the work over several cores with multi-threading would be a good enough solution.
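Purely to illustrate that point (Shoko's real hasher is built around rhash and its own hash types; this sketch just uses SHA-1 from the BCL with a made-up helper name):

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Security.Cryptography;
    using System.Threading.Tasks;

    public static class ParallelHasher
    {
        // Hash several files concurrently, capping the number of worker threads.
        public static Dictionary<string, string> HashFiles(IEnumerable<string> paths, int maxWorkers = 4)
        {
            var results = new ConcurrentDictionary<string, string>();

            Parallel.ForEach(paths, new ParallelOptions { MaxDegreeOfParallelism = maxWorkers }, path =>
            {
                using (var sha1 = SHA1.Create())
                using (var stream = File.OpenRead(path))
                {
                    results[path] = BitConverter.ToString(sha1.ComputeHash(stream)).Replace("-", "");
                }
            });

            return results.ToDictionary(kv => kv.Key, kv => kv.Value);
        }
    }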

da3dsoul commented 10 months ago

I would check your setup. If it's on the same disk, it's almost instant for me.

Hashing is supposed to be multi-threaded, but I've seen complaints about the speed of rhash (the hash library we use)

t3hk0d3 commented 10 months ago

@da3dsoul

> I would check your setup. If it's on the same disk, it's almost instant for me

It is the same disk, but the folders are mounted inside the docker container separately, so the OS does an actual copy (because it sees them as different filesystems).

Also, think about people who mount a remote NAS folder for import.

da3dsoul commented 10 months ago

I'd change that, then; you can non-destructively edit import folders after setting your docker volumes to a single, common root directory.

t3hk0d3 commented 10 months ago

@da3dsoul

> Oh apparently I never made an issue detailing my plans and thoughts regarding the queue rewrite. The plan is to use Quartz.net with a custom scheduler

I would suggest looking into Rx.Net (or any other reactive framework). In my experience, any IO-bound work becomes sooo much faster.
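A small sketch of the pattern being suggested, assuming a hypothetical ICommand with an async ProcessAsync method (illustrative only; this uses System.Reactive and is not anything Shoko does today):

    using System.Collections.Generic;
    using System.Reactive.Linq;
    using System.Threading.Tasks;

    public interface ICommand
    {
        Task ProcessAsync();
    }

    public static class ReactiveQueueExample
    {
        // Consume a batch of commands with at most four running at a time.
        public static async Task RunAll(IEnumerable<ICommand> commands)
        {
            await commands
                .ToObservable()
                .Select(cmd => Observable.FromAsync(cmd.ProcessAsync))
                .Merge(4)                 // bounded concurrency
                .LastOrDefaultAsync();    // completes when every command has finished
        }
    }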

da3dsoul commented 10 months ago

I'm not sure how well reactive programming would work in Shoko. It works great when dealing with APIs, but most of Shoko is old and.... synchronous.

t3hk0d3 commented 10 months ago

I think the queue workers are kind of a separate subsystem, and since you are planning to redesign them from scratch, they seem like a good candidate.

da3dsoul commented 10 months ago

I have no idea how a reactive framework would help with a queue. Can you elaborate?

t3hk0d3 commented 10 months ago

Queue consumers could be implemented as bounded async sinks running on separate thread pools.

This would allow flexible concurrency based on the type of task in the queue.

It's also possible to have just a single thread pool and restrict concurrency per task type, but it would be a bit more complicated.
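One way to picture that, using TPL Dataflow rather than Rx just to keep the sketch short (the task-type names are made up, and ICommand is the hypothetical interface from the earlier sketch; the point is one bounded sink per task type, each with its own degree of parallelism):

    using System.Collections.Generic;
    using System.Threading.Tasks.Dataflow;

    public static class PerTypeSinks
    {
        public static Dictionary<string, ActionBlock<ICommand>> Build()
        {
            return new Dictionary<string, ActionBlock<ICommand>>
            {
                // AniDB calls are rate-limited, so keep them strictly serial.
                ["anidb"] = new ActionBlock<ICommand>(cmd => cmd.ProcessAsync(),
                    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1 }),

                // Hashing is CPU/IO bound, so allow a few in parallel.
                ["hasher"] = new ActionBlock<ICommand>(cmd => cmd.ProcessAsync(),
                    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 }),
            };
        }
    }

    // Usage: sinks["hasher"].Post(someHashCommand);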

da3dsoul commented 10 months ago

Pretty sure Quartz can do all of that

t3hk0d3 commented 10 months ago

I don't have much experience with Quartz.NET, but I've spent some time with the Java version. According to my experience and the Quartz.NET documentation (https://www.quartz-scheduler.net/documentation/quartz-3.x/configuration/reference.html#threadpool), you can have just a single thread pool for all tasks 🥲

da3dsoul commented 10 months ago

Well you have a single thread pool for all of Quartz, but you can subdivide it via the scheduler and job type. I'm not privy to how that might matter, but I don't see a problem with it

t3hk0d3 commented 10 months ago

> Well you have a single thread pool for all of Quartz, but you can subdivide it via the scheduler and job type.

AFAIK you can't control job execution concurrency at that level. But I ain't an expert with Quartz.NET.

> I'm not privy to how that might matter, but I don't see a problem with it

Throttled API requests (such as AniDB) and the hasher (or any other CPU/IO-bound work) should likely have different concurrency factors.

da3dsoul commented 10 months ago

Well, I'll find out the hard way when I try to write the scheduler for the hasher. It'll be annoying, since I've put a decent bit of effort into Quartz, but I'm not against finding another option if that happens. I just don't want to write my own job queue and job store. We tried, and that's why it's so terrible.