maxpiva opened 8 years ago
The move from XML to JSON is a good thing. Also, I understand that you want to split this:
for (int i = 0; i < 10; i++)
{
command[i].execute();
}
to:
command[0].execute();
command[1].execute();
...
command[9].execute();
?
Do we have errors because half-done tasks get re-run after a restart? Maybe a better (less CPU/time-consuming) approach would be to give each command its own ID (like a PID) and keep, say, the last 1000 executed commands in the DB, so that when you re-run a half-done batch you can check each command against the list of completed ones.
Also, the conversion won't resolve the situation where a command has been converted into a list of commands and someone shuts JMM down in the middle. Or will you not save the mid-process state and just restart it from the beginning?
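A rough sketch of that idea (all names here are hypothetical, not existing JMM code): before executing, each command checks its ID against a store of recently completed commands.

using System;

// Hypothetical store of recently completed command IDs (e.g. the last 1000 in the DB).
public interface IDoneCommandStore
{
    bool IsDone(string commandId);   // was this command already completed?
    void MarkDone(string commandId); // record completion, trimming old entries
}

public static class CommandRunner
{
    // Skip work that already finished before the restart; run and record otherwise.
    public static void RunIfNotDone(IDoneCommandStore store, string commandId, Action work)
    {
        if (store.IsDone(commandId))
            return;
        work();
        store.MarkDone(commandId);
    }
}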
Do we know how big those iteration commands can get?
Using this thread for the Command Worker refactors as talked about on Discord.
The goal is to remove the three separate command workers and instead have a single queue that can run multiple queries at once.
Assigned @maxpiva as he'll be working on this.
Any progress? Currently hashing is EXTREMELY slow because it is basically single-threaded. The same applies to linking files.
Linking files is dependent on AniDB's rate limits, so it can't be a ton faster. It's faster by nature of some serious overall performance improvements. The queue refactor is going to take a while. Some things will happen first, with others coming later. I'm still not certain how I want to do the scheduling, whether to have chains of commands happen in a line for more user-friendly progress indication or to just throw it in a bucket and do things mostly in the order they are queued with some other rules (the way it is now). Hashing is likely one of the first things to be in the new queue. I'm not sure it will help much, though, as hashing is bottlenecked on CPU and IO.
Oh apparently I never made an issue detailing my plans and thoughts regarding the queue rewrite. The plan is to use Quartz.net with a custom scheduler
It's also much more of an active WIP now than it was in 2018 lol. Progress was made only a few months ago; then I shifted to the filter rewrite.
@da3dsoul
Linking files is dependent on AniDB's rate limits, so it can't be a ton faster. It's faster by nature of some serious overall performance improvements.
For me personally, moving files from the import folder into the destination folder during linking takes 99% of the time.
Hashing is likely one of the first things to be in the new queue. I'm not sure it will help much, though, as hashing is bottlenecked on CPU and IO.
With modern SSDs, the CPU is the biggest bottleneck. However, spreading the work over several cores with multi-threading would be a good enough solution.
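For illustration, a minimal sketch of hashing spread across cores with Parallel.ForEach (Shoko actually uses rhash for ED2K and friends; SHA-256 just stands in here to show the parallelism):

using System;
using System.IO;
using System.Security.Cryptography;
using System.Threading.Tasks;

public static class ParallelHasher
{
    public static void HashAll(string[] files)
    {
        // One hash job per file, capped at the number of CPU cores.
        Parallel.ForEach(
            files,
            new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
            file =>
            {
                using var sha = SHA256.Create();
                using var stream = File.OpenRead(file);
                var hash = sha.ComputeHash(stream);
                Console.WriteLine($"{file}: {Convert.ToHexString(hash)}"); // .NET 5+
            });
    }
}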
I would check your setup. If it's on the same disk, it's almost instant for me.
Hashing is supposed to be multi-threaded, but I've seen complaints about the speed of rhash (the hash library we use)
@da3dsoul
I would check your setup. If it's on the same disk, it's almost instant for me
It is the same disk, but the folders are mounted into the Docker container separately, so the OS does a full copy (because it sees them as different filesystems).
Also, think about people who mount a remote NAS folder for import.
I'd change that then; you can non-destructively edit import folders after setting your Docker volumes to a single, common root directory (one mount containing both the import and destination folders, so moves become same-filesystem renames).
@da3dsoul
Oh apparently I never made an issue detailing my plans and thoughts regarding the queue rewrite. The plan is to use Quartz.net with a custom scheduler
I would suggest looking into Rx.Net (or any other reactive framework). In my experience, any IO-bound work becomes sooo much faster.
I'm not sure how well reactive programming would work in Shoko. It works great when dealing with APIs, but most of Shoko is old and.... synchronous.
I think the queue workers are kind of a separate subsystem, and since you are planning to redesign them from scratch, they seem like a good candidate.
I have no idea how a reactive framework would help with a queue. Can you elaborate?
Queue consumers could be implemented as bounded async sinks running on separate thread pools.
This would allow flexible concurrency based on the type of task in the queue, for example:
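Here's a minimal sketch of the idea using TPL Dataflow's ActionBlock as the bounded async sink (an Rx.NET version would instead merge per-type streams with a concurrency limit; the Command type and the limits below are assumptions):

using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // System.Threading.Tasks.Dataflow package

public record Command(string Type, Func<Task> Work);

public static class QueueSketch
{
    // One bounded async sink per task type, each with its own concurrency limit.
    static ActionBlock<Command> MakeSink(int maxConcurrency, int capacity) =>
        new ActionBlock<Command>(
            cmd => cmd.Work(),
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = maxConcurrency, // e.g. 1 for AniDB, N for hashing
                BoundedCapacity = capacity               // back-pressure when the queue is full
            });

    public static async Task Main()
    {
        var anidbSink = MakeSink(maxConcurrency: 1, capacity: 100); // rate-limited API work
        var hashSink  = MakeSink(maxConcurrency: Environment.ProcessorCount, capacity: 100);

        await hashSink.SendAsync(new Command("Hash", () => Task.Delay(10)));   // hash a file
        await anidbSink.SendAsync(new Command("AniDB", () => Task.Delay(10))); // throttled call

        anidbSink.Complete();
        hashSink.Complete();
        await Task.WhenAll(anidbSink.Completion, hashSink.Completion);
    }
}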
It's also possible to have just a single thread pool and restrict concurrency per task type, but that would be a bit more complicated.
Pretty sure Quartz can do all of that
I don't have much experience with Quartz.NET, but I've spent some time with the Java version. Based on my experience and the Quartz.NET documentation (https://www.quartz-scheduler.net/documentation/quartz-3.x/configuration/reference.html#threadpool), you can only have a single thread pool for all tasks 🥲
Well you have a single thread pool for all of Quartz, but you can subdivide it via the scheduler and job type. I'm not privy to how that might matter, but I don't see a problem with it
Well you have a single thread pool for all of Quartz, but you can subdivide it via the scheduler and job type.
AFAIK you can't control job execution concurrency on that level. But I ain't an expert with Quartz.NET.
I'm not privy to how that might matter, but I don't see a problem with it
Throttled API requests (such as AniDB) and the hasher (or any other CPU/IO-bound work) should likely have different concurrency factors.
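For reference, the closest built-in Quartz.NET knob I know of is [DisallowConcurrentExecution], which prevents two instances of the same job key from running at once; anything finer-grained per task type would need custom scheduler logic. A minimal sketch (job and identity names are made up, not Shoko's):

using System.Threading.Tasks;
using Quartz;
using Quartz.Impl;

[DisallowConcurrentExecution] // never run two instances of the same job key at once
public class AniDbJob : IJob
{
    public Task Execute(IJobExecutionContext context)
    {
        // the throttled AniDB call would go here
        return Task.CompletedTask;
    }
}

public static class QuartzSketch
{
    public static async Task Main()
    {
        var scheduler = await new StdSchedulerFactory().GetScheduler();
        await scheduler.Start();

        var job = JobBuilder.Create<AniDbJob>().WithIdentity("anidb", "api").Build();
        var trigger = TriggerBuilder.Create()
            .StartNow()
            .WithSimpleSchedule(x => x.WithIntervalInSeconds(2).RepeatForever())
            .Build();

        await scheduler.ScheduleJob(job, trigger);
    }
}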
Well, I'll find out the hard way when I try to write the scheduler for the hasher. It'll be annoying, since I've put a decent bit of effort into Quartz, but I'm not against finding another option if that happens. I just don't want to write my own job queue and job store. We tried, and that's why it's so terrible.
Update CommandRequest from:
We add the following:
The CommandRequest interface will only have the Process method.
A BaseCommandRequest class implementing it could provide helper functions, such as populating the command request's initial data values.
CommandRequest will be extended into the commands we have today, as children of CommandRequest or BaseCommandRequest; the child classes can have any number of extra properties they need to function. We use JSON.NET to serialize the child CommandRequest class and store it in CommandDetails. We do that in the save function of the repository: we clear CommandDetails, then serialize the object into it using JSON.NET serialization settings (TypeNameHandling = Auto), which serializes the object with the correct child type. On CommandRequest retrieval from the database, we deserialize and return the deserialized child object instead of the original plain CommandRequest entity.
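A minimal sketch of that serialization round-trip (class shapes are illustrative, not the real Shoko ones; the key detail is passing typeof(CommandRequest) so TypeNameHandling.Auto writes the concrete child's $type on the root object):

using Newtonsoft.Json;

public abstract class CommandRequest
{
    public abstract void Process();
}

public class HashFileCommand : CommandRequest
{
    public string FilePath { get; set; }
    public override void Process() { /* hash the file */ }
}

public static class CommandSerialization
{
    static readonly JsonSerializerSettings Settings =
        new JsonSerializerSettings { TypeNameHandling = TypeNameHandling.Auto };

    // Save path: the declared type differs from the actual child type, so JSON.NET
    // emits a "$type" property and the child can be restored later.
    public static string ToCommandDetails(CommandRequest cmd) =>
        JsonConvert.SerializeObject(cmd, typeof(CommandRequest), Settings);

    // Retrieval path: returns the concrete child (e.g. HashFileCommand),
    // not a plain CommandRequest.
    public static CommandRequest FromCommandDetails(string details) =>
        JsonConvert.DeserializeObject<CommandRequest>(details, Settings);
}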
Any JMM Server worker should be split into two parts: the batch part (the foreach/for loop itself) and the per-item work, converting each loop item into its own CommandRequest (see the sketch below).
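Sketched with the same hypothetical types as above, the batch part reduces to enqueueing one command per item:

using System.Collections.Generic;

public interface ICommandQueue // assumed abstraction over the command request repository
{
    void Enqueue(CommandRequest cmd);
}

public class ScanFolderWorker
{
    private readonly ICommandQueue _queue;
    public ScanFolderWorker(ICommandQueue queue) => _queue = queue;

    // The batch part: the loop lives here; each item becomes its own
    // persisted CommandRequest that can survive a shutdown mid-batch.
    public void Run(IEnumerable<string> files)
    {
        foreach (var file in files)
            _queue.Enqueue(new HashFileCommand { FilePath = file });
    }
}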
Of course, a thread dispatcher should be made, along with a good API to obtain information about the global status, group status, running requests, etc.
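The status API could be as simple as something like this (shape and names are guesses at what "global status, group status, running requests" would need):

using System.Collections.Generic;

public record QueueStatus(int Waiting, int Running, int Failed);

public interface IQueueStatusApi
{
    QueueStatus GetGlobalStatus();                 // totals across all workers
    QueueStatus GetGroupStatus(string group);      // e.g. "AniDB", "Hasher"
    IReadOnlyList<string> GetRunningRequests();    // descriptions of in-flight commands
}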
Any thoughts?