CGRU / cgru

CGRU - AFANASY
http://cgru.info/
GNU Lesser General Public License v3.0

Solve order not working as expected #583

Closed sebastianelsner closed 1 year ago

sebastianelsner commented 1 year ago

Hello!

I set up a new Afanasy server with the newest version. Job solving does not work the way I was used to. Previously (v2.3.1) I had it configured so that tasks were rendered in order of job creation. For this I set the parameters of all branches to:

  "solve_method": "solve_order",
  "solve_need": "solve_capacity",
  "solve_jobs": true,

The webui shows this:

[Screenshot: Selection_014]

I don't need any "fair" or "cooperative" behaviour at all; I just need the jobs to be worked on sequentially.

Has anything changed in the new version 3.3 that I need to add to the config?

sebastianelsner commented 1 year ago

I think I understand now what the actual issue is:

To reproduce, assume we have one render client. We have multiple jobs, and each job has 4 blocks. Each block depends on the one before it in the same job, so Block 4 depends on Block 3, Block 3 on Block 2, etc.

What happens now is that once Job 1 Block 1 is done, the server solves in a way that Job 2 Block 1 runs next. That is not what we want: Job 1 Block 2 should run after Job 1 Block 1. Job 1 is older and may even have a higher priority. You can see the behaviour in the video I attached.

If I remove the block dependencies, this does not happen, and the one client works on Job 1 Block 1, then Job 1 Block 2, etc.
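The expected ordering can be written down as a small, self-contained sketch. This is not Afanasy code: the `Block` and `Job` classes and the `run_in_order` function are hypothetical names made up for illustration, and each block is assumed to hold a single task that finishes immediately.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    name: str
    depends_on: Optional[str] = None  # name of the block this one waits for
    done: bool = False


@dataclass
class Job:
    name: str
    blocks: List[Block] = field(default_factory=list)

    def next_ready_block(self) -> Optional[Block]:
        """Return the first unfinished block whose dependency is done."""
        finished = {b.name for b in self.blocks if b.done}
        for block in self.blocks:
            if block.done:
                continue
            if block.depends_on is None or block.depends_on in finished:
                return block
        return None


def make_job(name: str) -> Job:
    # Four blocks chained b1 <- b2 <- b3 <- b4, as described above.
    return Job(name, [Block("b1"), Block("b2", "b1"),
                      Block("b3", "b2"), Block("b4", "b3")])


def run_in_order(jobs: List[Job]) -> List[str]:
    """Simulate one render client that always serves the oldest job first."""
    order = []
    while True:
        for job in jobs:  # jobs are listed oldest first
            block = job.next_ready_block()
            if block is not None:
                block.done = True  # assumption: the block finishes at once
                order.append(f"{job.name} {block.name}")
                break
        else:
            return order  # no job has anything left to run


sequence = run_in_order([make_job("Job 1"), make_job("Job 2")])
print(sequence)
```

In this sketch all four blocks of Job 1 are listed before any block of Job 2, which is the sequential behaviour being asked for.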

I set solve_jobs to true. Otherwise the farm is freshly bootstrapped and has default settings.

Is this by design?

https://github.com/CGRU/cgru/assets/902798/d650c5ac-10d7-4ceb-9a62-c8b9308caf30

EDIT:

You can use this script to reproduce it. Assuming you have only one client, you will see the behaviour from the video:

import af

# Submit two identical jobs; each has four blocks chained by depend masks.
for _ in range(2):
    job = af.Job("test")

    # Block 1: no dependencies.
    b1name = "b1"
    b1 = af.Block(b1name)
    t1 = af.Task(b1name + " Task 1")
    b1.tasks.append(t1)
    t1.setCommand("sleep 3")
    job.blocks.append(b1)

    # Block 2: waits for block 1.
    b2name = "b2"
    b2 = af.Block(b2name)
    b2.setDependMask(b1name)
    t2 = af.Task(b2name + " Task 1")
    b2.tasks.append(t2)
    t2.setCommand("sleep 3")
    job.blocks.append(b2)

    # Block 3: waits for block 2 and has three tasks.
    b3name = "b3"
    b3 = af.Block(b3name)
    b3.setDependMask(b2name)
    for frame in range(1, 4):  # separate name, so the outer loop variable is not shadowed
        t3 = af.Task(f"{b3name} Frame {frame}")
        b3.tasks.append(t3)
        t3.setCommand("sleep 3")
    job.blocks.append(b3)

    # Block 4: waits for block 3.
    b4name = "b4"
    b4 = af.Block(b4name)
    b4.setDependMask(b3name)
    t4 = af.Task(b4name + " Task 1")
    b4.tasks.append(t4)
    t4.setCommand("sleep 3")
    job.blocks.append(b4)
    job.send()

timurhai commented 1 year ago

Hello! Sorry for the delay. I can test all of this next week, and I will. Most probably it is a refresh/solve issue: the block still has its depend status until the next run cycle, but a render can take a task in the current run cycle. If so, it is a bug (a wrong solving issue).

timurhai commented 1 year ago

Hello! This extra progress update, just after depends, fixes the issue: https://github.com/CGRU/cgru/commit/f00b6983b0e4864ea0df3ee13969f7806f6fc921#diff-398970149dd0f578b9e13e39f8961b21c072c8abc6b4a517c9f1c3842bd344f3R445

I do not think it can break anything else, but some checks may be needed.
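To make the suspected mechanism concrete, here is a toy model of a stale depend status. It is only a sketch under stated assumptions, not the server's code and not the content of the commit: the `simulate` function, the state names, and the `refresh_after_finish` flag are all hypothetical. It assumes one client, two jobs with four chained blocks, one task per block, each task taking exactly one run cycle, and a regular status refresh that runs once per cycle before finished tasks are processed.

```python
def simulate(refresh_after_finish):
    """Return the order in which one client runs blocks of two chained jobs."""
    jobs = ["Job 1", "Job 2"]  # oldest first
    # Per-job block states; block k depends on block k-1.
    state = {job: ["ready", "waiting", "waiting", "waiting"] for job in jobs}
    running = None  # (job, block index) currently on the client
    order = []

    def refresh():
        # Clear the "waiting" status of blocks whose predecessor is done.
        for blocks in state.values():
            for k in range(1, len(blocks)):
                if blocks[k] == "waiting" and blocks[k - 1] == "done":
                    blocks[k] = "ready"

    while any(s != "done" for blocks in state.values() for s in blocks):
        refresh()  # regular once-per-cycle status update
        if running is not None:
            job, k = running
            state[job][k] = "done"  # assumption: a task takes one cycle
            running = None
            if refresh_after_finish:
                refresh()  # extra update so dependants are ready right away
        # Solve: give the free client to the oldest job with a ready block.
        for job in jobs:
            if "ready" in state[job]:
                k = state[job].index("ready")
                state[job][k] = "running"
                running = (job, k)
                order.append(f"{job} b{k + 1}")
                break
    return order


stale = simulate(refresh_after_finish=False)
fixed = simulate(refresh_after_finish=True)
print(stale)
print(fixed)
```

Without the extra refresh, the model alternates between the two jobs (Job 1 b1, Job 2 b1, Job 1 b2, ...), which matches the reported behaviour. With the extra refresh, Job 1 runs all four blocks before Job 2 starts.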

sebastianelsner commented 1 year ago

Cool! Thanks! It fixes the problem for me. I will let it run a bit on the test farm and see if anything comes up. Can you test too?

timurhai commented 1 year ago

It has been running on a test farm since then, and all is good. Today or tomorrow I will update our real farm.

timurhai commented 1 year ago

Hello! It has been working for a month on our farm.