CGRU / cgru

CGRU - AFANASY
http://cgru.info/
GNU Lesser General Public License v3.0

Houdini PDG Afanasy Scheduler #514

Open timurhai opened 3 years ago

timurhai commented 3 years ago

Hi everybody! This issue is created to discuss the PDG Afanasy scheduler implementation. Since implementation of the scheduler has started, I decided to create a new issue for more concrete discussion.

timurhai commented 3 years ago

Here is the first commit.

It uses the dynamic method: when a work item is scheduled, a new block/task is appended to an existing job.

The work items of each TOP node are joined in a block. In the future we should be able to set up Afanasy task parameters for a TOP node via block parameters (capacity, service, parser and so on). It also helps to visualize the job structure in GUIs.

For now the "control" job is just an empty task: an open Houdini scene with a running graph is needed.
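The block-per-TOP-node grouping described above can be sketched roughly as follows. This is only an illustration in plain Python: `Job`, `Block` and `append_work_item` are hypothetical stand-ins, not the Afanasy API, and the node and item names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical stand-in for an Afanasy block: one per TOP node."""
    name: str
    tasks: list = field(default_factory=list)


@dataclass
class Job:
    """Hypothetical stand-in for a job that grows while the graph cooks."""
    name: str
    blocks: dict = field(default_factory=dict)  # node name -> Block

    def append_work_item(self, node_name, item_name):
        # Create the block the first time a node schedules a work item,
        # then append the item to that block as a new task.
        if node_name not in self.blocks:
            self.blocks[node_name] = Block(node_name)
        block = self.blocks[node_name]
        block.tasks.append(item_name)
        return block


job = Job("pdg_graph")
for node, item in [("sim", "sim_1"), ("render", "render_1"), ("sim", "sim_2")]:
    job.append_work_item(node, item)

for block in job.blocks.values():
    print(block.name, block.tasks)
```

Keeping one block per node is what would later let per-node parameters (capacity, service, parser) live on the block rather than on each task.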

timurhai commented 3 years ago

There is a lot of work to do. Only a few features/callbacks are implemented, and there are no checks for errors that can happen. For now it is a minimal version that works only if everything goes well.

lithorus commented 3 years ago

Would it perhaps be an idea to create the scheduler 100% using python and replace the .hda?

This way it's easier to subclass it to make customizations.

timurhai commented 3 years ago

If it is possible, that would be better. Is it possible? (Maybe I missed something.)

lithorus commented 3 years ago

Yes, look at the other schedulers, specifically the templateBody class method. I really hope they extend this beyond just TOP.

timurhai commented 3 years ago

It seems that layout is not supported by templateBody: https://www.sidefx.com/forum/topic/74776/?page=1#post-318968

lithorus commented 3 years ago

Hmmm.. I will try and see if something can be done through "on creation" callbacks..

timurhai commented 3 years ago

"Submit Job As Graph" sends a job with 1 block and 1 task that cooks the TOP network. This job creates another, separate job and dynamically appends blocks/tasks to it. It works the same way as cooking from a Houdini session (the task command does the same thing). So you can re-cook without opening Houdini if you delete the work items job and restart the graph job.

timurhai commented 3 years ago

By default, workItemResultServerAddr() returns the local host name and port. This address is used to notify PDG (in an open Houdini session) that an item is done, because the Afanasy task itself may not be finished yet when an item is part of a batch. This way PDG can start rendering as soon as the first frames of a simulation are finished, without waiting for the entire simulation task.

But on our farm, artist machines are not reachable by name, only by IP.

The solution for finding a local IP address is taken from: https://stackoverflow.com/questions/166506/finding-local-ip-addresses-using-pythons-stdlib?page=1&tab=votes#tab-top

It may be better to create an option (checkbox) for this on the scheduler node.
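For reference, here is a minimal sketch of one commonly cited stdlib approach to finding the local IP (a UDP socket `connect()` followed by `getsockname()`). Which answer from that thread the commit actually uses is not stated here, so treat this as an assumption. `local_ip`, `result_server_host`, the `use_ip` flag (standing in for the proposed checkbox) and the probe address are all hypothetical names and values.

```python
import socket


def local_ip(probe_addr=("10.255.255.255", 1)):
    """Best-effort local IPv4 address using only the standard library."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packets; it only makes the OS
        # choose the outgoing interface, whose address is then read back.
        sock.connect(probe_addr)
        return sock.getsockname()[0]
    except OSError:
        # No usable route: fall back to loopback.
        return "127.0.0.1"
    finally:
        sock.close()


def result_server_host(use_ip=False):
    """Host to advertise to the farm; use_ip mimics the proposed checkbox."""
    return local_ip() if use_ip else socket.gethostname()


print(result_server_host(use_ip=True))
```

On a machine with several network interfaces this returns the address of whichever interface routes to the probe address, which may not be the one the farm can reach, so an explicit option on the node still seems safer than always switching to IP.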