MercenariesEngineering / coalition

A small but beautiful task manager to manage, why not, your render farm.
http://coalition.readthedocs.io
GNU Lesser General Public License v2.1

Looks very promising / general state of this project #58

Open rdelillo opened 4 years ago

rdelillo commented 4 years ago

Hello Guys,

I came across this project randomly while looking at some Guerilla stuff. This looks very promising and, from what I can see, it wouldn't take that much work to become a serious competitor to commercial solutions. So I was wondering what was the current status of this project ? With other recent open-source initiatives (OpenCue mostly), is that still on your radar ? (I saw that 3.10 is the latest "stable" version, but I can find some early work on a 4-beta version.)

Do you know if that's used by other people/companies, and what the general feedback on it is ? I'm also interested in general performance feedback, if you have any (mostly DB and REST server limits, I guess).

Looking forward to hearing from you guys, because if that's still relevant, I'd be interested in giving it a try, with the aim of actively contributing to it.

Thanks, robin

doubleailes commented 4 years ago

Hello Robin,

I'm not the developer of the project, but I'm a big user.

So I was wondering what was the current status of this project ? With other recent open-source initiatives (OpenCue mostly), is that still on your radar ?

Yes !

Do you know if that's used by other people/companies, and what the general feedback on it is ?

My feedback is biased: it is simple and does the job. The major issue, IMHO, is the monitoring.

I'm also interested in general performance feedback, if you have any (mostly DB and REST server limits, I guess).

Without revealing confidential information: version 4, even in beta, is used in production. We have been using it at The Yard since 2016, and I have seen it with my own eyes running with more than 1,000 nodes. Any help is welcome. I have started writing an ELK plugin. I also need to pack a pre-configured DB into the Docker Compose setup.

Cheers.

Phil

rdelillo commented 4 years ago

Awesome, thanks for the update Phil !

Basically, I'm looking around for free farm manager solutions, as I'm getting annoyed that we must pay year after year for products which are not evolving that much anymore. I had a look at Flamenco (Blender), which seems way too specific, and OpenCue, which looks more complex, with a galaxy of components I won't use.

I believe Coalition could become production-friendly for me with a bit of love.

Let's say we want to contribute, do you know who I should talk to ? This will likely be more than a couple of small PRs, so shall I just fork the project and start a new iteration over there ?

Cheers, robin

rdelillo commented 4 years ago

Forgot to say: I would really like to have a look at an ELK plugin for that. I definitely agree that, beyond the UI, a live Grafana dashboard followed by Kibana statistics are the next steps for me.

developer-cube-creative commented 4 years ago

Hi,

At Cube Creative, we are using coalition as our main renderfarm dispatcher (in conjunction with RoyalRender for legacy issues), with Blender 2.79. We have been using it for more than a year now, on a 52x11" series, soon to be 2x52x11".

We have around 200 nodes. Blender forces us (for that particular project) to create one subjob per frame, per pass, which produces a very large number of subjobs and causes major slowdowns (the server stops responding at ~100 000 jobs total; we still need to investigate that matter. Maybe @doubleailes you have some insights on this ?). This is our main issue with Coalition so far. An 'end time' column would be appreciated too.

We also built a PySide client to fit our studio-specific needs, and made some fixes/adjustments to Coalition internally (those fixes mainly concern studio-specific issues).

@rdelillo, the suggestions you are making are very interesting.
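To give a rough sense of how quickly "one subjob per frame, per pass" adds up, here is a minimal back-of-the-envelope sketch. It reads 52x11" as 52 episodes of 11 minutes, which is an assumption, and the frame rate and number of passes are purely hypothetical values, not figures from this project.

```python
def estimate_subjobs(episodes, minutes_per_episode, fps, passes_per_frame):
    """Estimate the subjob count when every frame of every pass is its own subjob."""
    frames = episodes * minutes_per_episode * 60 * fps
    return frames * passes_per_frame


def frames_until_limit(job_limit, passes_per_frame):
    """Number of whole frames that fit under a given total job limit."""
    return job_limit // passes_per_frame


# Hypothetical inputs: 25 fps and 4 passes per frame are assumptions.
total = estimate_subjobs(episodes=52, minutes_per_episode=11, fps=25, passes_per_frame=4)
frames_at_limit = frames_until_limit(job_limit=100_000, passes_per_frame=4)
```

Under those assumed values, a full series is far beyond the ~100 000 jobs mentioned above, so old jobs would have to be purged or archived regularly for the total to stay below that threshold.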

Also, is this fourth version available to the public ?

rdelillo commented 4 years ago

Thanks a lot @developer-cube-creative for the feedback, much appreciated !

I totally see where you're coming from, and I agree that 100,000 concurrent sub-jobs does not sound crazy. Regarding the slowdowns you've experienced (assuming they aren't related to any other internal tools you have), I'm even more curious: would you know if that's more on the REST server side or on the DB side ?

thanks again for your feedback!

developer-cube-creative commented 4 years ago

Hi,

Our Coalition is hosted on a CentOS virtual machine with an Intel Core i7 9xx CPU (8 cores). Unfortunately, we are still not sure whether the issue comes from the REST server side, from the DB, or from a third party. We don't think it's related to the number of workers or the DB since in the past few months:

So, according to what we have experienced and what we have already tried, we think the cause is that the server is single-threaded: the more jobs there are, the heavier the requests become, so when a lot of people are using the UI there is a lot of slowdown. Apparently there is no SQL request to get just a small number of jobs; it's everything or nothing (we are trying to change that as well). Scaling the HTTP part horizontally sounds like an interesting idea. We just talked about implementing Circus, so we will keep you updated if we see any changes !
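As an illustration of the "small number of jobs" idea, here is a minimal sketch of keyset pagination using Python's standard `sqlite3` module. The `jobs` table, its columns, and the `fetch_jobs_page` helper are hypothetical and do not reflect Coalition's actual schema or API; the point is only that each request reads one page instead of the whole table.

```python
import sqlite3


def fetch_jobs_page(conn, after_id=0, page_size=50):
    """Return up to page_size jobs whose id is greater than after_id, ordered by id."""
    cur = conn.execute(
        "SELECT id, title, state FROM jobs WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    )
    return cur.fetchall()


def demo():
    """Build a throwaway in-memory table (hypothetical schema) and read two pages."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, title TEXT, state TEXT)"
    )
    conn.executemany(
        "INSERT INTO jobs (title, state) VALUES (?, ?)",
        [(f"frame_{i:04d}", "WAITING") for i in range(1, 201)],
    )
    first = fetch_jobs_page(conn, after_id=0, page_size=50)
    # The last id of the previous page is the cursor for the next one.
    second = fetch_jobs_page(conn, after_id=first[-1][0], page_size=50)
    conn.close()
    return first, second
```

Filtering on `id > ?` rather than using `OFFSET` lets the database seek directly via the primary key, so the cost of a page should stay roughly constant even with a very large job table. The UI would then pass the last `id` it received as the cursor for the next request.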

Thanks :)