LogicReinc / LogicReinc.BlendFarm

A stand-alone Blender Network Renderer
GNU General Public License v3.0

Clients don't automatically reconnect #80

Open atoav opened 11 months ago

atoav commented 11 months ago

During the same long animation described in #79 at some point during the render things got slower. The two clients had disconnected and I had to reconnect them manually, after which things went considerably faster again. I assume this is not intended behaviour.

I don't know why the clients disconnected, maybe my router installed its automatic update or something, but my expectation would be that the clients reconnect automatically (otherwise unsupervised rendering, e.g. overnight, is not possible).

LogicReinc commented 11 months ago

Reconnection logic already exists, but if it keeps failing it will give up, so this is sort of intended behavior. I guess I can increase the number of attempts.

atoav commented 11 months ago

Aha, cool, so maybe it just didn't work in this case. Does it have an exponential backoff or just some fixed interval?

LogicReinc commented 11 months ago

Fixed interval, 3 times, 1 second between each retry. The servers keep blender resources alive for 10 seconds after disconnecting. So if the client does not reconnect and "recover" its session, stuff gets deleted.

I can bump this to about 5 without issues.
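
For illustration only, the fixed-interval retry described in this comment could be sketched roughly as below. This is not BlendFarm's actual code; `reconnect_fixed`, `try_connect`, and the injectable `sleep` parameter are hypothetical names chosen for the example, and only the values 3 attempts and 1 second come from the thread.

```python
import time
from typing import Callable


def reconnect_fixed(
    try_connect: Callable[[], bool],
    attempts: int = 3,        # value quoted in the thread
    interval: float = 1.0,    # seconds between retries, as quoted in the thread
    sleep: Callable[[float], None] = time.sleep,  # injectable for testing (assumption)
) -> bool:
    """Try to reconnect a fixed number of times, then give up.

    Returns True as soon as one attempt succeeds, False if all fail.
    """
    for attempt in range(attempts):
        if try_connect():
            return True
        # Wait only between attempts, not after the last one.
        if attempt < attempts - 1:
            sleep(interval)
    return False
```

With these defaults the client stops after about 2 seconds of waiting (plus however long each connection attempt takes), which is well inside the 10-second window during which the server is said to keep the session's resources alive.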

atoav commented 11 months ago

Gotcha.

"I can bump this to about 5 without issues."

Maybe it would be a good idea to instead keep trying to reconnect periodically after that, e.g. once every 15 minutes or so? This would increase the reliability of the whole thing; otherwise a 5-second outage (not unheard of) could mean that after the outage only localhost is rendering for the rest of the time. If you render on a deadline, that could ruin your day.
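
A rough sketch of what such a schedule might look like, combining the quick fixed-interval retries with a backoff that settles at one attempt every 15 minutes. The function name `retry_delays` and all of its parameters are assumptions made for this example, not anything BlendFarm provides.

```python
from itertools import islice
from typing import Iterator


def retry_delays(
    fast_attempts: int = 3,     # quick retries, as in the current behaviour
    fast_interval: float = 1.0, # seconds between quick retries
    factor: float = 2.0,        # growth factor for the backoff (assumption)
    cap: float = 900.0,         # 15 minutes, the interval suggested above
) -> Iterator[float]:
    """Yield the wait time before each reconnect attempt, forever."""
    # Phase 1: a few quick retries at a fixed interval.
    for _ in range(fast_attempts):
        yield fast_interval
    # Phase 2: exponential backoff that never exceeds the cap,
    # so it ends up as a periodic retry every `cap` seconds.
    delay = fast_interval * factor
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


# First few waits: three quick retries, then doubling.
print(list(islice(retry_delays(), 6)))  # [1.0, 1.0, 1.0, 2.0, 4.0, 8.0]
```

The generator only decides when to try again; it says nothing about what has to happen once a late attempt succeeds.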

LogicReinc commented 11 months ago

That runs into the reasons mentioned before: after 10 seconds resources are disposed, so a periodic retry would require re-syncing. This is purposely not done, as it can vastly increase the chance of user error, because the file could have been edited in the meantime. Syncing is done across all nodes at the same time to avoid this issue.
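
To make the hazard concrete: one conceivable guard, purely as a sketch and not something the thread says BlendFarm does, is to fingerprint the file at the moment of the original sync and refuse a late re-sync if the current file no longer matches. `fingerprint` and `can_resync` are hypothetical helper names.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying one exact file version."""
    return hashlib.sha256(data).hexdigest()


def can_resync(synced_fingerprint: str, current_data: bytes) -> bool:
    """Allow a late re-sync only if the file is unchanged since the
    version that the other nodes originally received."""
    return fingerprint(current_data) == synced_fingerprint


# Example: the file was edited after the initial sync.
original = b"scene v1"
synced = fingerprint(original)
print(can_resync(synced, original))     # True: same version, safe to send
print(can_resync(synced, b"scene v2"))  # False: nodes would render different files
```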

Note that BlendFarm was originally designed not for animations but for single-frame shared rendering, so the architecture is more centered around that, and the idea of syncing at a later time didn't make much sense.

There is another ongoing issue that asks for re-syncing disconnected nodes, but because this requires some logistical changes, it has not been implemented yet. It likely won't be implemented before 1.1.6, as it requires significantly more testing.