Jobs Do Not Always Start Or Complete

buildingamind / NewbornEmbodiedTuringTest

A testbed for comparing the learning abilities of newborn animals and autonomous artificial agents.

MIT License


Issue #109

Status: Open. Zach-Attach opened this issue 3 months ago.

Zach-Attach commented 3 months ago

Describe the bug
Jobs do not always complete, or do not record anything in the first place.

To Reproduce
Steps to reproduce the behavior:

1. Run the code.
2. Some of the brain/env combinations will have no results (an empty CSV) or will not complete (no models/latest_model.zip).
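One way to find which combinations were affected after a multi-job run is to scan the output folders for the two symptoms above. The sketch below assumes a hypothetical layout (one subdirectory per brain/env combination, each holding its CSV logs and `models/latest_model.zip`) and treats a header-only CSV as empty. Both are assumptions rather than the repository's documented structure, and `find_incomplete_runs` is an illustrative name, not an existing helper.

```python
from __future__ import annotations

from pathlib import Path


def is_csv_empty(csv_path: Path) -> bool:
    """Treat a CSV as empty if it is missing, has no lines, or holds only a header row."""
    if not csv_path.exists():
        return True
    with csv_path.open() as f:
        # Count non-blank lines; a header alone means no results were recorded.
        lines = [line for line in f if line.strip()]
    return len(lines) <= 1


def find_incomplete_runs(root: Path) -> dict[str, list[str]]:
    """Map each run directory under root to the problems found in it.

    Hypothetical layout (an assumption): one subdirectory per brain/env
    combination, each containing *.csv logs and models/latest_model.zip.
    Runs with no problems are left out of the result.
    """
    problems: dict[str, list[str]] = {}
    for run_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        issues = []
        csvs = list(run_dir.glob("*.csv"))
        # Symptom 1: nothing was recorded.
        if not csvs or all(is_csv_empty(c) for c in csvs):
            issues.append("empty csv")
        # Symptom 2: training never reached the final save.
        if not (run_dir / "models" / "latest_model.zip").exists():
            issues.append("missing models/latest_model.zip")
        if issues:
            problems[run_dir.name] = issues
    return problems
```

Calling `find_incomplete_runs(Path("<your output directory>"))` returns only the combinations that need to be re-run, which makes it easier to compare a multi-job run against single-job runs.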

Zach-Attach commented 3 months ago

It appears that this does not happen when running a single brain/env combination per run of the script.

desaibhargav commented 3 months ago

I'm interested in taking a deeper look at this as well. I'll keep this thread posted with what I find.

Zach-Attach commented 2 months ago

It looks like the issue can be avoided by turning off checkpoints, running the script with a single job, and waiting 120 seconds between the start of each run. Part of the issue appears to be related to saving checkpoints, so I have made checkpoint saving optional.
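The workaround above could be scripted roughly as follows. This is a minimal sketch, not part of the repository: `launch_staggered` is a hypothetical helper, and each command passed to it should be one single-job invocation of the script with checkpoint saving turned off. The exact script name and option depend on your checkout and are not shown here.

```python
from __future__ import annotations

import subprocess
import time
from typing import Sequence


def launch_staggered(commands: Sequence[Sequence[str]], stagger_s: float = 120.0) -> list[int]:
    """Start each command as its own process, waiting stagger_s seconds between starts.

    Returns the exit codes in launch order once every process has finished.
    """
    procs = []
    for i, cmd in enumerate(commands):
        if i > 0:
            # Space out start times (120 s by default, matching the workaround).
            time.sleep(stagger_s)
        procs.append(subprocess.Popen(list(cmd)))
    # Block until all runs exit; a nonzero code flags a run worth re-checking.
    return [p.wait() for p in procs]
```

With two commands, `launch_staggered(jobs)` starts the second one 120 seconds after the first and then waits for both to exit. Staggering start times lets runs overlap while avoiding simultaneous startup; if overlapping runs still misbehave, launching them strictly one after another with `subprocess.run` is the more conservative choice.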