google-deepmind / envlogger

A tool for recording RL trajectories.
Apache License 2.0

Fatal Python error: Aborted #1

Closed: mikygit closed this issue 2 years ago

mikygit commented 2 years ago

Hello, I'm running into an error when playing with py_writer. I modified it to store 100000 trajectories, which it apparently does by creating 100000 timestamped subfolders. However, reading the data back with the following code fails:

    with reader.Reader(data_directory=_TRAJECTORIES_DIR.value) as r:
      for i, episode in enumerate(r.episodes):
        logging.info(f"-------------- {i} --------------------: {len(episode)}")

Any ideas? Is there a way to gather the stored files into a single subfolder, since the number of subfolders seems to be the problem? Thanks.

    WARNING: Logging before InitGoogleLogging() is written to STDERR
    F20211207 11:48:44.483004 104240 bundle.cc:24] Check failed: finished_ JoinAll() should be called before releasing the bundle.
    *** Check failure stack trace: ***
    Fatal Python error: Aborted

    Current thread 0x00007fa729473740 (most recent call first):
      File "/usr/local/lib/python3.9/dist-packages/envlogger/backends/riegeli_backend_reader.py", line 45 in __init__
      File "/usr/local/lib/python3.9/dist-packages/envlogger/reader.py", line 45 in __init__
      File "/envlogger/envlogger/backends/cross_language_test/py_writer.py", line 92 in main
      File "/usr/local/lib/python3.9/dist-packages/absl/app.py", line 258 in _run_main
      File "/usr/local/lib/python3.9/dist-packages/absl/app.py", line 312 in run
      File "/envlogger/envlogger/backends/cross_language_test/py_writer.py", line 104 in <module>
    Aborted (core dumped)
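
One way to confirm how many subfolders a run produced, before opening the reader, is to count the immediate subdirectories of the data directory. The snippet below is a minimal sketch using only the standard library. `count_subfolders` is a hypothetical helper name, and the sketch assumes that each timestamped subfolder sits directly under the data directory.

```python
from pathlib import Path


def count_subfolders(data_directory: str) -> int:
    """Counts the immediate subdirectories of `data_directory`.

    Assumption: every timestamped subfolder created by the writer is a
    direct child of the data directory. Regular files are ignored.
    """
    return sum(1 for entry in Path(data_directory).iterdir() if entry.is_dir())
```

A count close to the number of stored episodes would suggest that roughly one subfolder is being created per episode.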

kenjitoyama commented 2 years ago

Hello @mikygit!

I believe this error arises from spawning too many threads, which in turn comes from having too many shards. The subfolders are created to split extremely large trajectories into more manageable subtrajectories. This setting is controlled at write time via max_episodes_per_file. Did you alter that somehow? Does each of your trajectories contain a single episode? You can also disable that behavior by setting that parameter to a non-positive integer.
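
To illustrate the relationship described above, here is a rough sketch of how the number of subfolders might scale with max_episodes_per_file. The function name `estimated_shard_count` and the ceiling formula are assumptions made for illustration, not envlogger's actual implementation. Only the parameter name and the fact that a non-positive value disables splitting come from this thread.

```python
import math


def estimated_shard_count(num_episodes: int, max_episodes_per_file: int) -> int:
    """Estimates how many shards (subfolders) a run would produce.

    Assumptions (hypothetical model, not taken from the envlogger source):
      * a new shard starts once the current one holds
        `max_episodes_per_file` episodes;
      * a non-positive limit disables splitting, so everything
        lands in a single shard.
    """
    if num_episodes <= 0:
        return 0  # nothing written, nothing to shard
    if max_episodes_per_file <= 0:
        return 1  # splitting disabled
    return math.ceil(num_episodes / max_episodes_per_file)
```

Under this model, a limit of 1 with 100000 episodes gives 100000 shards, which would be consistent with the 100000 subfolders reported above, while a non-positive limit gives a single shard.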

Let me know if it works for you.

Cheers,

Daniel

mikygit commented 2 years ago

Thank you, it did the trick!