BhallaLab / moose-core

C++ basecode and python scripting interface
https://moose.ncbs.res.in
GNU General Public License v3.0

`moose.Streamer` not writing to disk until `moose.start()` completes #360

Closed dbdorman closed 5 years ago

dbdorman commented 5 years ago

`moose.Streamer` writes the header, but nothing else, to disk while `moose.start(simtime)` is running. It does not flush to disk until `moose.start(simtime)` completes. I tested it with moose-core/tests/python/test_streamer.py but changed the simulation time to 5700 seconds so I could monitor whether the file was being written to during the simulation. No data was written (except the header, written on `moose.reinit`) until the Python command `moose.start(simtime)` completed. I tested with both .npy and .csv formats and had the same issue with both.

Is this the expected behavior? As a workaround I can call moose.start repeatedly at shorter intervals to flush to disk.
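The workaround mentioned above can be sketched roughly as follows. This is only an illustration: `run_in_chunks` is a hypothetical helper, not part of MOOSE, and it assumes that each return from `moose.start` flushes the streamer, as observed above.

```python
def run_in_chunks(start, simtime, chunk):
    """Call start(dt) repeatedly until simtime is covered.

    `start` is any callable taking a duration, e.g. moose.start.
    Returns the list of durations that were passed to `start`.
    """
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    steps = []
    remaining = simtime
    # Small tolerance so float round-off does not add a tiny extra step.
    while remaining > 1e-12:
        dt = min(chunk, remaining)
        start(dt)  # data is expected to reach disk when this returns
        steps.append(dt)
        remaining -= dt
    return steps


# Hypothetical usage with MOOSE (after moose.reinit()):
#   import moose
#   run_in_chunks(moose.start, 5700, 60)
```

A shorter chunk makes the output file update more often, at the cost of more calls into `moose.start`.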

My version of Moose is the current moose-core git master branch as of earlier today, compiled on Fedora, using Python3.

dilawar commented 5 years ago

Thank you for catching and reporting it. It is a regression caused by #352. PR #361 should fix it. The nightly package should be available in a couple of hours. I have tested the new changes with `watch -n 0.001 stat outfile.csv` and it seems to be working fine.

$ pip3 install pymoose --user --upgrade --pre 

Alternatively, you can `git pull` and build it yourself.

On large data sets, writing to .npy may cause data corruption. It is very rare, but it has happened to me a couple of times. I was not able to debug or pinpoint the cause. Plain csv format is recommended.

dbdorman commented 5 years ago

Thanks for the quick fix. I also had an issue with a corrupted .npy file, which I thought might have been due to trying to write too many columns, because I didn’t see it once I reduced the number of tables in the streamer. I’ll follow your advice and use csv.

dilawar commented 5 years ago

@dbdorman Thanks for the pointer about the number of columns and data corruption in npy format. The length of the header may be computed incorrectly (https://github.com/numpy/numpy/blob/067cb067cb17a20422e51da908920a4fbb3ab851/doc/neps/nep-0001-npy-format.rst). I'll take another shot at it.

dbdorman commented 5 years ago

@dilawar Thanks, I think my header must have exceeded the length allowed by the npy 1.0 format, but the npy 2.0 format allows a longer header: see this section from the link you mentioned. However, it looks like Moose is only using the npy 1.0 format, in this line
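To illustrate the header-length limit discussed here: in npy 1.0 the header length is stored as a 2-byte little-endian unsigned integer (at most 65535 bytes), while npy 2.0 uses a 4-byte field. The sketch below is not MOOSE's writer; `build_npy_preamble` is a hypothetical helper that picks the smallest version able to hold a given header string, and it assumes padding to a multiple of 64 bytes as described in current NumPy documentation.

```python
import struct

MAGIC = b"\x93NUMPY"


def build_npy_preamble(header: str) -> bytes:
    """Return magic + version + length field + padded header for an .npy file."""
    body = header.encode("latin1")
    # Try version 1.0 (uint16 length) first, then 2.0 (uint32 length).
    for version, fmt in (((1, 0), "<H"), ((2, 0), "<I")):
        prefix_len = len(MAGIC) + 2 + struct.calcsize(fmt)
        # Pad with spaces and terminate with a newline so that the whole
        # preamble length is a multiple of 64 bytes.
        pad = -(prefix_len + len(body) + 1) % 64
        padded = body + b" " * pad + b"\n"
        if len(padded) < 2 ** (8 * struct.calcsize(fmt)):
            return MAGIC + bytes(version) + struct.pack(fmt, len(padded)) + padded
    raise ValueError("header too long even for npy 2.0")


# A structured dtype with many columns (one per table) makes the header grow.
many_cols = ", ".join("('col%d', '<f8')" % i for i in range(5000))
header = "{'descr': [" + many_cols + "], 'fortran_order': False, 'shape': (1,), }"
preamble = build_npy_preamble(header)
print("version bytes:", tuple(preamble[6:8]))
```

With a few columns the header fits in version 1.0. With thousands of named columns it no longer does, and a writer that always emits a 1.0 preamble would store a wrong length.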

dilawar commented 4 years ago

@dbdorman After #395 is merged, you should be able to use the npy format with `moose.Streamer`. The streamer ticks every 10 seconds of simulation time. At the end of the simulation, it appends the leftover data to the file.

If you face any issues, let me know.
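The tick-and-append behaviour described above can be pictured with a small sketch. This is not the MOOSE implementation; `PeriodicCsvWriter` and its parameters are hypothetical names used only to show the idea of buffering rows, flushing at a fixed simulation-time interval, and writing the leftover rows at the end.

```python
import csv
import io


class PeriodicCsvWriter:
    """Buffer rows and flush them every `flush_interval` of simulation time."""

    def __init__(self, fileobj, columns, flush_interval=10.0):
        self._out = fileobj
        self._writer = csv.writer(fileobj)
        self._interval = flush_interval
        self._next_flush = flush_interval
        self._buffer = []
        # The header is written immediately, like the streamer does on reinit.
        self._writer.writerow(columns)
        self._out.flush()

    def record(self, t, values):
        """Store one row; flush once simulation time reaches the next tick."""
        self._buffer.append([t, *values])
        if t >= self._next_flush:
            self.flush()
            while self._next_flush <= t:
                self._next_flush += self._interval

    def flush(self):
        self._writer.writerows(self._buffer)
        self._buffer.clear()
        self._out.flush()

    def close(self):
        """Append whatever is left in the buffer at the end of the run."""
        self.flush()


# Demo with an in-memory file: rows appear in batches, not one by one.
demo = io.StringIO()
writer = PeriodicCsvWriter(demo, ["time", "v"], flush_interval=10.0)
for step in range(6):
    writer.record(step * 5.0, [0.1 * step])
writer.close()
print(demo.getvalue())
```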