DiamondLightSource / PySIMRecon

Easy to use wrapper for pyCUDAsirecon, allowing the use of DV files without PRIISM
GNU General Public License v3.0
0 stars 2 forks source link

Output log files are truncated. #64

Open iandobbie opened 2 months ago

iandobbie commented 2 months ago

I think the log file is not flushed before it is copied out of the temp directory and the original deleted so the output log files are truncated at the end, finishing mid word in many cases. Here are the last few lines of such a log file.

....
Before fftplan3d 6954MB free 
After fftplan 6954MB free 
re-transforming centerband
inserting centerband
centerband assembly completed
mo
thomasmfish commented 1 month ago

Just added a couple of new commits https://github.com/DiamondLightSource/PySIMRecon/pull/40 that should help - specifically using os.fsync to ensure everything is fully written to the files before they are closed. This isn't an issue I've had on my laptop but apparently it's related to the disk cache, so is hardware dependent.

thomasmfish commented 1 month ago

I'll merge that into dev in a sec

thomasmfish commented 1 month ago

Closed by #67

iandobbie commented 1 month ago

I just checked out the latest main branch (e613e2bfbfdafea04c6a661dee1e9ef380139d8a) and the log files are truncated but not as I thought.

I am processing the data from Lothar with 4 channels and channel specific OTFs (trying to sort out the wiener filter stuff). The end of the file is.....

Before fftplan3d 6954MB free
After fftplan 6954MB free
re-transforming centerband
inserting centerband
centerband assembly completed
mo

(pysim) T:\OMX-data>

Looking at the 4 different segments I think the mo comes from earlier steps. I think the later channels are dumping into the log file while the earlier channels are not finished. eg one earlier section ends....

centerband assembly compl
--------------------------------------------------------------------------------

So it looks like the "eted" was truncated from "completed" this segment. Don't understand where the "mo" came from though!

thomasmfish commented 1 month ago

That's odd. The log files should be completely separate in the processing directory, only combined at the end. If you run it with --no-cleanup and inspect the log files in there, what do you see?

iandobbie commented 1 month ago

I just tried this again and am finding the weird variable truncation in the log files in the sub directory.

Its getting to late for me. I will have another look tomorrow if I get time.

iandobbie commented 1 month ago

The final few lines from the 4 log files. All starting at the final occrance of K0 x% line and ending at different places.


447 log file
--------------------

best fit for k0 is 0.036% from expected value.
moving centerband
Before fftplan3d 6954MB free 
After fftplan 6954MB free 
re-transforming centerband
inserting centerband
centerband assembly compl

527 log file
--------------------

best fit for k0 is 0.085% from expected value.
moving centerband
Before fftplan3d 6954MB free 
After fftplan 6954MB free 
re-transforming centerband
inserting centerband
centerband assembly comp

603 log file
--------------------

best fit for k0 is 0.038% from expected value.
moving centerband
Before fftplan3d 6954MB free 
After fftplan 6954M

647 log file
--------------------

best fit for k0 is 0.070% from expected value.
moving centerband
Before fftplan3d 6954MB free 
After fftplan 6954MB free 
re-transforming centerband
inserting centerband
centerband assembly completed
mo
iandobbie commented 1 month ago

Also it doesn't seem to be a race condition as it appears that the truncations are the same each time the software is run.

thomasmfish commented 1 month ago

This is very odd... I've tried reordering how things are handled in #75, so could you try that branch and let me know if that changes anything? The other thing I was thinking is perhaps adding print(f"\n{"-" * 80}\ncudasirecon complete") to reconstruct_from_processing_info after reconstruct is called would give an idea of whether the problem is with the redirection overall or if it's just the cudasirecon output. I'm frustrated that I can't get it to happen on my end, though I haven't tried with the 4 channel data yet.

iandobbie commented 1 month ago

I'll try that branch and let you know what the results are. I agree this is very strange, especially in its reproducibility which makes me think it is definitely not a race condition issue.

thomasmfish commented 1 month ago

Each reconstruction runs on a different process, so the only thing in terms of a race condition I can think of is the flushing and syncing of the output, or perhaps some oddity to do with how the reconstruction process exits (though that should only be a problem if it crashes, so I don't think it's that). I wonder if a very short wait would make a difference, perhaps time.sleep(0.1) even... Let me know how that branch goes anyway and we can go from there!