Issue with Frame Cache Writing using Multiprocessing

HEXRD / hexrd

A cross-platform, open-source library for the analysis of X-ray diffraction data.

Issue with Frame Cache Writing using Multiprocessing #608

Closed: darrencpagan closed this issue 7 months ago

darrencpagan commented 7 months ago

I do not think the multiprocessing map call in the imageseries frame-cache writer properly waits for all workers to finish generating the sparse matrices from the data.

We are trying to write frame caches from raw image data with 3600 frames, but the output sparse matrices contain only three frames. When I modified the code to run sequentially on a single processor, everything worked. I looked online and found conflicting information about whether the thread executors wait for all tasks to finish.

psavery commented 7 months ago

@darrencpagan I assume you mean these lines?

The threads are supposed to finish before the with block is exited. I just tried this with a 1440 frame example, and it appeared that all frames were written. Are you able to provide me an example script/data, as we might be doing something slightly different?
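The waiting behavior described above can be checked in isolation. The following is a minimal sketch using only the standard library, not the HEXRD code itself; `slow_square` and `record` are hypothetical stand-ins for the per-frame work. It shows that leaving the `with` block waits for every submitted task, even when the results of `map()` are never read.

```python
import time
from concurrent.futures import ThreadPoolExecutor

done = []


def slow_square(i):
    """Hypothetical stand-in for per-frame work."""
    time.sleep(0.01)
    return i * i


def record(i):
    # list.append is safe to call from several threads in CPython
    done.append(slow_square(i))


with ThreadPoolExecutor(max_workers=4) as executor:
    # The results are deliberately not consumed here.
    executor.map(record, range(20))

# Leaving the with block calls executor.shutdown(wait=True),
# so all 20 tasks have run by the time this line executes.
print(len(done))
```

So an incomplete output is unlikely to come from the executor returning early; something else has to be stopping the tasks.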

darrencpagan commented 7 months ago

What is the best way to transfer a fairly large raw image file along with the scripts? I can share it via OneDrive or another cloud service. Or would email be better?

psavery commented 7 months ago

@darrencpagan I can access it on classe if you put it there (if you do that, send me a message on Slack with the filepath). Otherwise, I think a OneDrive/Google Drive link will be fine (and share it with my email)!

donald-e-boyce commented 7 months ago

I wrote a simple tester for the parallel frame cache writing. I'll include it below. It creates an imageseries with a lot of frames but small image shape, so that it runs fast. It writes it to a frame cache then reads it and compares to the original. I wasn't able to break it. It runs like this:

(hexrd-dev) (MBP: frame-cache-parallel) 1928. python test_fcp.py -nw 8
number of wokers:  8
saving file
comparing imageseries
- lengths match
- shapes match
- all frames match
compare: done
(hexrd-dev) (MBP: frame-cache-parallel) 1929.

test_fcp.py.txt

darrencpagan commented 7 months ago

I've shared a OneDrive folder with you both containing the script and data that cause the problem. The frame cache only saves 3 of 3601 frames.

One last piece of information: I'm working on a computer with 80 workers (40 cores, each running 2 threads).

psavery commented 7 months ago

Thank you for providing the example. I was able to reproduce the issue and determine the cause.

Because this is a raw image series, it must be read in sequence. An exception was being raised because the thread pool was not accessing the imageseries indices in order.

However, because we were not evaluating the results of the map(), the exception was not being propagated - so you wouldn't see it.
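The failure mode described above can be reproduced with the standard library alone. This is an illustrative sketch, not the HEXRD implementation: `SequentialReader`, `write_cache`, and the scrambled `order` list are hypothetical, and a single worker with a fixed out-of-order index list stands in for nondeterministic thread scheduling. When the iterator returned by `map()` is never consumed, the worker exceptions stay hidden and only some frames are written; iterating over the results re-raises the first exception.

```python
from concurrent.futures import ThreadPoolExecutor


class SequentialReader:
    """Hypothetical raw-image reader that only supports in-order access."""

    def __init__(self):
        self.next_index = 0

    def read(self, i):
        if i != self.next_index:
            raise RuntimeError(
                f"frame {i} requested, but frame {self.next_index} is next"
            )
        self.next_index += 1
        return f"frame-{i}"


def write_cache(order, check_results):
    reader = SequentialReader()
    written = {}

    def store(i):
        written[i] = reader.read(i)

    # One worker runs tasks in submission order, which keeps this demo
    # deterministic; `order` plays the role of thread scheduling.
    with ThreadPoolExecutor(max_workers=1) as executor:
        results = executor.map(store, order)
        if check_results:
            list(results)  # iterating re-raises the first worker exception
    return written


order = [0, 1, 2, 4, 3, 5, 6, 7]

# Results ignored: no exception surfaces, yet half the frames are missing.
silent = write_cache(order, check_results=False)
print(sorted(silent))  # [0, 1, 2, 3]

# Results consumed: the hidden exception becomes visible.
propagated = None
try:
    write_cache(order, check_results=True)
except RuntimeError as exc:
    propagated = str(exc)
print(propagated)  # frame 4 requested, but frame 3 is next
```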

Running it serially fixed the issue because that ensured that the indices were being accessed in order.

We will add some logic to fix this issue for writing a frame cache from a raw image series. And we will also modify the code to ensure that if an exception occurs, it will be propagated so that it will be visible to the user.
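One way such a fix could look is sketched below. This is only an assumption about the general approach, not the change that was actually made in HEXRD: `read_frames_in_order`, `to_sparse`, and `build_cache` are hypothetical names, and the "sparse" format is a toy list of nonzero pixels. The idea is to read frames strictly in order in the calling thread, hand only the conversion work to the pool, and iterate over the results so that any worker exception propagates.

```python
from concurrent.futures import ThreadPoolExecutor


def read_frames_in_order(nframes):
    """Hypothetical sequential source: yields (index, frame) strictly in order."""
    for i in range(nframes):
        yield i, [0, i, 0, i + 1]  # toy "image" stored as a flat list


def to_sparse(item):
    """Toy sparse conversion: keep (position, value) pairs of nonzero pixels."""
    i, frame = item
    return i, [(pos, val) for pos, val in enumerate(frame) if val != 0]


def build_cache(nframes, max_workers=4):
    cache = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() pulls items from the generator in the calling thread,
        # so the source is still read in order; only to_sparse runs
        # in the pool. Iterating the results re-raises any exception.
        for i, sparse in executor.map(to_sparse, read_frames_in_order(nframes)):
            cache[i] = sparse
    return cache


cache = build_cache(100)
print(len(cache))
```

Note that `map()` submits every item up front, so this sketch holds all frames in memory at once; a real implementation for thousands of large frames would likely need to read and submit in batches.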

Thank you for reporting this issue, @darrencpagan!

darrencpagan commented 7 months ago

I can see it now. Got it. Thanks for figuring out the issue. I'll keep an eye out for the fix.

psavery commented 7 months ago

@darrencpagan This is now fixed in the master branch (as of #611), and should be in the prerelease in about an hour.