Closed ErichZimmer closed 2 years ago
Is the repository public?
see my fork - I invited you https://github.com/alexlib/Open_PIV_mapping
That repository is quite different from my attempt, which uses a meshed region of interest and for-loops to find the points. As soon as I get a decent internet connection, I'll hopefully get everything pushed to a fork or repository for everyone to see (including my spaghetti-coding skills :P).
Just did some tests, and your fork/repository is considerably better and more robust than my implementation. I'll see if there are any enhancements/refactorings I can do. Does this repository allow for the calibration of vectors as a post-processing method?
It is Theo's work in progress; he has chosen to work in image space, but it should work on vectors as well.
I played around with different ideas and kept reverting back to Theo's repository. The calibration seems pretty simple and would be a nice addition to OpenPIV. On another note, I tested rectangular windows with the GUI, and they work like a charm except that they're 50% slower than square windows. Here is a screenshot of raw vectors using circular correlation.
The image pair is from PIV Challenge 2014 case A (testing micro-PIV).
To avoid major overhead with shared dictionaries, the files are stored in a temporary folder before being loaded into the GUI and deleted. This makes multiprocessing as fast as the simple GUI and removes the need for a batch size. Is this method alright? New processing steps:
I am not quite sure about the step of saving to npz and then loading to hdf5 - could it be maybe stored already in hdf5, to save one conversion or loading/saving step?
H5py doesn't directly support parallel writing, so it's either this weird workaround or the other one based on a shared-memory dictionary that is then loaded into h5py. I am still looking for better options through mpi4py, but so far it hasn't been successful and it complicates the installation process of the GUI. In my opinion, this issue is one of the few problems with h5py where other libraries (e.g., not using h5py, as in the simple GUI) would be better.
I understand. So there are two options: a) use multiprocessing and RAM to keep all the parallel results in memory, or b) have each worker store its result in a separate temporary file and then combine them. If there is a significant speed-up in option b) compared to a single-thread/single-process path, let's do it this way.
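A minimal sketch of option b), assuming NumPy arrays as the per-frame results. The worker function and file-name pattern here are hypothetical; in the real GUI each `process_frame` call would run in its own worker process, and only the combine step would run serially:

```python
import os
import tempfile
import numpy as np

def process_frame(k):
    # hypothetical stand-in for one PIV evaluation; returns u, v fields
    rng = np.random.default_rng(k)
    return rng.standard_normal((8, 8)), rng.standard_normal((8, 8))

tmpdir = tempfile.mkdtemp()

# each iteration here would be a separate worker process in the real GUI
for k in range(4):
    u, v = process_frame(k)
    np.savez(os.path.join(tmpdir, f"frame_{k:06d}.npz"), u=u, v=v)

# combine the temporary files serially, deleting each one after loading
results = {}
for name in sorted(os.listdir(tmpdir)):
    path = os.path.join(tmpdir, name)
    with np.load(path) as data:
        results[name[:-4]] = {"u": data["u"], "v": data["v"]}
    os.remove(path)
os.rmdir(tmpdir)
```

Because every worker owns its own file, no locking or shared-memory coordination is needed; the only serial cost is the final combine pass.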
take a look at zarr
https://github.com/pydata/xarray/issues/3096
https://github.com/pydata/xarray/pull/4035
https://zarr.readthedocs.io/en/stable/tutorial.html
can it help? it seems to have some solution and it's pip-installable.
I looked at it, and it seems promising and easy to implement with minimal changes to the code.
on calibration
I got somewhat familiar with the image calibration interface, and I like it so far. However, an improvement in precision can be attained by using a centroid algorithm with find_first_peak/find_second_peak in pyprocess.
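A minimal pure-NumPy sketch of such a centroid refinement (this is not OpenPIV's actual implementation; the function names here are illustrative): find the integer peak of the correlation map, then apply a three-point centroid along each axis to estimate the subpixel position.

```python
import numpy as np

def find_peak(corr):
    """Integer (row, col) location of the highest correlation value."""
    return np.unravel_index(np.argmax(corr), corr.shape)

def centroid_subpixel(corr):
    """Refine the integer peak with a 3-point centroid along each axis."""
    i, j = find_peak(corr)
    # fall back to the integer peak when it sits on the border
    if not (0 < i < corr.shape[0] - 1 and 0 < j < corr.shape[1] - 1):
        return float(i), float(j)
    col = corr[i - 1:i + 2, j]  # three values through the peak, vertically
    row = corr[i, j - 1:j + 2]  # three values through the peak, horizontally
    di = (np.arange(i - 1, i + 2) * col).sum() / col.sum()
    dj = (np.arange(j - 1, j + 2) * row).sum() / row.sum()
    return di, dj
```

Logic in the spirit of find_second_peak (masking out the first peak before searching again) could then be layered on top, e.g. to reject ambiguous calibration dots.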
My version of the calibration software follows the instructions of an article mentioned previously and ignores scaling to minimize user input. It is based on Theo's script and Fluere, and can only be applied to the vector field via a for loop. I still like Theo's script more, though, as it is more flexible.
It would be great to incorporate the script into something like OpenPIV.tools or its own calibration file, as some cameras (e.g. my Raspberry Pi-controlled 1 Mp global shutter sensor) have quite a fisheye distortion that messes up the measurements.
The subpixel function works for the original script, so I'll simply use the original script by Theo.
Good idea. Please move the discussion to the openpiv-python repo issues.
Zarr is creating a file for each frame, so I'll have to figure out what I'm doing wrong here. It does allow multiprocessing though ;)
Using npy files wasn't a smart decision. They save and load fast, but the individual file sizes can reach 3 MB for 50,000 vectors. For large sessions, this uses up quite a bit of space before the files are deleted. Zarr still creates a bunch of files and, in a way, acts like the temporary npy files. I'll try mpi4py again for built-in parallelism with h5py. Additionally, h5py files can get quite large, with some exceeding 20 GB for large processing sessions. However, a similar amount of space is taken by text files.
Using a batch system similar to the shared-memory dictionary system, the results can be processed in parallel and loaded in serial. If we use this system, Zarr might be a good file format, as it operates in a very similar fashion with multiple linked files.
It also allows for exporting the session in HDF5 and netCDF.
I found that the temporary file system works best, so I'll keep it for now. It doesn't take any extra space on the hard drive.
Here is the somewhat buggy h5py gui. https://github.com/ErichZimmer/openpiv_tk_gui/tree/GUI_enhancement2
It requires h5py as an extra dependency.
To avoid polluting your GUI with features that cannot be merged (at least I wasn't able to merge them, due to my basic programming knowledge), I'm going to close this issue so I can focus more on your GUI.
I also moved the h5py GUI to a new repository to avoid accidentally pushing the wrong GUI to my fork of your GUI. https://github.com/ErichZimmer/openpiv-python-gui
I honestly like your GUI a little more because of its simplicity.
Background/Issue
The current GUI stores data in separate files, which can make more thorough data processing difficult. To combat this, an already-suggested solution was to store all results in a single dictionary for the dataset and export them in whatever manner the user deems sufficient. However, on large processing sessions (>60,000 images), the GUI can become quite slow, especially on lower-performing laptops, and its performance degrades further as such a session goes on. This can disrupt efficient workflows and increase glitches (this mostly applies to lower-performing computers).
Proposed solution
After exploring different ways of storing huge amounts of data, h5py was found to perform pretty well even on underperforming computers (e.g. my laptop 😢). When properly configured, most data is stored on the hard drive, leaving RAM mostly unused, unlike dictionary-style designs. Additionally, the structure of an HDF5 file makes it very simple to load specific sections of data/results, which has its advantages. Taking advantage of these features, the HDF5 file is structured like the following:
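As a hedged sketch of how such a session file might be built with h5py (every group name, dataset name, and attribute here is hypothetical, not the GUI's actual schema):

```python
import h5py
import numpy as np

# in-memory HDF5 file for demonstration; a real session would use a disk path
f = h5py.File("session.h5", "w", driver="core", backing_store=False)

# hypothetical layout: one group per frame, one dataset per field
for k in range(3):
    grp = f.create_group(f"frame_{k:06d}")
    grp.create_dataset("x", data=np.arange(8, dtype="f4"))
    grp.create_dataset("u", data=np.zeros((8, 8), dtype="f4"),
                       compression="gzip")  # keeps large sessions smaller on disk
    grp.attrs["window_size"] = 32  # per-frame processing metadata
```

With this kind of layout, loading one frame's results (`f["frame_000001/u"]`) touches only that dataset rather than pulling the whole session into RAM, which is the advantage mentioned above.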
Possible downfalls
PS, I'm back 😁 (got medically discharged from an injury) and am ready to relearn everything, and hopefully not be so ill-informed on testing methods as I was back then -_-. Additionally, your input on using HDF5 or other formats for storage would be helpful for further research and design.