Open nepomnyi opened 3 weeks ago
Thanks. it seems rather clear where to look for the problem. I am glad to see you found the main issue and can run your code for 9hrs the question is - what do you expect from the software to do when it recognizes a leak or a problematic pair - to drop it and continue? to raise an warning? to stop and wait for the solution?
Yes, I agree it is clear now where to look for the problem. But it was a mere chance I figured it out.
So, I think the first thing we have to do is to update windef
documentation suggesting to use either original PIV images without any masked regions or, if background is subtracted, make sure that the average background intensity is equal to the average image intensity before subtraction. This is a quick patch.
The second thing to do is to understand the mechanism of the the memory leak occurrence. And I am not sure how to do it. StackExchange has suggestions, but they are so technical. It would be great to have a real programmer for this task.
Third, like you say, we need to understand how to proceed. I think we need a normal solution here. For instance, if windef
can't get a vector for the current interrogation window, then it should assign NaN
to it instead of leaking memory. We also should give the user a warning that a memory leak could've occurred and that he must refine the pair of the PIV images (and give suggestions how to refine it). If we can't do all that, then dropping the image pair is next best solution, I think.
To summarize, the big question right now is how to understand where exactly memory leak occurs and how to intercept it.
It would be great if we see which line or lines are responsible. I guess it is a loop for multigrid but this is just my guess.
will take some time to look into it.
You see, when I run it, I can see it failing at the last pass of the multipass
(which operates on a 16 pix interrogation window with 0% overlap). There were cases when even the first pass of the multipass
crashed (which operates on a 32 pix interrogation window with 50% overlap). But the first_pass
always works (which operates on a 64 pix interrogation window with 50% overlap).
When I look at windef
algorithm, I can see it uses extended_search_area()
under the hood. The only piece concerning with the size of the interrogation window in extended_search_area()
is numpy
's fft
function. That's why I think the issue is on numpy
's side.
Please, create two or three different files with the names as you propose. I'm not sure I understand what you want me to replace by what.
Now, go to lines 597 and 598 of the code where I subtract the backgrounds. There, replace the argument
newBgrAave
with the argumentbgrAave
on the line 597 and the argumentnewBgrBave
with the argumentbgrBave
on the line 598. And you will see that PIV image pair number 554 crashes the code without an error. The argumentsnewBgrAave
andnewBgrBave
are the average backgrounds multiplied by a coefficient to make their average intensity equal to the average intensity of the row frames A and B. The argumentsbgrAave
andbgrBave
are just the average backgrounds.Then do the same, but use the argument
bgrAPIV
fornewBgrAave
on line 597 and the argumentbgrBPIV
fornewBgrBave
on line 598. Here the argumentsbgrAPIV
andbgrBPIV
are the brightest backgrounds for frames A and B. You'll see how PIV image pair number 1063 crushes the code.Then go ahead and comment out lines 625-630 and uncomment lines 615-620. On lines 625-630 I supply the original PIV images (of course with subtracted backgrounds) to my custom
windef
algorithm. Whereas on lines 615-620 I supply the images where bubbles are masked. You will see how PIV image pairs 246, 335, 336 crash the code.You can play with all that and see with your own eyes how wrong background subtraction can easily create a lot of "empty" spaces (those images are in the folder "windefImages").
I do think those "empty" space lead to memory leaks not in the OpenPIV package, but in the
numpy
fast Fourier transform functions that cannot find correlation peaks where there are not particles in the interrogation windows.
Done.
When the Google drive updates itself, you'll see 3 new .py
files:
For some reason, I couldn't reproduce the situation when PIV pairs number 246, 335, 336 crash the code. This is so surprising! Or maybe I forgot the exact configuration of the programme when it was crashed by pairs 246, 335, 336. Still, I left those PIV pairs in the folder so that you could compare how good PIV pairs are executed to what happens when a PIV pair crashes the code.
Also note that the function PIVanalysisWindefIndivid()
, lines 352-467 is where my custom windef
algorithm is located. Essentially, I just copied the function func()
which is located inside the function piv()
in windef
module of OpenPIV package.
I put the settings for windef
in a separate function PIVwindefSettings()
, lines196-350.
You can check the images that, after background subtraction, are supplied to the windef
algorithm in the folder PIVres -> temp -> windefImages.
This thing keeps stopping and stopping without any error... Much more seldom though. I think, I have traced the issue to the implementation of scipy.interpolate.RectBivariateSpline()
on lines 690 and 693 in windef
module. So, I am not blaming numpy
's fft anymore. I found an open issue on scipy
's github page about RectBivariateSpline
. The issue is not about memory leak, but still. I asked them if they have resolved the issue and they said no. I am not sure how to troubleshoot further.
On my machine, the problem is in np.all(flags) after validation. this is explainable that when the image is completely black, or in your case the background is identical to the image - all flags are ON and there's nothing to propagate to the next multi-pass level. I still think how to deal with it. It's a degenerate case, i.e. not normal. I think it's sufficient to stop the execution. Actually this test raises an error, but probably the multiprocessing is covering it up and just stops.
Have you checked they all were, actually, ON? If so, the fix is easy: an if
statement which catches this thing at the current step of the multipass and, if True
keeps the previous step of the multipass, gives the user a warning and moves on to the next image pair without stopping the execution.
Although in my recent cases, I am pretty sure, the culprit is RectBivariateSpline()
.
Have you checked they all were, actually, ON? If so, the fix is easy: an
if
statement which catches this thing at the current step of the multipass and, ifTrue
keeps the previous step of the multipass, gives the user a warning and moves on to the next image pair without stopping the execution. Although in my recent cases, I am pretty sure, the culprit isRectBivariateSpline()
.
I use debugger. It stops on the specific line. I guess that there's also an option that two different errors occur in different situations. I didn't try all the options, first I need to solve this one.
Hi all, sorry for not jumping in early on.
RectBivariateSpline seems to be the cause of this issue, as other have stated. This issue may stem from how RectBivariateSpline operates. For the splines to properly form and evaluate, there needs to be a minimum of a 3x3 regular grid of finite values for low interpolation orders (3 and lower) and a 5x5 grid of regular finite value for higher interpolation orders (4 and up). Another user posted a similar issue that occurred some time ago caused from not enough interrogation windows in the grid, which emitted an error when evaluating the spline (#228). I wish I could help further, but I will be quite busy until February, which may delay further responses and maintenance.
@nepomnyi, please check the scripts you've sent for the test on your machine. I think there is no "special bug" or "memory leak" but the script fails due to somewhat different conditions in the background subtracted that leads to some region with a lot of NaNs. This region is then should be covered by replace_outliers
which does inpainting of NaNs, but it fails in some cases to cover everything. Then, indeed we get NaN inside BivariateSpline and everything becomes NaNs and then we're doomed.
I suggest to check if this change in settings solves the problem
windefSettings.max_filter_iteration = 5 # was 3
It does solve problem for me and all the scripts run well.
There is also one problem that I think does not relate to the script of openpiv but rather the way pairs of files are chosen. I had to replace on Linux machine the line islice(pairwise(...
cause it gave me pairs not in sorted way so the pairs were not a,b chosen. I use very simple way:
rom natsort import natsorted
image_list = rawPIV.glob('*.jpg')
image_list = natsorted(image_list, key=str)
# print(image_list)
# Create pairs of files from the image_list
pairs = [(image_list[i], image_list[i + 1]) for i in range(0, len(image_list) - 1, 2)]
print(pairs)
# Or convert list of paths to list of string and (naturally)sort it, then convert back to list of paths
# image_list = [Path(p) for p in natsorted([str(p) for p in image_list ])]
# for pair in islice(pairwise(rawPIV.iterdir()),None,None,2):
for pair in pairs:
print(str(pair[0]), str(pair[1]))
If this indeed solves the problem, we still need to think about how to protect our users from failing with this case - I think the warning would be appropriate, but we need to find out how to set it correctly and where.
Thank you, Dr. Liberzon. I will increase the number of iterations to 5 and report how it goes. So far, I have "solved" the problem by increasing the threshold for the median filter to 1.3 from 1.1. Since I have done that, I haven't experienced sudden execution interruptions. I think this goes in the same vein: increasing the median thresholds significantly reduces the likelihood of getting all NaNs.
@alexlib, does it matter if I check your suggestion windefSettings.max_filter_iteration = 5 # was 3
on a different failing image pair?
If so, after I introduced the changes I mentioned in the message above (loosened threshold for the median filter), my code have been running on two different computers simultaneously for about 2 weeks without any interruptions. And just now, the code has stopped on one of the computers without an error message again. For the failing pair, I tried your suggestion windefSettings.max_filter_iteration = 5 # was 3
and it worked! So, I can confirm: it is a possible solution.
If you do, in fact, want me to check it on those image pairs that I sent you, I can do that (but later).
Description I am running a
windef
-based code for a banch of 1636 images. The code suddenly stops at a random image and doesn't show any error message.Details This has been a persistent issue. These are my findings (briefly).
1) StackExchange suggests it can be a memory leak. Most often memory leaks are due to regular expressions which I use in my own code and which are used in
windef
functions. I "turned off" PIV processing part of my code and ran it with all my regular expressions - didn't have any problems. I concluded it wasn't my regular expressions that lead to the memory leak.2) I decided the next most likely trouble-maker is
multiprocessor
. I decided to try and runwindef
withoutmultiprocessor
. I copiedfunc()
function from thepiv()
function ofwindef
module and assembled a customwindef
algorithm that runs in a good oldfor
loop withoutmultiprocessor
. And I got the same problem - the code crashed at the same PIV image pair.3) Then I noticed that I supplied PIV images with masked bubbles to my custom
windef
algorithm. I went ahead and the supplied the original PIV images (where the bubbles were not masked). And I had success! I concluded that, most likely, the correlation-finding algorithm ofwindef.multipass_img_deform()
didn't like "empty" spaces in PIV images and, somehow, those empty spaces result in memory leak. By "empty" spaces, I of course mean the masked regions of the PIV images.4) So, I supplied the non-masked (original) PIV image to my custom
windef
algorithm and my code could process the problematic PIV image pair (it was PIV image pair number 246). But then it stopped again at PIV image pair number 1063 without any error. I subtract background from my PIV images before supplying them to my algorithm. The background of my PIV images was varying over time (I used LED backlight that created time-varying lighting). I collected background images and picked the brightest to subtract from my PIV images. When I had a look at the PIV images with subtracted backgrounds, I noticed that pair 1063 had very "rarefied" PIV image after background subtraction. I.e., there were a lot of "empty" spaces. So, instead of subtracting the brightest background, I subtracted the average background. And I had success! PIV image pair number 1063 got processed.5) But then, PIV image pair number 554 from a subsequent batch of 1636 images crashed the code. And again I noticed to many "empty" spaced after subtraction of the average background for this particular PIV image pair. So, I took the average background, multiplied it by a coefficient so as to bring its average intensity to the average intensity of the original PIV image from which I am subtracting it. And then I subtracted it. And I had success! The code has been running for more than 9 hours without a crash so far.
6) Based on that, I concluded that when
windef.multipass_img_deform()
can't find enough PIV particles within an interrogation window, a memory leaks occurs. That's where I stopped.How to reproduce Here's the link to my Google drive where I am sharing my code and images. Please note, this is not, yet, a published code and images. I plan to publish them soon. So, please, use it solely for the purposes of finding the bug in
windef
and then delete it from your computers.Following the link, you'll find a folder. Its structure is the following. Folder "backgrounds" contains 100 backgrounds for frames A and B. Folder "images" contains all the PIV pairs that crashed my code. Folder "PIVres" has the folder "temp". The folder "temp" has the following folders. "Backgrounds" folder: 1) its subfolder "PIV" contains the brightest backgrounds for frames A and B, 2) its subfolder "shadow" contains the dimmest backgrounds for frames A and B, 3) its subfolder "average" contains the average backgrounds for frames A and B. "windefBubbleMasks" folder contains bubble masks. "windefImages" contains PIV images with subtracted backgrounds. "windefResults" folder contains folder "rawvLtxtwindef" folder with the OpenPIV txt files right after the
windef
step and the same OpenPIV txt files but with bubble masks.The code itself is in the root and is called "memoryLeak.py".
Copy the folder onto your computer. Navigate into the folder. Run the code from within the folder. You'll see no issues with the code.
Now, go to lines 597 and 598 of the code where I subtract the backgrounds. There, replace the argument
newBgrAave
with the argumentbgrAave
on the line 597 and the argumentnewBgrBave
with the argumentbgrBave
on the line 598. And you will see that PIV image pair number 554 crashes the code without an error. The argumentsnewBgrAave
andnewBgrBave
are the average backgrounds multiplied by a coefficient to make their average intensity equal to the average intensity of the row frames A and B. The argumentsbgrAave
andbgrBave
are just the average backgrounds.Then do the same, but use the argument
bgrAPIV
fornewBgrAave
on line 597 and the argumentbgrBPIV
fornewBgrBave
on line 598. Here the argumentsbgrAPIV
andbgrBPIV
are the brightest backgrounds for frames A and B. You'll see how PIV image pair number 1063 crushes the code.Then go ahead and comment out lines 625-630 and uncomment lines 615-620. On lines 625-630 I supply the original PIV images (of course with subtracted backgrounds) to my custom
windef
algorithm. Whereas on lines 615-620 I supply the images where bubbles are masked. You will see how PIV image pairs 246, 335, 336 crash the code.You can play with all that and see with your own eyes how wrong background subtraction can easily create a lot of "empty" spaces (those images are in the folder "windefImages").
I do think those "empty" space lead to memory leaks not in the OpenPIV package, but in the
numpy
fast Fourier transform functions that cannot find correlation peaks where there are not particles in the interrogation windows.