sebseager opened 3 years ago
Hi Seb, I just mirrored 10017 (the source data for EMDB-2824) and am trying a few things locally to answer your questions.
I believe any configuration of sizes/margins etc is all with respect to pixels. I'm afraid I personally don't have the background to recommend any scientific parameters for particle picking at this time, but I may be able to help with your other concern.
Using the APPLE defaults I'm consuming only around 4GB or so. I will retry with your larger particle size settings. There may be some configuration we can do to reduce the memory footprint, but I need to test it and get back to you.
Thanks Garrett
Hi Seb, I have not yet been able to reproduce the memory footprint you reported. I tailed the process with particle size 180, and it appears to use 5GB resident memory for me. Are you sure the machine is otherwise unloaded?
I am on OSX right now, but I will repeat on a linux machine as well when I get the chance.
To be sure we're doing something similar, how are you setting the particle size (config.ini?).
Thanks
I appreciate it. I'm still getting the bus error (below)
2021-10-29 15:42:00,905 INFO Classifier model desired = svm
2021-10-29 15:42:00,908 INFO Using SVM Classifier
2021-10-29 15:42:00,908 INFO Computing scores for query images
2021-10-29 15:42:00,908 INFO Extracting query images
2021-10-29 15:42:00,992 INFO Extracting query images complete
100%|██████████| 196/196 [01:38<00:00, 1.98it/s]
2021-10-29 15:43:48,957 INFO Running svm with tau1=144, tau2=1585
2021-10-29 15:43:56,591 INFO Discarding suspected artifacts
/var/spool/slurmd/job18225176/slurm_script: line 15: 32025 Bus error
with the following parameters
particle_size 176
max_particle_size 264
min_particle_size 44
minimum_overlap_amount 17
query_image_size 116
tau1 144
tau2 1585
Is this similar to what you did to achieve 5GB memory usage? Thanks for your help.
Thanks for your config. Confirmed that it used over 30 GB for me. If you are on a slurm cluster, try requesting 32GB (or 48GB) for this configuration.
I will try to narrow this down further.
Thanks. For this dataset, would you mind sharing the config.ini parameters you recommend?
As I mentioned before, I don't personally have the background for scientific questions regarding the APPLE component settings, but I can help you narrow down which settings are increasing the memory usage.
I see some remarks about parameters in this paper (https://arxiv.org/pdf/1802.00469.pdf) that might help you. They also claim it was performed on a 16GB machine. I expect that was the MATLAB code, so it might have had different performance characteristics from the Python code. @ayeletheimowitz might know more about specific configuration settings for that dataset.
Hi Seb, I spent the morning in a debugger narrowing this down. The memory consumption appears to be driven mainly by `max_particle_size`. Specifically, there is a `binary_erosion()` call relating to `segmentation_o` which is particularly sensitive to `max_particle_size`. For example, reducing it from 264 to 200 drastically reduced the memory footprint.
I can't speak to the algo performance or accuracy related to changing that value.
Hope that helps.
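To give a feel for why that parameter matters so much, here is a generic sketch (not APPLE's actual implementation — the real structuring element is built inside the picker) showing that a disk-shaped erosion footprint grows quadratically with the particle size:

```python
import numpy as np
from scipy.ndimage import binary_erosion

def disk(radius):
    """Boolean disk-shaped footprint of the given radius (in pixels)."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius

# Footprint area grows quadratically with max_particle_size, which is
# one reason the erosion cost balloons as the size increases.
small = disk(200 // 2)
large = disk(264 // 2)
print(large.size / small.size)  # roughly (264/200)**2, i.e. ~1.74

# A toy erosion call; APPLE's actual mask and image sizes are much larger.
mask = np.ones((512, 512), dtype=bool)
eroded = binary_erosion(mask, structure=disk(20))
```

So going from 264 back to 200 shrinks the footprint by roughly 40%, consistent with the drop I saw in the debugger.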
Is this something we should be checking? More importantly, is there some alternative to `binary_erosion` that is less memory-intensive?
Of course we can sanity check the params, but it's not clear that the settings are unreasonable here. If I were to hazard a guess, Seb is probably just trying to reproduce the result from the paper and making some educated guesses here. (He did try to ask us, I just don't personally know what the params were...)
Regarding morphology, there are other packages we could consider (e.g. skimage) if we are willing to bring in more dependencies. Based on this thread, we'd need to budget significant time for that testing, comparison, etc.
Hi Garrett,
I am working with Seb to improve the performance of the apple-picker model.
Thank you for sending the paper (https://arxiv.org/pdf/1802.00469.pdf). We now downsample micrographs via EMAN2 and use smaller particle box sizes before picking to achieve reasonable performance from the model. Micrograph downsampling addresses the bus error/memory issues, and smaller box sizes improve model performance.
There are two bugs that Seb and I have found in the code that we want to make you aware of:
1) The current modulus slicing in helper.py lines 42-44 and 71-73 will set `blocks` to an empty array if `block_size` is a factor of either `img.shape` element. We suggest changing the slicing to something like this: `blocks = img[:(img.shape[0] // block_size) * block_size, :(img.shape[1] // block_size) * block_size]`.
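Here is a hypothetical reproduction of that failure mode (the exact expression in helper.py may differ; this just shows why a modulus-based negative slice collapses when `block_size` divides the dimension evenly):

```python
import numpy as np

img = np.arange(100 * 100).reshape(100, 100)
block_size = 50  # divides both dimensions evenly

# If the code slices with a negated modulus, a zero remainder becomes
# img[:0, :0], i.e. an empty array.
broken = img[: -(img.shape[0] % block_size), : -(img.shape[1] % block_size)]
print(broken.shape)  # (0, 0)

# Suggested fix: floor-divide to keep the largest multiple of block_size.
blocks = img[: (img.shape[0] // block_size) * block_size,
             : (img.shape[1] // block_size) * block_size]
print(blocks.shape)  # (100, 100)
```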
2) To prevent errors when no particles are detected in a micrograph, `try`/`except` statements should be added at picking.py line 383. In addition, any conditional statements that are applied to `centers` returned from picking.py should be adjusted in apple.py lines 167-182.
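As a sketch of the second point (names here are illustrative, not APPLE's actual API — the real return types in picking.py may differ), the downstream conditionals are easier to write if the empty case always yields a well-shaped array:

```python
import numpy as np

def pick_particles(scores, threshold):
    """Illustrative stand-in for a picking routine.

    `scores` stands in for whatever per-window classifier output the
    real picking.py produces; the names here are hypothetical.
    """
    centers = np.argwhere(scores > threshold)
    if centers.size == 0:
        # Make the empty-case contract explicit: a (0, 2) integer array
        # that downstream code can iterate/filter without raising.
        return np.empty((0, 2), dtype=int)
    return centers

empty = pick_particles(np.zeros((10, 10)), threshold=0.5)
print(len(empty))  # 0
```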
Thanks again for your help!
Thanks for the suggestions, I'd be happy to add those improvements.
> Of course we can sanity check the params, but it's not clear that the settings are unreasonable here.
Right. At this point, all we can do is add a note to the documentation.
> Regarding morphology, there are other packages we could consider (e.g. skimage) if we are willing to bring in more dependencies. Based on this thread, we'd need to budget significant time for that testing, comparison, etc.
Yeah it's just strange to me that this operation has such high memory requirements. From what I remember, this is just a post-processing step in the pipeline, so it's not something we expect to dominate.
It is not immediately clear what parameters can be used to reproduce the paper results (or close to them). I take personal issue with that, but I'm not in the best position to spend time brute-forcing those parameters out. This is something that would make a tasty "experiment" in our gallery...
I don't know why this morphology algo uses so much RAM, but another implementation might not... Ayelet emailed me that we could use something else (implying rolling our own). Unfortunately, I get the impression that erosion with a certain mask is exactly the operation we would be doing...
Agreed. So I suggest we document and either keep this issue around or make a new one. When Ayelet gets back, we can ask her to take a look.
Actually, we could just subsample the result of our classification. Then we can use a smaller mask size for the erosion and save both runtime and memory.
Right, but isn't there still a risk that the erosion will explode in the same way (albeit with lower total memory)?
The size of the erosion mask is based on the particle size. We could use the particle size to determine the downsampling factor. This should prevent the high memory consumption.
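A minimal sketch of that idea, assuming we target a fixed mask width in pixels (`TARGET_MASK_PX` is an invented tuning constant for illustration, not an APPLE parameter):

```python
# Choose an integer downsampling factor from the particle size so the
# erosion mask stays near a fixed pixel budget, regardless of how large
# a particle size the user configures.
TARGET_MASK_PX = 50  # assumed budget for the mask width, in pixels

def downsample_factor(max_particle_size, target=TARGET_MASK_PX):
    """Return an integer factor so the mask is ~`target` pixels wide."""
    return max(1, round(max_particle_size / target))

print(downsample_factor(264))  # 5
print(downsample_factor(44))   # 1
```

With something like this, the 264-pixel configuration from earlier in the thread would be downsampled 5x before the erosion, keeping the mask near the budget.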
Ok that sounds reasonable. Do you want to try it out?
Definitely
:pray:
For configuring the APPLE picker, are there any guidelines for setting the particle size? Should the value be in Angstroms or pixels, and is it a radius or a diameter (aperture)? For example, if we analyze the EMDB-2824 dataset, is it reasonable to set the particle size to 180 (pixels) or 318.6 (Angstroms)?
When I set the particle size to 180 with 32GB memory available, APPLE picker encounters a bus error (not enough memory), but when I decrease it to around 70, it runs successfully.
Thank you!