Open: yuehuang2023 opened this issue 3 weeks ago
Yes, a box size of 360 px is hardcoded in multiple places in the code:
I couldn't find this mentioned in the Nature Methods paper, and as @yuehuang2023 points out, one of the example datasets used a box size of 380 px. @schwabjohannes, @scheres, why is 360 px hardcoded as a limit? The message:
If you want the original box size for the output volumes use a bigger gpu
seems a bit disingenuous when 360 px appears to be a hard-coded limit?
I also encountered the same message when running on one of my datasets with a 384 px box.
Please don't call something disingenuous so carelessly. A simple look at the code shows that 360 px is not hardcoded as a limit. What is coded is automatic down-scaling in case the size goes above 360 px; this is triggered with the example dataset.
The error below should only be raised when an exception is encountered, presumably when you run out of GPU memory. Perhaps you can try as suggested and run on a bigger GPU?
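For context, here is a minimal sketch of what that automatic down-scaling amounts to, paraphrased from the decoder.py snippet quoted in the diffs below (not a verbatim copy): volumes whose box exceeds 360 px are 2x average-pooled rather than rejected.

import torch

# Minimal sketch of the automatic down-scaling described above
# (paraphrased from dynamight/models/decoder.py, not a verbatim copy).
def maybe_downscale(reference_volume: torch.Tensor, max_box: int = 360) -> torch.Tensor:
    if reference_volume.shape[-1] > max_box:
        # avg_pool3d expects a 5D (N, C, D, H, W) tensor, hence the unsqueezes.
        reference_volume = torch.nn.functional.avg_pool3d(
            reference_volume.unsqueeze(0).unsqueeze(0), 2)
        reference_volume = reference_volume.squeeze()
    return reference_volume

print(maybe_downscale(torch.rand(384, 384, 384)).shape)  # torch.Size([192, 192, 192])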
Yes, you are correct: the message is triggered by running out of GPU memory. Sorry, I should have looked more carefully. I was running on an A40 with 48 GB, which I thought was quite a big GPU!
However, the volume will still be downscaled by 2 with a 384 px box. Should I crop the particles to 360 px?
I used an A6000 GPU with the same configuration mentioned in the supplementary material, but this error was still raised.
I got DynaMight running on an H100, and with my dataset (384 px box) I got the same errors:
box size: 384
pixel_size: 0.825
virtual pixel_size: 0.0025974025974025974
dimension of latent space: 6
Number of used gaussians: 10000
Optimizing scale only
volume too large: change size of output volumes. (If you want the original box size for the output volumes use a bigger gpu. The size of tensor a (384) must match the size of tensor b (192) at non-singleton dimension 2
and
/xxx/miniforge/envs/relion-5.0/lib/python3.10/site-packages/dynamight/models/decoder.py:235: UserWarning: Using a target size (torch.Size([192, 192, 192])) that is different to the input size (torch.Size([384, 384, 384])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
loss = torch.nn.functional.mse_loss(
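The warning and the wrapped error are two sides of the same size mismatch, and they are easy to reproduce in isolation (a minimal reproduction, not DynaMight code): mse_loss first warns that the 384^3 input and 192^3 target differ, then the attempted broadcast fails with exactly the RuntimeError quoted above.

import torch

# Minimal reproduction of both messages above (not DynaMight code):
# a full-size 384^3 volume compared against a 2x down-scaled 192^3 one.
inp = torch.rand(384, 384, 384)
target = torch.rand(192, 192, 192)
try:
    torch.nn.functional.mse_loss(inp, target)  # emits the UserWarning first
except RuntimeError as e:
    # "The size of tensor a (384) must match the size of tensor b (192) ..."
    print(e)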
As I don't have access to a bigger GPU, I made the following change:
--- decoder.py.orig 2024-11-06 09:02:03.000000000 +0000
+++ decoder.py 2024-11-06 09:02:26.000000000 +0000
@@ -224,7 +224,7 @@
print('Optimizing scale only')
optimizer = torch.optim.Adam(
[self.image_smoother.A], lr=100*lr)
- if reference_volume.shape[-1] > 360:
+ if reference_volume.shape[-1] > 384:
reference_volume = torch.nn.functional.avg_pool3d(
reference_volume.unsqueeze(0).unsqueeze(0), 2)
reference_volume = reference_volume.squeeze()
and the errors went away. I think my earlier apology was premature.
The job with the modified dynamight/models/decoder.py is still running and is currently using ~21 GB of the 80 GB on the H100 GPU.
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08 Driver Version: 535.161.08 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA H100 PCIe Off | 00000000:21:00.0 Off | 0 |
| N/A 74C P0 221W / 310W | 21301MiB / 81559MiB | 79% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
So I believe the underlying bug is the failure to update self.vol_box around here:
--- dynamight/models/decoder.py.orig 2024-11-06 09:02:03.000000000 +0000
+++ dynamight/models/decoder.py 2024-11-06 16:35:26.000000000 +0000
@@ -228,6 +228,7 @@
reference_volume = torch.nn.functional.avg_pool3d(
reference_volume.unsqueeze(0).unsqueeze(0), 2)
reference_volume = reference_volume.squeeze()
+ self.vol_box//=2
for i in range(n_epochs):
optimizer.zero_grad()
which is then used later in generate_consensus_volume().
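A toy illustration of that bookkeeping problem, using hypothetical names rather than DynaMight's actual classes: if the stored box size is not halved together with the pooled reference volume, any output grid later allocated from it will be twice the size of the data it is compared against.

import torch

# Hypothetical sketch of the bookkeeping bug, not DynaMight's real code.
class ToyDecoder:
    def __init__(self, vol_box: int):
        self.vol_box = vol_box  # box size that downstream code relies on

    def downscale_reference(self, vol: torch.Tensor) -> torch.Tensor:
        if vol.shape[-1] > 360:
            vol = torch.nn.functional.avg_pool3d(
                vol.unsqueeze(0).unsqueeze(0), 2).squeeze()
            self.vol_box //= 2  # the missing update: keep vol_box in sync
        return vol

    def empty_output_volume(self) -> torch.Tensor:
        # e.g. consensus-volume generation allocates grids from self.vol_box
        return torch.zeros(self.vol_box, self.vol_box, self.vol_box)

decoder = ToyDecoder(vol_box=384)
reference = decoder.downscale_reference(torch.rand(384, 384, 384))
assert decoder.empty_output_volume().shape == reference.shape  # holds only with the //= 2 line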
However, I don't think this is the optimal way to deal with large boxes. If DynaMight has a cliff-edge limit of 360 px, then this should be documented and users advised to crop or downscale their particles appropriately. I could easily trim 12 px from the edges of my particle boxes, and other users with > 360 px boxes might also prefer to downsample to this size rather than accept automatic 2x downsampling.
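For what it's worth, the crop suggested above is just a centre window from 384 px to 360 px, i.e. 12 px off each edge. A sketch of that operation with NumPy (placeholder data; in practice one would window the whole particle stack with RELION's own image-handling tools rather than by hand):

import numpy as np

def center_crop(img: np.ndarray, new_box: int) -> np.ndarray:
    # Trim an equal margin from every edge of a square (2D) or cubic (3D) array.
    margin = (img.shape[-1] - new_box) // 2  # 12 px for 384 -> 360
    return img[tuple(slice(margin, margin + new_box) for _ in img.shape)]

particle = np.random.rand(384, 384).astype(np.float32)  # one 384 px particle image
print(center_crop(particle, 360).shape)  # (360, 360)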
Hi, I tried to reproduce the results of EMPIAR-10073 with an A6000 GPU and set the parameters according to the supplementary material. However, RELION reports errors. Any suggestions for solving this error? Thank you.
The run log is:
The run error is: