Hello!
It seems that the OS killed your process, most likely due to running out of memory. You can try setting DATA.TRAIN.IN_MEMORY to False and DATA.EXTRACT_RANDOM_PATCH to True, so that the training data is not kept in memory (images are loaded on the fly) and a random patch of the specified shape (DATA.PATCH_SIZE) is extracted from each loaded image, respectively.
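For reference, those options look roughly like this in the YAML configuration (a minimal sketch; the patch size value is purely illustrative and should match your own workflow):

```yaml
DATA:
  PATCH_SIZE: (256, 256, 1)    # illustrative shape only; set it to whatever your workflow needs
  EXTRACT_RANDOM_PATCH: True   # crop a random patch of PATCH_SIZE from each loaded image
  TRAIN:
    IN_MEMORY: False           # read training images from disk on the fly instead of keeping them all in RAM
```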
Furthermore, we are currently working on moving BiaPy from TensorFlow to PyTorch, so we hope to reduce memory usage a bit more, among other advantages.
Thank you for using BiaPy!
Best regards,
Dani
Hello!
I've just pushed some changes (commit 27120f5a7) that reduce memory usage in the workflows. They can make a huge difference on big datasets such as MitoEM, so you may be able to train the model now.
Best,
Dani
Hi Dani,
Many thanks, I will try with the latest pushed changes and close the issue if all works well!
Best wishes, Samia
Hello again,
I've made a few more changes to save memory. I've set TEST.REDUCE_MEMORY to True, which means we'll use float16 instead of float32 for model predictions and some other data. This will help save memory, especially when working with large images like MitoEM. It might make predictions a bit less accurate, but I think the impact shouldn't be noticeable.
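If you ever need to toggle it yourself, it is a single override in the YAML configuration (sketch):

```yaml
TEST:
  REDUCE_MEMORY: True   # store predictions and some intermediate data as float16 instead of float32
```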
Feel free to close the issue.
Best,
Dani
Thanks Dani!!
Best, Samia
Hi Daniel,
I am trying to train MitoEM via BiaPy following your guidelines in the docs. However, the training gets killed at the phase shown below:
This is how I run it:
I am running it through a Docker image that I built myself, and it can definitely access the GPU through TensorFlow. I have a 24 GB RTX 3090 and 128 GB of RAM. I have also reduced the batch_size during training from the default of 6 to 2.
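For reference, the batch size change corresponds to this override in my YAML config (assuming the standard BiaPy key name TRAIN.BATCH_SIZE; the rest of the file follows the MitoEM template):

```yaml
TRAIN:
  BATCH_SIZE: 2   # lowered from the default of 6 to reduce memory pressure
```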
Any clue what might be happening?
Best, Samia