MPI-Dortmund / cryolo

cryolo documentation

Cryolo 1.8.4 vs 1.8.0 #11

Closed hamid13r closed 1 year ago

hamid13r commented 1 year ago

Hi, we just started using cryolo-1.8.4 and I am running into a weird problem. I have two identical directories (same config files, same tomos, and same train_annot directory). With cryolo 1.8.0 I get 44K particles, but with cryolo 1.8.4 the model does not get trained and I get: val_loss did not improve from inf

Swapping between different denoising methods gets me at most 600 particles with 1.8.4.

Any idea what could be the problem?

Here is my config file:

{
  "model": {
    "architecture": "PhosaurusNet",
    "input_size": 1024,
    "anchors": [25, 25],
    "max_box_per_image": 700,
    "norm": "STANDARD",
    "filter": [0.1, "filtered_tmp/"]
  },
  "train": {
    "train_image_folder": "/gpfs/group/grotjahn/hrahmani/ecoli_project/03_WT_19/02_pciking/2_cyolo_deconv/images",
    "train_annot_folder": "/gpfs/group/grotjahn/hrahmani/ecoli_project/03_WT_19/02_pciking/2_cyolo_deconv/train_annot",
    "train_times": 10,
    "pretrained_weights": "",
    "batch_size": 4,
    "learning_rate": 0.0001,
    "nb_epoch": 200,
    "object_scale": 5.0,
    "no_object_scale": 1.0,
    "coord_scale": 1.0,
    "class_scale": 1.0,
    "saved_weights_name": "cryolo_model_deconv.h5",
    "debug": true
  },
  "valid": {
    "valid_image_folder": "",
    "valid_annot_folder": "",
    "valid_times": 1
  },
  "other": {
    "log_path": "logs/"
  }
}

thorstenwagner commented 1 year ago

That's indeed strange, because nothing big has changed since 1.8.0.

Do you run both on the same machine? Is the 1.8.4 installation using CUDA 10 or CUDA 11? Would you be able to share the training set so that I can try to reproduce it?

Best, Thorsten

hamid13r commented 1 year ago

No, they are not on the same machine. I moved to one with more GPU RAM to be able to analyze K3 data. Both logs contain the line

2022-09-27 10:26:43.593413: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0

so I think both use CUDA 10.
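Checking this by eye is error-prone when the log is long. The following is a minimal sketch, not part of crYOLO, that scans a training log for the TensorFlow `dso_loader` message quoted above and reports which `libcudart` versions were loaded. The function name `cudart_versions` is hypothetical, and the pattern assumes the message wording shown in this thread.

```python
import re

# Matches TensorFlow dso_loader messages of the form quoted in this thread:
# "Successfully opened dynamic library libcudart.so.10.0"
_CUDART_PATTERN = re.compile(
    r"Successfully opened dynamic library libcudart\.so\.(\d+(?:\.\d+)*)"
)


def cudart_versions(log_text):
    """Return the set of libcudart version strings found in a log."""
    return set(_CUDART_PATTERN.findall(log_text))


# Example with the line from this thread.
sample_log = (
    "2022-09-27 10:26:43.593413: I tensorflow/stream_executor/platform/"
    "default/dso_loader.cc:44] Successfully opened dynamic library "
    "libcudart.so.10.0"
)
print(cudart_versions(sample_log))
```

Running it on the logs from both machines makes it easy to see whether they really loaded the same CUDA runtime.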

I can share the data, but it is ~20G of tomograms... We are trying 1.8.2 on the same machine as we speak...

Best, Hamid

thorstenwagner commented 1 year ago

Which graphics card does the new computer with the larger GPU memory have?

hamid13r commented 1 year ago

RTX A5000 and RTX A6000

thorstenwagner commented 1 year ago

I'm pretty sure that neither graphics card is compatible with CUDA 10. Can you try the CUDA 11 setup from our documentation?
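The reasoning behind this can be sketched as a lookup. The RTX A5000 and RTX A6000 are Ampere cards (compute capability 8.6), while CUDA 10.x toolkits target compute capabilities only up to 7.5. The table and the helper `supported_by_cuda10` below are illustrative and hypothetical, not a crYOLO or NVIDIA API, and they cover only the cards named in this thread.

```python
# Compute capabilities for the cards mentioned in this thread.
COMPUTE_CAPABILITY = {
    "GeForce GTX 1080 Ti": (6, 1),  # Pascal
    "RTX A5000": (8, 6),            # Ampere
    "RTX A6000": (8, 6),            # Ampere
}

# Assumption: CUDA 10.x supports compute capabilities up to 7.5 (Turing);
# Ampere (8.x) needs a CUDA 11 toolkit.
CUDA10_MAX_CAPABILITY = (7, 5)


def supported_by_cuda10(card):
    """Return True if the card's compute capability is within CUDA 10's range."""
    return COMPUTE_CAPABILITY[card] <= CUDA10_MAX_CAPABILITY


for name in COMPUTE_CAPABILITY:
    print(name, "CUDA 10 OK" if supported_by_cuda10(name) else "needs CUDA 11")
```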

hamid13r commented 1 year ago

I checked again, it was actually CUDA 11. Switching to another partition that has GeForce GTX 1080 Ti GPUs solved this problem.

Except that the GeForce GTX 1080 Ti cards do not have enough memory to handle K3 data without reducing the batch size.

thorstenwagner commented 1 year ago

Forgot to close this. Good that you solved it. For K3 data and not-too-small particles, an alternative is to change the input size to something like 768.
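Both workarounds discussed here (a smaller `input_size`, or a smaller `batch_size`) are changes to the config file posted at the top of the thread. Below is a minimal sketch of applying them with plain JSON handling. The helper names and the file names in the usage comment are hypothetical, and only the `model.input_size` and `train.batch_size` keys from the posted config are assumed.

```python
import copy
import json


def with_smaller_footprint(config, input_size=768, batch_size=None):
    """Return a copy of a crYOLO-style config with a reduced input size.

    batch_size is changed only if a value is given; the original dict
    is left untouched.
    """
    new_config = copy.deepcopy(config)
    new_config["model"]["input_size"] = input_size
    if batch_size is not None:
        new_config["train"]["batch_size"] = batch_size
    return new_config


def rewrite_config(src_path, dst_path, **kwargs):
    """Load a config from src_path, adjust it, and write it to dst_path."""
    with open(src_path) as handle:
        config = json.load(handle)
    with open(dst_path, "w") as handle:
        json.dump(with_smaller_footprint(config, **kwargs), handle, indent=2)


# Hypothetical usage:
# rewrite_config("config_cryolo.json", "config_cryolo_768.json", batch_size=2)
```

Writing to a new file keeps the original config available for comparison between runs.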