Closed by hamid13r 1 year ago
That's indeed strange, because nothing big changed since 1.8.0.
Do you run both on the same machines? Is the 1.8.4 with CUDA 10 or CUDA 11? Would you be able to share the training set so that I can try to reproduce it?
Best, Thorsten
No, they are not on the same machine. One has more GPU RAM, which I moved onto to be able to analyze K3 data. Both have the line
2022-09-27 10:26:43.593413: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
So I think both use CUDA 10.
I can share the data, but it is ~20 GB of tomograms... We are trying 1.8.2 on the same machine as we speak...
Best, Hamid
What graphics card does the new computer with the bigger GPU have?
RTX A5000 and RTX A6000
I'm pretty sure that neither graphics card is compatible with CUDA 10. Can you try the CUDA 11 setup from our documentation?
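The compatibility point above can be sketched concretely: the RTX A5000/A6000 are Ampere cards (compute capability 8.6), and the CUDA 10 toolchain cannot target sm_86, while the GTX 1080 Ti (Pascal, sm_61) works fine with CUDA 10. A minimal illustrative check (the table and function names are my own, not part of crYOLO):

```python
# Illustrative mapping from GPU architecture to the minimum CUDA major
# version whose toolchain can target it. Values for these architectures
# are well established; this is a sketch, not an exhaustive table.
MIN_CUDA_MAJOR = {
    "Pascal (GTX 1080 Ti, sm_61)": 8,
    "Turing (RTX 20xx, sm_75)": 10,
    "Ampere (RTX A5000/A6000, sm_86)": 11,
}

def cuda_supports(arch: str, cuda_major: int) -> bool:
    """True if a CUDA toolkit of this major version can target the architecture."""
    return cuda_major >= MIN_CUDA_MAJOR[arch]

# CUDA 10 cannot drive the Ampere cards mentioned in this thread:
print(cuda_supports("Ampere (RTX A5000/A6000, sm_86)", 10))  # False
print(cuda_supports("Pascal (GTX 1080 Ti, sm_61)", 10))      # True
```

This matches the resolution later in the thread: the problem disappeared on a GTX 1080 Ti partition.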
I checked again, and it was actually CUDA 11. Switching to another partition that has GeForce GTX 1080 Ti GPUs solved this problem.
Except that GeForce GTX 1080 Tis do not have enough memory to handle K3 data without reducing the batch size.
Forgot to close this. Good that you solved it. For K3 data and not-too-small particles, an alternative is to change the input size to something like 768.
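Concretely, the two memory-saving options mentioned in this thread would show up in the config roughly like this (a sketch only; the exact values are illustrative and should be adapted to your data):

```json
{
    "model": {
        "input_size": 768
    },
    "train": {
        "batch_size": 3
    }
}
```

Either change alone may be enough to fit K3 tomograms into 11 GB of GTX 1080 Ti memory; the rest of the config stays as in the original file below.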
Hi, we just started using cryolo-1.8.4 and I am running into a weird problem. I have two identical directories (same config files, same tomos, and same train_annot directory). I get 44K particles when I use cryolo 1.8.0, but with cryolo 1.8.4 the model does not get trained and I get:
val_loss did not improve from inf
Swapping between different denoising methods gets me at most 600 particles with 1.8.4.
Any idea what could be the problem?
Here is my config file:
{
    "model": {
        "architecture": "PhosaurusNet",
        "input_size": 1024,
        "anchors": [25, 25],
        "max_box_per_image": 700,
        "norm": "STANDARD",
        "filter": [0.1, "filtered_tmp/"]
    },
    "train": {
        "train_image_folder": "/gpfs/group/grotjahn/hrahmani/ecoli_project/03_WT_19/02_pciking/2_cyolo_deconv/images",
        "train_annot_folder": "/gpfs/group/grotjahn/hrahmani/ecoli_project/03_WT_19/02_pciking/2_cyolo_deconv/train_annot",
        "train_times": 10,
        "pretrained_weights": "",
        "batch_size": 4,
        "learning_rate": 0.0001,
        "nb_epoch": 200,
        "object_scale": 5.0,
        "no_object_scale": 1.0,
        "coord_scale": 1.0,
        "class_scale": 1.0,
        "saved_weights_name": "cryolo_model_deconv.h5",
        "debug": true
    },
    "valid": {
        "valid_image_folder": "",
        "valid_annot_folder": "",
        "valid_times": 1
    },
    "other": {
        "log_path": "logs/"
    }
}