JuliaWolleb / diffusion-anomaly

Anomaly detection with diffusion models
MIT License
115 stars 24 forks

classifier training error #1

Open 0Godness opened 1 year ago

0Godness commented 1 year ago

creating data loader...
dataset is chexpert
creating optimizer...
training classifier model...
step 0
classnames ['diseased', 'healthy', 'diseased', 'diseased', 'diseased', 'healthy', 'healthy', 'healthy']
len 8 len 8 len 8 lenloader 8 len 8
path patient01104_study1
path patient00485_study2
path patient00124_study2
IS CHEXPERT
labels tensor([1], device='cuda:0')
h1 torch.Size([1, 8192])
Traceback (most recent call last):
  File "/home/Diffusion Models/diffusion-anomaly/scripts/classifier_train.py", line 191, in main
    losses = forward_backward_log(data, step + resume_step)
  File "/home/Diffusion Models/diffusion-anomaly/scripts/classifier_train.py", line 138, in forward_backward_log
    logits = model(sub_batch, timesteps=sub_t)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/syg/Diffusion Models/diffusion-anomaly/./guided_diffusion/unet.py", line 955, in forward
    return self.out(h)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/functional.py", line 1848, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x8192 and 256x2)

I just used your original code and the provided 'data' folder to train the classifier, but I got the errors above. Could you please help me resolve them? Also, the length of the class names is 8, which is inconsistent with the 2 classes described in the paper. I don't know whether I ran it with the wrong settings. Hoping for your detailed answer, thank you!

JuliaWolleb commented 1 year ago

Hi, how did you set the path to your training data? It should point to data/chexpert/training. Then the length of the class names should be 6 instead of 8, because you have only 6 images in your training set in this mini-example. You still have only 2 classes, 'healthy' and 'diseased', so that should work fine.

Regarding your error, I made a change to the file guided_diffusion/unet.py. Let me know whether this fixed the issue.

Best, Julia
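For readers who hit the same shape mismatch before pulling the updated guided_diffusion/unet.py, here is a minimal sketch of what the error means and one common way to resolve it. The tensor sizes and the pooling choice are assumptions for illustration only, not the repository's exact change.

```python
# Minimal sketch of the mismatch above, NOT the repository's actual fix:
# the encoder produces a feature map whose flattened size (8192) does not
# match the 256 input features expected by the final linear layer. One
# common remedy is to pool the spatial dimensions away before the head.
import torch
import torch.nn as nn

channels, num_classes = 256, 2
head = nn.Linear(channels, num_classes)

# Hypothetical encoder output for one image: [batch, channels, H, W].
h = torch.randn(1, channels, 8, 4)   # 256 * 8 * 4 = 8192 when flattened

# Flattening reproduces the RuntimeError:
#   head(h.flatten(1))  ->  mat1 (1x8192) and mat2 (256x2) cannot be multiplied

# Pooling the spatial dimensions first yields the expected 1x256 input:
pooled = h.mean(dim=(2, 3))          # global average pool -> [1, 256]
logits = head(pooled)                # [1, 2]
print(logits.shape)
```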

0Godness commented 1 year ago

Thanks for your kind answer. Yes, it works! So one .npy file represents one image? Could you provide the conversion code? I want to use a custom dataset. Thanks again!

JuliaWolleb commented 1 year ago

Yes, one .npy file represents one image. It would maybe be a better idea to store them in another format. What do you mean by transfer code?

zideliu commented 1 year ago

Yes, one .npy file represents one image. It would maybe be a better idea to store them in another format. What do you mean by transfer code?

Convert the CheXpert dataset to .npy?

And if possible, could you provide the code for preprocessing the BraTS dataset? The format of the dataset I downloaded is not the same as the format required by the code.
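No conversion script is posted in this thread, but a minimal sketch of per-image .npy storage for CheXpert-style JPEGs is shown below. The paths, class subfolder layout, image size, and normalization are assumptions for illustration, not the repository's official preprocessing.

```python
# Minimal sketch (not official preprocessing): save each image as its own
# .npy file under an assumed class subfolder layout.
import numpy as np
from pathlib import Path
from PIL import Image

src_dir = Path("raw_images")                       # hypothetical input folder
dst_dir = Path("data/chexpert/training/healthy")   # assumed class subfolder
dst_dir.mkdir(parents=True, exist_ok=True)

for img_path in sorted(src_dir.glob("*.jpg")):
    img = Image.open(img_path).convert("L")        # chest X-rays as grayscale
    img = img.resize((256, 256))                   # assumed training resolution
    arr = np.asarray(img, dtype=np.float32) / 255.0  # scale to [0, 1]
    np.save(dst_dir / (img_path.stem + ".npy"), arr)  # one .npy per image
```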

ZZZGGGG commented 1 year ago

Hi, how did you set the path to your training data? It should point to data/chexpert/training. Then the length of the class names should be 6 instead of 8, because you have only 6 images in your training set in this mini-example. You still have only 2 classes, 'healthy' and 'diseased', so that should work fine.

Regarding your error, I made a change to the file guided_diffusion/unet.py. Let me know whether this fixed the issue.

Best, Julia

Hello Professor, I also have such a question. When I am training the classifier module, the loss does not converge and fluctuates very strangely. Is this normal?

JuliaWolleb commented 1 year ago

Hi

Yes this is normal and due to the fact that the input images contain various levels of noise.
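To make the fluctuation concrete: in guided-diffusion-style classifier training, each batch is perturbed with a randomly sampled diffusion timestep before it reaches the classifier, so some batches are nearly clean and others are almost pure noise. A standalone sketch of that idea (an assumption about the training setup, not the repository's code):

```python
# Sketch of why the classifier loss fluctuates: every batch is noised with a
# randomly drawn timestep, so per-batch loss varies with the noise level.
import torch

num_timesteps = 1000
betas = torch.linspace(1e-4, 0.02, num_timesteps)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def noise_batch(x0):
    """Draw a random timestep per sample and return the noised images x_t."""
    t = torch.randint(0, num_timesteps, (x0.shape[0],))
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(x0)
    return a.sqrt() * x0 + (1.0 - a).sqrt() * noise, t

x0 = torch.randn(8, 1, 256, 256)   # dummy batch of grayscale images
x_t, t = noise_batch(x0)
# The classifier sees (x_t, t); a batch with large t is mostly noise, so its
# loss stays high regardless of how well the model does on clean images.
```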

ZZZGGGG commented 1 year ago

Hi

Yes this is normal and due to the fact that the input images contain various levels of noise.

Thank you for your reply. May I ask you another question? How can I tell whether the classifier model has been trained sufficiently? Thank you, Professor.

JuliaWolleb commented 1 year ago

That is usually hard to tell. You can validate by computing the classification scores on non-noisy images.
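A minimal sketch of that validation idea: run the classifier on clean images by passing timestep 0 and compute plain accuracy. The `model` and `val_loader` names are placeholders, not names from the repository; only the `timesteps=` keyword follows the call seen in the traceback above.

```python
# Sketch of validating the classifier on non-noisy images (t = 0).
import torch

@torch.no_grad()
def clean_accuracy(model, val_loader, device="cuda"):
    model.eval()
    correct, total = 0, 0
    for images, labels in val_loader:
        images, labels = images.to(device), labels.to(device)
        t = torch.zeros(images.shape[0], dtype=torch.long, device=device)  # no added noise
        logits = model(images, timesteps=t)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        total += labels.numel()
    return correct / total
```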

ZZZGGGG commented 1 year ago

That is usually hard to tell. You can validate by computing the classification scores on non-noisy images.

OK, thank you very much for the explanation and clarification.

tungnthust commented 1 year ago

That is usually hard to tell. You can validate by computing the classification scores on non-noisy images.

Could you kindly share the train/test split (patient IDs for each set) and the hyperparameters used to train the classifier on the BRATS dataset (i.e. learning_rate, batch_size, anneal_lr, weight_decay, dropout)? I tried the training settings in the README, but they do not produce good results (much worse than the pretrained classifier checkpoint you provided). I referred to the classifier hyperparameters used in the openai/guided_diffusion repository, and they differ from the settings cited in your README. So I would like to know your exact parameters in order to reproduce your results. Thank you very much.