TIO-IKIM / MOMO_Submission_Code

Submission Code for "Deep Learning-driven classification of external DICOM studies for PACS archivation"
MIT License

About cuda and data structure #3

Open artless-spirit opened 1 year ago

artless-spirit commented 1 year ago

Hello, thank you very much for your kind words. However, I encountered a few issues while trying to replicate the process. I am using PyCharm for the replication, and I have already set the device to CPU as indicated in the readme file. Nevertheless, the program still reports a CUDA error. In addition to that, I converted the TransferDense and TransferRes files into .py files, and I'm getting a FileNotFoundError: [Errno 2] No such file or directory: '/home/freddy/Projects/ModalityMapping/training/preprocessed/' error. Could you please provide me with a standard file organization structure, including the dataset, weight files, and others? Thank you very much.

artless-spirit commented 1 year ago

Also, could you share the standard naming convention for the training and testing files?

FJonske commented 1 year ago

I assume you are getting a CUDA error because torch.nn.DataParallel is still in your code? Even if you set your device to "cpu", calling torch.nn.DataParallel(model) will try to push data to a GPU that isn't there. That's what it sounds like, at any rate.
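As a sketch of the fix (assuming the model wrapping happens in one place in your script), only wrap the model when a GPU is actually requested and available:

```python
import torch
import torch.nn as nn

# Hypothetical model for illustration
model = nn.Linear(16, 4)

device = torch.device("cpu")  # or "cuda:0"

# Only wrap in DataParallel when CUDA is requested and available.
# DataParallel assumes device_ids[0] (cuda:0) by default and will try to
# scatter inputs to the GPU even when the tensors live on the CPU.
if device.type == "cuda" and torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

model = model.to(device)
x = torch.randn(2, 16, device=device)
out = model(x)
print(out.shape)  # torch.Size([2, 4])
```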

In DatasetFunctions.py lives a class named LAZY_Dataset (so called because it loads data lazily). That class expects a data_root argument when it is first initialized. My data was structured so that data_root contained a "preprocessed" and a "metadata" folder.

Files in the "preprocessed" folder were called MM_00000000.pth, MM_00000001.pth, etc.; the .pth files are tensors of my preprocessed images. Files in the "metadata" folder were called MM_meta_00000000.dcm, MM_meta_00000001.dcm, etc.; these were the DICOM files corresponding to the original images, so I could look at their metadata if I wanted to.

You can use this structure, but you can also just rewrite or replace the dataset class with your own and do whatever you like.
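A minimal sketch of a lazy dataset over that layout (the class name here is illustrative; the real implementation is LAZY_Dataset in DatasetFunctions.py):

```python
import os
import torch
from torch.utils.data import Dataset

# Expected layout under data_root (names as described above):
#   preprocessed/MM_00000000.pth, MM_00000001.pth, ...     preprocessed image tensors
#   metadata/MM_meta_00000000.dcm, MM_meta_00000001.dcm    matching original DICOMs

class LazyTensorDataset(Dataset):
    """Loads one preprocessed tensor per index, only when requested."""

    def __init__(self, data_root):
        self.pre_dir = os.path.join(data_root, "preprocessed")
        self.files = sorted(f for f in os.listdir(self.pre_dir) if f.endswith(".pth"))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        # Nothing is held in memory up front; each .pth file is read lazily here.
        return torch.load(os.path.join(self.pre_dir, self.files[idx]))
```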

artless-spirit commented 1 year ago

Thank you for your reply; I will try your instructions.

FJonske commented 1 year ago

If you post the error messages you get, I can probably be more helpful with debugging.

artless-spirit commented 1 year ago

I have run into two issues. First, I changed the device type to 'cpu' in TransferDense.ipynb, but the program raises: "module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cpu". I am confused about how to make this run on the CPU. Second, I tried two approaches: following the readme and running from the command line, and filling in the parameters in the code and running it from PyCharm. The first approach raises the error above, while the second works fine, which confuses me. Thank you for your reply.

artless-spirit commented 1 year ago

I have run into another problem: no matter what I set nc (num_classes) to, I always get 'IndexError: Target 76 is out of bounds.'. Can you help me address it? Thank you!

FJonske commented 1 year ago

As for the first problem: The problem seems to be that not every part of the code is aware that you're trying to work on the CPU, as some of them apparently expect to find something on cuda:0 (the GPU), while whatever this something is, it's currently on the CPU. Since you've already solved the problem though, I'll ignore that for now.

Concerning the "IndexError: Target 76 is out of bounds" error - can you post a full stack trace? I'm not 100% sure what is happening there. Presumably I accidentally wrote nc+1 instead of nc somewhere.
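For reference, this is the failure mode nll_loss has when a target label is greater than or equal to the number of output classes (a minimal, self-contained repro; the class count 13 is just an example):

```python
import torch
import torch.nn.functional as nnf

# Simulated log-probabilities from a model with 13 output classes
log_probs = torch.log_softmax(torch.randn(1, 13), dim=1)

# A label inside [0, 12] works fine:
loss = nnf.nll_loss(log_probs, torch.tensor([5]))

# A label outside that range raises the error from the traceback:
try:
    nnf.nll_loss(log_probs, torch.tensor([76]))
except IndexError as e:
    print(e)  # e.g. "Target 76 is out of bounds."
```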

artless-spirit commented 1 year ago

I worked around the second problem by doing target = np.ones_like(target); target = torch.tensor(target), which overwrites every target with a fixed value. It suppresses the error, but it feels like a hack: the real issue is that I have to modify the targets read from the dataset. The nc parameter was not changed. Here is the full traceback:

Traceback (most recent call last):
  File "/home/anjindu/MOMO_Submission_Code/TransferDense.py", line 549, in <module>
    predictions, probabilities, targets, accuracy = test_model(model=DN_Tmodel,
  File "/home/anjindu/MOMO_Submission_Code/TransferDense.py", line 342, in test_model
    test_loss += nnf.nll_loss(lsm, target.to(output.device), reduction='sum').item()  # sum up batch loss
  File "/home/anjindu/miniconda3/envs/MEDDL/lib/python3.8/site-packages/torch/nn/functional.py", line 2264, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
IndexError: Target 76 is out of bounds.

Thank you for your reply.

artless-spirit commented 1 year ago

The problem occurs whenever the target is used.

FJonske commented 1 year ago

I think I know what the error is. This happens when you execute the ffcv function at the very bottom, right? It says nc=76, but it should really be a different number: the length of the mapping json you are currently using. In the version I uploaded I made a stupid mistake, and it currently says 76 (the number of classes if I throw the entire training dataset at it) even though it should be 13, the number of MR image classes; the MR mapping json is even the one selected. I can't try it out right now, but that is probably where the error comes from.
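To keep nc in agreement with the mapping automatically, one option is to derive it from the mapping json instead of hard-coding it (the file content and structure shown here are assumptions, a dict of class name to index):

```python
import json

# Stand-in for the selected mapping json, e.g. the MR mapping
# (structure assumed: class name -> class index).
mapping_json = '{"T1": 0, "T2": 1, "FLAIR": 2}'
mapping = json.loads(mapping_json)

# Derive nc from the mapping in use rather than hard-coding a value like 76.
nc = len(mapping)
print(nc)  # 3
```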

If that fixes it, I will update the code in the repo when I come back from my vacation.