HantaoShu / DeepSEM


FileNotFoundError but not mentioning the name of the file #4

Closed ektapathak08 closed 2 years ago

ektapathak08 commented 3 years ago

I am running the GRN inference step but getting FileNotFoundError: [Errno 2]. It is not clear which file was not found. Please help.

I have installed all the required packages using pip

Thanks in advance, Ekta

python main.py --task celltype_GRN --data_file counts_normazile_log_transformed.csv --setting new --alpha 100 --beta 1 --n_epoch 90 --save_name out1

Traceback (most recent call last):
  File "main.py", line 77, in <module>
    model.train_model()
  File "/home/ekta/Documents/1_ekta_research_work/covid_Pancreas/Qc_doublet_removal/res_4_5/analysis/anticipated_revision/DeepSEM-master/src/DeepSEM_cell_type_specific_GRN_model.py", line 80, in train_model
    dataloader, Evaluate_Mask, num_nodes, num_genes, data, truth_edges, TFmask2, gene_name = self.init_data()
  File "/home/ekta/Documents/1_ekta_research_work/covid_Pancreas/Qc_doublet_removal/res_4_5/analysis/anticipated_revision/DeepSEM-master/src/DeepSEM_cell_type_specific_GRN_model.py", line 32, in init_data
    Ground_Truth = pd.read_csv(self.opt.net_file, header=0)
  File "/home/ekta/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py", line 686, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/ekta/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py", line 452, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/home/ekta/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py", line 936, in __init__
    self._make_engine(self.engine)
  File "/home/ekta/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py", line 1168, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/ekta/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py", line 1998, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 382, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 674, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] No such file or directory: ''

HantaoShu commented 3 years ago

It seems you selected the setting "new" instead of "test", so you need to provide a ground-truth label file; that setting is used to benchmark performance. Please set the parameter --setting to "test". The code in main.py describes the meaning of each parameter.

ektapathak08 commented 3 years ago

Thanks. The run started, but now I am getting a RuntimeError.

Any suggestions for that?

RuntimeError: CUDA out of memory. Tried to allocate 2.16 GiB (GPU 0; 7.79 GiB total capacity; 4.37 GiB already allocated; 1.13 GiB free; 4.38 GiB reserved in total by PyTorch)

HantaoShu commented 3 years ago

The GPU is out of memory. You can decrease the batch size, select fewer genes, or train on the CPU (which may be too slow).

ektapathak08 commented 3 years ago

Thanks for your prompt response. I changed the batch size to 32, 16, and 8, but nothing worked.

python main.py --task celltype_GRN --data_file counts_normazile_log_transformed.csv --setting test --batch_size 16 --save_name out1

I have an 8 GB Quadro RTX 4000 GPU and 94 GB of RAM.

Could you please tell me how to use the CPU to train the model? What should be the command or changes in the script to set CPU for model training?

HantaoShu commented 3 years ago

You can delete all ".cuda()" and ".cuda" occurrences in src/DeepSEM_cell_type_test_specific_GRN_model.py and src/Model.py, and the model will then train on the CPU. For example, in DeepSEM_cell_type_test_specific_GRN_model.py:

lines 16 and 67: Tensor = torch.cuda.FloatTensor -> Tensor = torch.FloatTensor

line 66: vae = VAE_EAD(adj_A_init, 1, self.opt.n_hidden, self.opt.K).float().cuda() -> vae = VAE_EAD(adj_A_init, 1, self.opt.n_hidden, self.opt.K).float()

line 84: loss, loss_rec, loss_gauss, loss_cat, dec, y, hidden = vae(inputs, dropout_mask=dropout_mask.cuda(), -> loss, loss_rec, loss_gauss, loss_cat, dec, y, hidden = vae(inputs, dropout_mask=dropout_mask

And in src/Model.py:

line 8: Tensor = torch.cuda.FloatTensor -> Tensor = torch.FloatTensor

line 200 becomes: torch.log(torch.FloatTensor([2.0 * np.pi])).sum(0) + torch.log(var) + torch.pow(x - mu, 2) / var, dim=-1)

Sorry for the inconvenience. Note that training the model on the CPU might be quite slow.
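For illustration only, here is a minimal device-agnostic sketch of the same idea. The toy nn.Linear model stands in for DeepSEM's VAE_EAD; it is not the repository's actual code.

```python
import torch
import torch.nn as nn

# Choose the device once; "cpu" forces CPU training, while the commented-out
# line falls back to the CPU only when no GPU is available.
device = torch.device("cpu")
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(100, 10).float().to(device)   # instead of .cuda()
x = torch.randn(32, 100).to(device)             # move each batch the same way
out = model(x)
print(out.device)                               # cpu
```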

ektapathak08 commented 3 years ago

Thanks for your response. I'll post again once the job is done so that others can also learn from my issue.

ektapathak08 commented 3 years ago

Hi! As you suggested, using the CPU takes a lot of time: my 8-core machine took more than 24 hours to complete 2 epochs, and the default setting is 120 epochs, so I terminated the run.

Would it be a good idea to use only DEGs as input? What should the criterion be for subsetting genes? My dataset has around 20k cells. Please advise.

HantaoShu commented 3 years ago

As recommended by BEELINE (https://doi.org/10.1038/s41592-019-0690-6) and as stated in the discussion section of our paper, I strongly recommend selecting DEGs (for example, 500 or 1000) plus TFs as input. Note that the memory cost of W in the GRN layer is n^2 and the time cost of the inverse operation is n^3, so I recommend keeping the total number of selected genes below 2000 (with that setting you may be able to run on the GPU).
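To make that scaling concrete, here is a rough back-of-the-envelope sketch (assuming float32 storage for W; the gene counts 2000 and 20000 are just example values, not DeepSEM defaults):

```python
# Rough scaling for the GRN adjacency matrix W (n x n, float32).
bytes_per_float = 4

for n in (2_000, 20_000):
    w_gb = n * n * bytes_per_float / 1e9   # memory for W grows as n^2
    inv_ops = n ** 3                       # matrix-inverse cost grows as n^3
    print(f"n = {n}: W needs ~{w_gb:.2f} GB, inverse cost ~ {inv_ops:.1e} operations")
# n = 2000: W needs ~0.02 GB, inverse cost ~ 8.0e+09 operations
# n = 20000: W needs ~1.60 GB, inverse cost ~ 8.0e+12 operations
```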
Also note that the benchmark datasets contain fewer than 1.5k cells, so in your experiment you can decrease the number of epochs or the number of training samples per epoch. For example, you can keep n_epoch = 120 or 150 but randomly sample only about 1500 cells in each epoch. Another suggestion is to run DeepSEM multiple times and ensemble the results; this way you get a stable result.
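As a sketch of that kind of preprocessing (using scanpy, which is not part of DeepSEM; the file names, the 1000-gene cutoff, and the 1500-cell subsample are assumptions for illustration):

```python
import scanpy as sc

adata = sc.read_h5ad("my_celltype.h5ad")  # assumed log-normalized expression data

# Option A: highly variable genes as a stand-in for DEGs
sc.pp.highly_variable_genes(adata, n_top_genes=1000)
genes_to_keep = adata.var_names[adata.var["highly_variable"]]

# Option B: your own DEG list plus known transcription factors
# genes_to_keep = set(deg_list) | set(tf_list)

adata = adata[:, adata.var_names.isin(genes_to_keep)].copy()

# Subsample cells so one run sees roughly a benchmark-sized dataset
sc.pp.subsample(adata, n_obs=1500, random_state=0)

# Write a cells-x-genes CSV for --data_file (check DeepSEM's example data
# for the exact orientation it expects).
adata.to_df().to_csv("deepsem_input_subset.csv")
```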

Any further questions are welcome.

ektapathak08 commented 3 years ago

Thanks for your quick response. I will implement your suggestion and will let you know.

Thanks again, Ekta