OSU-BMBL / scDEAL

Deep Transfer Learning of Drug Sensitivity by Integrating Bulk and Single-cell RNA-seq data
Apache License 2.0
46 stars · 11 forks

The correct way to select cells? #4

Open WhirlFirst opened 1 year ago

WhirlFirst commented 1 year ago

I am confused: is this the right way to get the selected index? In your code, you define selected_idx as: https://github.com/OSU-BMBL/scDEAL/blob/a2cf46215494bfa113e30d3633c4e663953718ac/bulkmodel.py#L109

and you use it as: https://github.com/OSU-BMBL/scDEAL/blob/a2cf46215494bfa113e30d3633c4e663953718ac/bulkmodel.py#L118

but actually selected_idx (the output of line 109) still contains all the indices, like this:

[screenshot of the printed selected_idx output]

so it means that in line 118, selected_idx.index still holds all the ids, and this code fails to filter out the NA data. I suspect this also causes the error mentioned in other issues such as https://github.com/OSU-BMBL/scDEAL/issues/3. Could you please recheck your code and results? Thanks.
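For illustration, here is a minimal pandas sketch of the behavior being described; `label_r`, `select_drug`, and `na` are toy stand-ins for the variables in bulkmodel.py, and the `na` value is an assumed placeholder:

```python
import pandas as pd

na = -1  # assumed sentinel for missing labels (stand-in, not the project's actual value)
label_r = pd.DataFrame({"DrugA": [0.0, 1.0, na, 1.0]})
select_drug = "DrugA"

# What L109 effectively computes: a boolean Series, NOT a filtered frame.
selected_idx = label_r.loc[:, select_drug] != na
print(len(selected_idx))         # 4 -- same length as the full data
print(list(selected_idx.index))  # [0, 1, 2, 3] -- .index still has every id
```

Because the comparison returns a full-length boolean mask, `selected_idx.index` is identical to `label_r.index`, so indexing with it cannot drop the NA rows.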

juychen commented 1 year ago

Hi, I tried to fix this issue. Could you please test whether it is okay now?

WhirlFirst commented 1 year ago

Sorry, I don't understand why you deleted this code. I think the NA data still cannot be filtered. https://github.com/OSU-BMBL/scDEAL/blob/d8d492d4475128e0f7cade82c17e07994a53cc25/bulkmodel.py#L86

juychen commented 1 year ago

> Sorry, I don't understand why you deleted this code. I think the NA data still cannot be filtered.
>
> https://github.com/OSU-BMBL/scDEAL/blob/d8d492d4475128e0f7cade82c17e07994a53cc25/bulkmodel.py#L86

https://github.com/OSU-BMBL/scDEAL/blob/a2cf46215494bfa113e30d3633c4e663953718ac/bulkmodel.py#L109

I intended to filter out NA values at L109.

WhirlFirst commented 1 year ago

I have shown above that you cannot filter out NA values at L109, because selected_idx still contains all the indices.

[screenshot of the selected_idx output]

Please carefully recheck your code.

WhirlFirst commented 1 year ago

selected_idx is a boolean vector indicating which rows contain an NA value, but its index covers all the data (length = 1280), so it is impossible to filter out NA values by using its index. It would help if you changed it to `selected_idx = label_r[label_r.loc[:,select_drug]!=na]`. Please tell me if you need me to clarify my meaning in Chinese.
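A quick sketch of the proposed change, again with toy stand-ins for `label_r`, `select_drug`, and `na`:

```python
import pandas as pd

na = -1  # assumed sentinel for missing labels (stand-in value)
label_r = pd.DataFrame({"DrugA": [0.0, 1.0, na, 1.0]})
select_drug = "DrugA"

# Proposed fix: use the boolean mask to subset the frame directly,
# so the resulting index only contains the rows without NA.
selected_idx = label_r[label_r.loc[:, select_drug] != na]
print(len(selected_idx))         # 3
print(list(selected_idx.index))  # [0, 1, 3]
```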

juychen commented 1 year ago

> selected_idx is a boolean vector indicating which rows contain an NA value, but its index covers all the data (length = 1280), so it is impossible to filter out NA values by using its index. It would help if you changed it to `selected_idx = label_r[label_r.loc[:,select_drug]!=na]`.

I see; I tried to fix it in the previous commit.

WhirlFirst commented 1 year ago

So why did you commit this line? https://github.com/OSU-BMBL/scDEAL/blob/978ed04b37e3fddd087e920474fad5f31ab3d133/bulkmodel.py#L86 I think it is supposed to work together with https://github.com/OSU-BMBL/scDEAL/blob/978ed04b37e3fddd087e920474fad5f31ab3d133/bulkmodel.py#L109

Is this really the code you used for your experiments? How could you have gotten results with these bugs present?

WhirlFirst commented 1 year ago

@juychen @Wang-Cankun @PegasusAM @OSU-BMBL-admin Could you please explain these bugs in your project code?

juychen commented 1 year ago

Some issues may have been introduced when merging from the development branch. We are looking into it at the moment and will fix it soon.

WhirlFirst commented 1 year ago

Hi @juychen, have you released the final version of your code? I have a question: why did you commit this line? https://github.com/OSU-BMBL/scDEAL/blob/8adedff81b1961d45e525cfa6f1725efbf5fb586/bulkmodel.py#L86

Has the bug in the code been resolved? Thanks for the adata results you provided. Can you provide a bash file that produces your adata file from the current code? This is an important step for reproducing the results.

juychen commented 1 year ago

> Hi @juychen, have you released the final version of your code? I have a question: why did you commit this line?
>
> https://github.com/OSU-BMBL/scDEAL/blob/8adedff81b1961d45e525cfa6f1725efbf5fb586/bulkmodel.py#L86
>
> Has the bug in the code been resolved? Thanks for the adata results you provided. Can you provide a bash file that produces your adata file from the current code? This is an important step for reproducing the results.

Hi, have you tried cloning the repository, downloading the data into a new directory, and re-running the code? Are there still any bugs?

WhirlFirst commented 1 year ago

Yes, I cloned the repository and I still get the bug. @juychen

```
Traceback (most recent call last):
  File "bulkmodel.py", line 369, in <module>
    run_main(args)
  File "bulkmodel.py", line 264, in run_main
    optimizer,loss_function,epochs,exp_lr_scheduler,load=load_model,save_path=preditor_path)
  File "/nfs_beijing/minsheng/scbig/bioinfoDownStream/scDEAL1222/trainers.py", line 400, in train_predictor_model
    loss = loss_function(output, y)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 1165, in forward
    label_smoothing=self.label_smoothing)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 2996, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
IndexError: Target 2 is out of bounds.
```

I think it is caused by committing the line `label_r=label_r.fillna(na)`, which prevents this line from working: https://github.com/OSU-BMBL/scDEAL/blob/8adedff81b1961d45e525cfa6f1725efbf5fb586/bulkmodel.py#L109
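For context, `IndexError: Target 2 is out of bounds` is what CrossEntropyLoss raises when a target label is >= the number of model output classes. A toy sketch (not the project's code; the sentinel value 2 is an assumption) of how filling NA with a sentinel instead of dropping those rows can introduce an out-of-range class label:

```python
import pandas as pd

# Toy labels: a binary task (0/1) with one missing entry.
labels = pd.Series([0.0, 1.0, None, 1.0])

# If NA rows are filled with a sentinel (e.g. 2) instead of being dropped,
# a third "class" appears; a 2-output classifier trained with cross-entropy
# then fails with "Target 2 is out of bounds" on that row.
filled = labels.fillna(2).astype(int)
print(sorted(filled.unique()))   # [0, 1, 2] -- 3 label values, model has 2 outputs

# Dropping NA rows instead keeps the targets in range:
dropped = labels.dropna().astype(int)
print(sorted(dropped.unique()))  # [0, 1]
```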

juychen commented 1 year ago

> Yes, I cloned the repository and I still get the bug. @juychen [...] `IndexError: Target 2 is out of bounds.` I think it is caused by committing the line `label_r=label_r.fillna(na)`, which prevents this line from working:
>
> https://github.com/OSU-BMBL/scDEAL/blob/8adedff81b1961d45e525cfa6f1725efbf5fb586/bulkmodel.py#L109

Hi, I have fixed the corresponding bug by adding the fillna code. I have also edited the code to improve usability, including CPU device selection and creating result folders if they do not exist. The updates are now pushed to the main branch, and it should work correctly now. I tested the setup from scratch with the following procedure: create a new folder, clone the code from GitHub, download the data, and run the code following the command-line instructions. You can clean up the previous version and start from scratch to see whether it works.

WhirlFirst commented 1 year ago

Thanks! @juychen. So could you reproduce the same h5ad results (https://portland-my.sharepoint.com/:u:/g/personal/junyichen8-c_my_cityu_edu_hk/EYru-LaQC1tHlFZSnf1RA_cBjXwIafy-iDsajEWjh8xcjA?e=2sE61e) using the new code?

SZ-qing commented 1 year ago

I reproduced three datasets based on the parameters in the authors' adata files and found that the final results were very poor. I feel it is necessary for the authors to double-check the code and data.

LCGaoZzz commented 5 months ago

> I reproduced three datasets based on the parameters in the authors' adata files and found that the final results were very poor. I feel it is necessary for the authors to double-check the code and data.

Yes, I've noticed that the results were indeed poor and far from what's described in the article. This certainly necessitates a thorough review of the code and data by the authors to ensure accuracy.