JDACS4C-IMPROVE / DrugCell

A visible neural network model for drug response prediction
MIT License

TypeError: can't convert cuda:0 device type tensor to numpy. #36

Closed. wilke closed this issue 11 months ago

wilke commented 11 months ago

Commands:

  1. Build container from definition file
  2. singularity run --nv --bind $PWD/tmp/:/candle_data_dir build/DrugCell.sif train.sh 4 /candle_data_dir --epochs 1

Error:

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Output:

CMD = python /usr/local/DrugCell/train.py --epochs 1
COMMAND: python /usr/local/DrugCell/train.py --epochs 1
/usr/local/DrugCell
/usr/local/DrugCell/train.py
COMMAND: python /usr/local/DrugCell/train.py --epochs 1
using data from /candle_data_dir
/candle_data_dir//Data
Importing candle utils for pytorch
model name:  DrugCell
Configuration file:  /usr/local/DrugCell/DrugCell_params.txt
/opt/conda/lib/python3.7/site-packages/candle/parsing_utils.py:743: RuntimeWarning: These keywords used in the configuration file are not defined in CANDLE: ['cell2id', 'cross_study', 'cuda_id', 'data_model', 'data_predict', 'drug2id', 'drug_hiddens', 'eps', 'final_hiddens', 'fingerprint', 'gene2id', 'genotype', 'genotype_hiddens', 'hidden', 'improve_analysis', 'improve_data_url', 'load', 'metric', 'model_url', 'onto_file', 'original_data', 'output', 'predict_url', 'result', 'train_data_type']
  warnings.warn(message, RuntimeWarning)
/opt/conda/lib/python3.7/site-packages/candle/file_utils.py:217: RuntimeWarning: Path: /candle_data_dir/DrugCell/Output/EXP000/RUN000 already exists... overwriting.
  warnings.warn(message, RuntimeWarning)
Params:
{'batch_size': 1024,
 'cell2id': 'cell2ind.txt',
 'ckpt_checksum': False,
 'ckpt_directory': None,
 'ckpt_keep_limit': 5,
 'ckpt_keep_mode': 'linear',
 'ckpt_restart_mode': 'auto',
 'ckpt_save_best': True,
 'ckpt_save_best_metric': 'val_loss',
 'ckpt_save_interval': 0,
 'ckpt_save_weights_only': False,
 'ckpt_skip_epochs': 0,
 'cross_study': 'yes',
 'cuda_id': 0,
 'data_dir': '/candle_data_dir/DrugCell/Data',
 'data_model': 'model_final.pt',
 'data_predict': 'drugcell_test.txt',
 'data_type': 'CTRPv2,CCLE,GDSCv1,GDSCv2,gCSI',
 'data_url': 'http://drugcell.ucsd.edu/downloads/data.tgz',
 'drug2id': 'drug2ind.txt',
 'drug_hiddens': '100,50,6',
 'epochs': 200,
 'eps': 1e-05,
 'experiment_id': 'EXP000',
 'final_hiddens': 6,
 'fingerprint': 'drug2fingerprint.txt',
 'gene2id': 'gene2ind.txt',
 'genotype': 'cell2mutation.txt',
 'genotype_hiddens': 6,
 'hidden': 'Hidden/',
 'improve_analysis': 'no',
 'improve_data_url': 'https://ftp.mcs.anl.gov/pub/candle/public/improve/benchmarks/single_drug_drp/benchmark-data-pilot1/csa_data/raw_data',
 'learning_rate': 0.0001,
 'load': 'drugcell_v1.pt',
 'logfile': None,
 'loss': 'mse',
 'metric': 'auc',
 'model_name': 'DrugCell',
 'model_url': 'http://drugcell.ucsd.edu/downloads/drugcell_v1.pt',
 'onto_file': 'drugcell_ont.txt',
 'optimizer': 'adam',
 'original_data': 'drugcell_data.tar.gz',
 'output': 'Result/',
 'output_dir': '/candle_data_dir/DrugCell/Output/EXP000/RUN000',
 'predict_url': 'http://drugcell.ucsd.edu/downloads/drugcell_all.txt',
 'profiling': False,
 'result': 'Result/',
 'rng_seed': 7102,
 'run_id': 'RUN000',
 'shuffle': False,
 'test_data': 'drugcell_test.txt',
 'timeout': -1,
 'train_bool': True,
 'train_data': 'drugcell_train.txt',
 'train_data_type': 'CCLE',
 'val_data': 'drugcell_val.txt',
 'verbose': False}
/candle_data_dir
Unpacking file...
download_path: /candle_data_dir/DrugCell/Data/drugcell_data
download_path: /candle_data_dir/DrugCell/Data/drugcell_test.txt
download_path: /candle_data_dir/DrugCell/Data/model_final.pt
using original data placed in /candle_data_dir//Data
using CUDA_VISIBLE_DEVICES 4
using CANDLE_DATA_DIR /candle_data_dir
using CANDLE_CONFIG
running command python /usr/local/DrugCell/train.py --epochs 1
Importing candle utils for pytorch
model name:  DrugCell
Configuration file:  /usr/local/DrugCell/DrugCell_params.txt
/opt/conda/lib/python3.7/site-packages/candle/parsing_utils.py:743: RuntimeWarning: These keywords used in the configuration file are not defined in CANDLE: ['cell2id', 'cross_study', 'cuda_id', 'data_model', 'data_predict', 'drug2id', 'drug_hiddens', 'eps', 'final_hiddens', 'fingerprint', 'gene2id', 'genotype', 'genotype_hiddens', 'hidden', 'improve_analysis', 'improve_data_url', 'load', 'metric', 'model_url', 'onto_file', 'original_data', 'output', 'predict_url', 'result', 'train_data_type']
  warnings.warn(message, RuntimeWarning)
/opt/conda/lib/python3.7/site-packages/candle/file_utils.py:217: RuntimeWarning: Path: /candle_data_dir/DrugCell/Output/EXP000/RUN000 already exists... overwriting.
  warnings.warn(message, RuntimeWarning)
Params:
{'batch_size': 1024,
 'cell2id': 'cell2ind.txt',
 'ckpt_checksum': False,
 'ckpt_directory': None,
 'ckpt_keep_limit': 5,
 'ckpt_keep_mode': 'linear',
 'ckpt_restart_mode': 'auto',
 'ckpt_save_best': True,
 'ckpt_save_best_metric': 'val_loss',
 'ckpt_save_interval': 0,
 'ckpt_save_weights_only': False,
 'ckpt_skip_epochs': 0,
 'cross_study': 'yes',
 'cuda_id': 0,
 'data_dir': '/candle_data_dir/DrugCell/Data',
 'data_model': 'model_final.pt',
 'data_predict': 'drugcell_test.txt',
 'data_type': 'CTRPv2,CCLE,GDSCv1,GDSCv2,gCSI',
 'data_url': 'http://drugcell.ucsd.edu/downloads/data.tgz',
 'drug2id': 'drug2ind.txt',
 'drug_hiddens': '100,50,6',
 'epochs': 1,
 'eps': 1e-05,
 'experiment_id': 'EXP000',
 'final_hiddens': 6,
 'fingerprint': 'drug2fingerprint.txt',
 'gene2id': 'gene2ind.txt',
 'genotype': 'cell2mutation.txt',
 'genotype_hiddens': 6,
 'hidden': 'Hidden/',
 'improve_analysis': 'no',
 'improve_data_url': 'https://ftp.mcs.anl.gov/pub/candle/public/improve/benchmarks/single_drug_drp/benchmark-data-pilot1/csa_data/raw_data',
 'learning_rate': 0.0001,
 'load': 'drugcell_v1.pt',
 'logfile': None,
 'loss': 'mse',
 'metric': 'auc',
 'model_name': 'DrugCell',
 'model_url': 'http://drugcell.ucsd.edu/downloads/drugcell_v1.pt',
 'onto_file': 'drugcell_ont.txt',
 'optimizer': 'adam',
 'original_data': 'drugcell_data.tar.gz',
 'output': 'Result/',
 'output_dir': '/candle_data_dir/DrugCell/Output/EXP000/RUN000',
 'predict_url': 'http://drugcell.ucsd.edu/downloads/drugcell_all.txt',
 'profiling': False,
 'result': 'Result/',
 'rng_seed': 7102,
 'run_id': 'RUN000',
 'shuffle': False,
 'test_data': 'drugcell_test.txt',
 'timeout': -1,
 'train_bool': True,
 'train_data': 'drugcell_train.txt',
 'train_data_type': 'CCLE',
 'val_data': 'drugcell_val.txt',
 'verbose': False}
/candle_data_dir
{'model_name': 'DrugCell', 'data_url': 'http://drugcell.ucsd.edu/downloads/data.tgz', 'improve_data_url': 'https://ftp.mcs.anl.gov/pub/candle/public/improve/benchmarks/single_drug_drp/benchmark-data-pilot1/csa_data/raw_data', 'train_data_type': 'CCLE', 'data_type': 'CTRPv2,CCLE,GDSCv1,GDSCv2,gCSI', 'predict_url': 'http://drugcell.ucsd.edu/downloads/drugcell_all.txt', 'model_url': 'http://drugcell.ucsd.edu/downloads/drugcell_v1.pt', 'original_data': 'drugcell_data.tar.gz', 'data_predict': 'drugcell_test.txt', 'data_model': 'model_final.pt', 'load': 'drugcell_v1.pt', 'train_data': '/candle_data_dir/DrugCell/Data/drugcell_train.txt', 'test_data': '/candle_data_dir/DrugCell/Data/drugcell_test.txt', 'val_data': '/candle_data_dir/DrugCell/Data/drugcell_val.txt', 'onto_file': 'drugcell_ont.txt', 'genotype': '/candle_data_dir/DrugCell/Data/cell2mutation.txt', 'fingerprint': '/candle_data_dir/DrugCell/Data/drug2fingerprint.txt', 'cell2id': '/candle_data_dir/DrugCell/Data/cell2ind.txt', 'drug2id': '/candle_data_dir/DrugCell/Data/drug2ind.txt', 'gene2id': '/candle_data_dir/DrugCell/Data/gene2ind.txt', 'hidden': 'Hidden/', 'output': 'Result/', 'result': '/candle_data_dir/DrugCell/Data/Result/', 'cross_study': 'yes', 'metric': 'auc', 'cuda_id': 0, 'learning_rate': 0.0001, 'batch_size': 1024, 'eps': 1e-05, 'genotype_hiddens': 6, 'drug_hiddens': '100,50,6', 'final_hiddens': 6, 'epochs': 1, 'optimizer': 'adam', 'loss': 'mse', 'improve_analysis': 'no', 'train_bool': True, 'profiling': False, 'experiment_id': 'EXP000', 'run_id': 'RUN000', 'verbose': False, 'logfile': None, 'shuffle': False, 'ckpt_restart_mode': 'auto', 'ckpt_checksum': False, 'ckpt_skip_epochs': 0, 'ckpt_directory': None, 'ckpt_save_best': True, 'ckpt_save_best_metric': 'val_loss', 'ckpt_save_weights_only': False, 'ckpt_save_interval': 0, 'ckpt_keep_mode': 'linear', 'ckpt_keep_limit': 5, 'data_dir': '/candle_data_dir/DrugCell/Data', 'output_dir': '/candle_data_dir/DrugCell/Data/Result/', 'rng_seed': 7102, 'timeout': 
-1, 'model_params': {'train_data': 'drugcell_train.txt', 'test_data': 'drugcell_test.txt', 'val_data': 'drugcell_val.txt', 'onto_file': 'drugcell_ont.txt', 'genotype_hiddens': 6, 'fingerprint': 'drug2fingerprint.txt', 'genotype': 'cell2mutation.txt', 'cell2id': 'cell2ind.txt', 'drug2id': 'drug2ind.txt', 'drug_hiddens': '100,50,6', 'model_name': 'DrugCell'}, 'onto': '/candle_data_dir/DrugCell/Data/drugcell_ont.txt', 'hidden_path': '/candle_data_dir/DrugCell/Data/Hidden/'}
INFO:DrugCell:Start data preprocessing...
Total number of cell lines = 1225
Total number of drugs = 684
There are 3008 genes
There are 1 roots: GO:0008150
There are 2086 terms
There are 1 connected componenets
INFO:DrugCell:train data has 4
Runtime: 0.4 mins
save_interval: 0
INFO:CandleCkpt:save_interval: 0
Callback initialized.
INFO:CandleCkpt:Callback initialized.
Checkpoint save interval == 0 -> checkpoints are disabled.
INFO:CandleCkpt:Checkpoint save interval == 0 -> checkpoints are disabled.
INFO:DrugCell:== Epoch [0/1] ==
INFO:DrugCell:   **** TRAINING ****   Epoch [1/1], loss: 31.21976. This took 36.1 secs.
INFO:DrugCell:   **** TEST ****   Epoch [1/1], loss: 0.86453. This took 41.8 secs.
/opt/conda/lib/python3.7/site-packages/scipy/stats/stats.py:4484: SpearmanRConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  warnings.warn(SpearmanRConstantInputWarning())
nan
Best performed model (epoch)    0
Traceback (most recent call last):
  File "/usr/local/DrugCell/train.py", line 167, in <module>
    candle_main()
  File "/usr/local/DrugCell/train.py", line 164, in candle_main
    run(params)
  File "/usr/local/DrugCell/train.py", line 155, in run
    scores = main(params)
  File "/usr/local/DrugCell/train_drugcell2.py", line 370, in main
    epoch_train_test_df['test_loss'] = test_loss_list
  File "/opt/conda/lib/python3.7/site-packages/pandas/core/frame.py", line 3612, in __setitem__
    self._set_item(key, value)
  File "/opt/conda/lib/python3.7/site-packages/pandas/core/frame.py", line 3784, in _set_item
    value = self._sanitize_column(value)
  File "/opt/conda/lib/python3.7/site-packages/pandas/core/frame.py", line 4510, in _sanitize_column
    return sanitize_array(value, self.index, copy=True, allow_2d=True)
  File "/opt/conda/lib/python3.7/site-packages/pandas/core/construction.py", line 571, in sanitize_array
    subarr = maybe_convert_platform(data)
  File "/opt/conda/lib/python3.7/site-packages/pandas/core/dtypes/cast.py", line 118, in maybe_convert_platform
    arr = construct_1d_object_array_from_listlike(values)
  File "/opt/conda/lib/python3.7/site-packages/pandas/core/dtypes/cast.py", line 1990, in construct_1d_object_array_from_listlike
    result[:] = values
  File "/opt/conda/lib/python3.7/site-packages/torch/_tensor.py", line 759, in __array__
    return self.numpy().astype(dtype, copy=False)
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

rohandavidg commented 11 months ago

Strange, I can't seem to replicate this on my end.

rohandavidg commented 11 months ago

Here is the command I tried:

singularity exec --nv --bind /homes/ac.rgnanaolivu/improve_data_dir/DrugCell/tmp/:$PWD/data /homes/ac.rgnanaolivu/improve_data_dir/Singularity/images/DrugCell:0.0.1-20230914.sif train.sh $CUDA_VISIBLE_DEVICES $CANDLE_DATA_DIR

RylieWeaver commented 11 months ago

I was able to reproduce the error and fix it. The test loss tensor was still on the CUDA GPU when it was handed to code that needs a numpy array on the CPU. Double-check when you can, Rohan; I made the change on the 'rylie' branch and pushed to develop.
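
The actual change on the 'rylie' branch is not shown in this thread. As a minimal sketch of one common way to avoid this error, assuming `test_loss_list` holds 0-dim loss tensors that may live on the GPU, each tensor can be converted to a plain Python float before the list is assigned to the DataFrame column. The variable names mirror the traceback; the sample loss value and the `epoch` column are placeholders for illustration only.

```python
import pandas as pd
import torch

# Use the GPU when one is present; the conversion below is also valid on CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the per-epoch test losses collected during training.
test_loss_list = [torch.tensor(0.86453, device=device)]

# Detach each tensor from the autograd graph, copy it to host memory,
# and unwrap it to a Python float so pandas never sees a CUDA tensor.
test_loss_values = [loss.detach().cpu().item() for loss in test_loss_list]

# Placeholder DataFrame; the real one is built in train_drugcell2.py.
epoch_train_test_df = pd.DataFrame({"epoch": [1]})
epoch_train_test_df["test_loss"] = test_loss_values
print(epoch_train_test_df)
```

Converting at the point where the loss is appended to the list would work equally well and keeps GPU tensors from accumulating across epochs.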

rohandavidg commented 11 months ago

Thanks, Rylie.

wilke commented 11 months ago

It ran. I built and executed the image:

Best performed model (epoch) 0

{'test_loss': 0.0008328101038932801, 'test_pcc': -2.8085122494303505e-07, 'test_MSE': 0.8901301622390747, 'test_r2': -19.574140548706055, 'test_scc': nan, 'IMPROVE RESULT': 0.8901301622390747}

IMPROVE_RESULT MSE: 0.8901301622390747