NVIDIA-Genomics-Research / AtacWorks

Deep learning based processing of Atac-seq data
https://clara-parabricks.github.io/AtacWorks/

main.py infer: error: argument --gpu: invalid int value: 'None' #153

Closed feefee20 closed 4 years ago

feefee20 commented 4 years ago

Hello,

I'm sorry, but I don't know where or whom to ask, so I'm leaving some questions here again (it would be great to have contact info, such as an email address). I ran into some issues while trying tutorials 1 and 2, as described below:

1) Here's the log from running the following command at step 7 of tutorial 1. It finished without any error or warning, but I still don't get the final files that you expect once it's done. Is this still because of an inappropriate memory or GPU setting (please see the settings below)?

export LSF_DOCKER_NETWORK=host
export LSF_DOCKER_IPC=host
export LSF_DOCKER_SHM_SIZE=3g
bsub -G compute-yooa -Is -q general-interactive -gpu "num=4:gmodel=TeslaV100_SXM2_32GB" -R 'rusage[mem=64GB]' -M 64GB -a 'docker(claraomics/atacworks)' /bin/bash
$atacworks/scripts/main.py train --config train_config.yaml --config_mparams model_structure.yaml --files_train Mono.50.2400.train.h5 --val_files Mono.50.2400.val.h5

INFO:2020-05-13 16:37:02,912:AtacWorks-main] Running on GPU: 0
Building model: resnet ...
Finished building.
Saving config file to ./trained_models_2020.05.13_16.37/configs/model_structure.yaml...
Num_batches 500; rank 0, gpu 0
Epoch [ 0/25] -------------------- [  0/500] mse:  20.142 | pearsonloss:   0.986 | total_loss:   1.603 | bce:   0.607
Epoch [ 0/25] ##------------------ [ 50/500] mse:2030.479 | pearsonloss:   0.055 | total_loss:   1.578 | bce:   0.507
Epoch [ 0/25] ####---------------- [100/500] mse:  20.136 | pearsonloss:   0.986 | total_loss:   1.079 | bce:   0.083
Epoch [ 0/25] ######-------------- [150/500] mse:1936.060 | pearsonloss:   0.015 | total_loss:   1.054 | bce:   0.071
Epoch [ 0/25] ########------------ [200/500] mse:  20.369 | pearsonloss:   0.986 | total_loss:   1.109 | bce:   0.112
Epoch [ 0/25] ##########---------- [250/500] mse:  58.329 | pearsonloss:   0.012 | total_loss:   0.100 | bce:   0.058
Epoch [ 0/25] ############-------- [300/500] mse:  20.124 | pearsonloss:   0.983 | total_loss:   1.079 | bce:   0.086
Epoch [ 0/25] ##############------ [350/500] mse:  42.750 | pearsonloss:   0.006 | total_loss:   0.080 | bce:   0.052
Epoch [ 0/25] ################---- [400/500] mse:  20.125 | pearsonloss:   0.983 | total_loss:   1.079 | bce:   0.086
Epoch [ 0/25] ##################-- [450/500] mse: 162.624 | pearsonloss:   0.010 | total_loss:   0.151 | bce:   0.060
Epoch [ 0/25] #################### [499/500] mse:   1.762 | pearsonloss:   0.955 | total_loss:   0.990 | bce:   0.034
Epoch [ 0/25] Time Taken: 956.095s
Total train time: 956.095   For time: 931.616   Back time: 3.983    Print time: 17.534  Remain (data) time: 2.962
Eval for 20 batches
Inference -------------------- [ 0/20] 
Evaluating on 50000 points.
Evaluation result: mse:27.3667 | corrcoef: 0.1628 | bce: 0.1126 | recall: 0.2045 | specificity: 0.9811 | auroc: 0.9592
Evaluation time taken:  26.184s
New best metric found - auroc: 0.9592
Saving model ckpt to ./trained_models_2020.05.13_16.37/epoch0_None...
Saving best model to ./trained_models_2020.05.13_16.37/model_best.pth.tar...
Num_batches 500; rank 0, gpu 0
Epoch [ 1/25] -------------------- [  0/500] mse: 362.712 | pearsonloss:   0.511 | total_loss:   0.864 | bce:   0.172
Epoch [ 1/25] ##------------------ [ 50/500] mse: 127.088 | pearsonloss:   0.572 | total_loss:   0.733 | bce:   0.098
Epoch [ 1/25] ####---------------- [100/500] mse: 212.383 | pearsonloss:   0.205 | total_loss:   0.457 | bce:   0.146
Epoch [ 1/25] ######-------------- [150/500] mse: 111.377 | pearsonloss:   0.711 | total_loss:   0.848 | bce:   0.081
Epoch [ 1/25] ########------------ [200/500] mse: 221.441 | pearsonloss:   0.203 | total_loss:   0.449 | bce:   0.135
Epoch [ 1/25] ##########---------- [250/500] mse: 105.753 | pearsonloss:   0.706 | total_loss:   0.835 | bce:   0.076
Epoch [ 1/25] ############-------- [300/500] mse: 215.121 | pearsonloss:   0.200 | total_loss:   0.423 | bce:   0.116
Epoch [ 1/25] ##############------ [350/500] mse:  98.106 | pearsonloss:   0.682 | total_loss:   0.809 | bce:   0.078
Epoch [ 1/25] ################---- [400/500] mse: 156.059 | pearsonloss:   0.205 | total_loss:   0.390 | bce:   0.106
Epoch [ 1/25] ##################-- [450/500] mse:  93.414 | pearsonloss:   0.682 | total_loss:   0.800 | bce:   0.071
Epoch [ 1/25] #################### [499/500] mse:  38.004 | pearsonloss:   0.494 | total_loss:   0.542 | bce:   0.030
Epoch [ 1/25] Time Taken: 954.789s
Total train time: 954.789   For time: 931.293   Back time: 3.849    Print time: 17.520  Remain (data) time: 2.126
Eval for 20 batches
Inference -------------------- [ 0/20] 
Evaluating on 50000 points.
Evaluation result: mse:76.8793 | corrcoef: 0.5468 | bce: 0.0542 | recall: 0.4850 | specificity: 0.9834 | auroc: 0.9712
Evaluation time taken:  25.658s
New best metric found - auroc: 0.9712
Saving model ckpt to ./trained_models_2020.05.13_16.37/epoch1_None...
Saving best model to ./trained_models_2020.05.13_16.37/model_best.pth.tar...
.   .   . 
Evaluation time taken:  28.959s
Saving model ckpt to ./trained_models_2020.05.13_16.37/epoch23_None...
Num_batches 500; rank 0, gpu 0
Epoch [24/25] -------------------- [  0/500] mse:  18.866 | pearsonloss:   0.999 | total_loss:   1.137 | bce:   0.129
Epoch [24/25] ##------------------ [ 50/500] mse:  58.982 | pearsonloss:   0.216 | total_loss:   0.303 | bce:   0.057
Epoch [24/25] ####---------------- [100/500] mse:  20.399 | pearsonloss:   0.993 | total_loss:   1.102 | bce:   0.099
Epoch [24/25] ######-------------- [150/500] mse:  36.769 | pearsonloss:   0.171 | total_loss:   0.229 | bce:   0.040
Epoch [24/25] ########------------ [200/500] mse:  19.917 | pearsonloss:   0.992 | total_loss:   1.100 | bce:   0.098
Epoch [24/25] ##########---------- [250/500] mse:  32.314 | pearsonloss:   0.167 | total_loss:   0.223 | bce:   0.040
Epoch [24/25] ############-------- [300/500] mse:  19.569 | pearsonloss:   0.991 | total_loss:   1.098 | bce:   0.098
Epoch [24/25] ##############------ [350/500] mse:  31.522 | pearsonloss:   0.163 | total_loss:   0.217 | bce:   0.039
Epoch [24/25] ################---- [400/500] mse:  19.408 | pearsonloss:   0.989 | total_loss:   1.096 | bce:   0.098
Epoch [24/25] ##################-- [450/500] mse:  30.666 | pearsonloss:   0.161 | total_loss:   0.215 | bce:   0.038
Epoch [24/25] #################### [499/500] mse:   8.531 | pearsonloss:   0.861 | total_loss:   0.906 | bce:   0.040
Epoch [24/25] Time Taken: 957.365s
Total train time: 957.365   For time: 932.471   Back time: 4.490    Print time: 17.516  Remain (data) time: 2.888
Eval for 20 batches
Inference -------------------- [ 0/20] 
Evaluating on 50000 points.
Evaluation result: mse:11.2150 | corrcoef: 0.5584 | bce: 0.1608 | recall: 0.2936 | specificity: 0.9997 | auroc: 0.4045
Evaluation time taken:  25.740s
Saving model ckpt to ./trained_models_2020.05.13_16.37/epoch24_None...

2) This is another error, from when I tried step 7 of tutorial 2. I got the same error regardless of whether I used --file_sizes or --sizes_file. Would you please let me know what's wrong or how to fix it? Thank you so much for your help.

$atacworks/scripts/main.py infer --files NK.50_cells.h5 --file_sizes $atacworks/data/reference/hg19.auto.sizes --config configs/infer_config.yaml --config_mparams configs/model_structure.yaml
usage: main.py infer [-h] --label LABEL --out_home OUT_HOME --task
                     {regression,classification,both} --print_freq PRINT_FREQ
                     --bs BS --num_workers NUM_WORKERS --pad PAD --transform
                     {log,None} --layers LAYERS --weights_path WEIGHTS_PATH
                     --gpu GPU [--distributed] --dist-url DIST_URL
                     --dist-backend DIST_BACKEND [--debug] [--config CONFIG]
                     --files FILES --intervals_file INTERVALS_FILE
                     --sizes_file SIZES_FILE --infer_threshold INFER_THRESHOLD
                     --reg_rounding REG_ROUNDING --cla_rounding CLA_ROUNDING
                     --batches_per_worker BATCHES_PER_WORKER [--gen_bigwig]
                     --result_fname RESULT_FNAME [--deletebg]
main.py infer: error: argument --gpu: invalid int value: 'None'

avantikalal commented 4 years ago

Hi @wookyung, thank you for trying AtacWorks! Happy to help you get the tutorials running.

For tutorial 2, I think you need to follow this note in the tutorial: "Note: infer_config.yaml is set up to use multiple GPUs. If you are using a single GPU, edit infer_config.yaml to change the line gpu: "None" to read gpu: 0."

--sizes_file is the correct flag.
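
The error in the issue title has the form that Python's argparse produces when an option declared with type=int receives a string it cannot convert. The sketch below reproduces that behaviour with a stand-in parser. The parser definition is an assumption for illustration and is not copied from AtacWorks' main.py; the class and function names are hypothetical.

```python
import argparse


class RaisingParser(argparse.ArgumentParser):
    """ArgumentParser that raises instead of exiting, so the message can be inspected."""

    def error(self, message):
        raise ValueError(message)


def parse_gpu(argv):
    # Hypothetical stand-in for the `infer` sub-command; only --gpu is modelled.
    parser = RaisingParser(prog="main.py infer")
    parser.add_argument("--gpu", type=int, help="index of the GPU to use")
    return parser.parse_args(argv).gpu


def gpu_error_message(argv):
    """Return the argparse error text for argv, or None if parsing succeeds."""
    try:
        parse_gpu(argv)
    except ValueError as err:
        return str(err)
    return None
```

With this stand-in, `--gpu 0` parses to the integer 0, while `--gpu None` is rejected because the string "None" cannot be converted by int().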

avantikalal commented 4 years ago

For the issue with tutorial 1, I'm not quite clear on the problem you're facing. The log says that it has saved output files to the folder ./trained_models_2020.05.13_16.37. Are you saying that no output files exist in that folder?

feefee20 commented 4 years ago

Hi avantikalal,

Thank you for your answer! Here are the output files in trained_models_2020.05.13_16.37.

Sorry, I think I confused this with tutorial 2. Are these the expected outputs, including model_best.pth.tar?

kimw@compute1-exec-206:/data/kimw/ATAC/ATAC_2019$ ls -l trained_models_2020.05.13_16.37
total 26640
d---------. 2 kimw domain users 8192 May 13 16:37 configs
----------. 1 kimw domain users 892762 May 13 16:53 epoch0_None
----------. 1 kimw domain users 892762 May 13 19:37 epoch10_None
----------. 1 kimw domain users 892762 May 13 19:53 epoch11_None
----------. 1 kimw domain users 892762 May 13 20:09 epoch12_None
----------. 1 kimw domain users 892762 May 13 20:26 epoch13_None
----------. 1 kimw domain users 892762 May 13 20:42 epoch14_None
----------. 1 kimw domain users 892762 May 13 20:58 epoch15_None
----------. 1 kimw domain users 892762 May 13 21:15 epoch16_None
----------. 1 kimw domain users 892762 May 13 21:31 epoch17_None
----------. 1 kimw domain users 892762 May 13 21:47 epoch18_None
----------. 1 kimw domain users 892762 May 13 22:04 epoch19_None
----------. 1 kimw domain users 892762 May 13 17:09 epoch1_None
----------. 1 kimw domain users 892762 May 13 22:20 epoch20_None
----------. 1 kimw domain users 892762 May 13 22:37 epoch21_None
----------. 1 kimw domain users 892762 May 13 22:53 epoch22_None
----------. 1 kimw domain users 892762 May 13 23:10 epoch23_None
----------. 1 kimw domain users 892762 May 13 23:26 epoch24_None
----------. 1 kimw domain users 892762 May 13 17:26 epoch2_None
----------. 1 kimw domain users 892762 May 13 17:42 epoch3_None
----------. 1 kimw domain users 892762 May 13 17:58 epoch4_None
----------. 1 kimw domain users 892762 May 13 18:15 epoch5_None
----------. 1 kimw domain users 892762 May 13 18:31 epoch6_None
----------. 1 kimw domain users 892762 May 13 18:47 epoch7_None
----------. 1 kimw domain users 892762 May 13 19:04 epoch8_None
----------. 1 kimw domain users 892762 May 13 19:20 epoch9_None
----------. 1 kimw domain users 892762 May 13 20:26 model_best.pth.tar

feefee20 commented 4 years ago

For tutorial 2, I am trying it with gpu: 0. Thanks for your comments.

avantikalal commented 4 years ago

Regarding tutorial 1, these are the expected output files, but it seems they are not being saved with the expected file/folder names. We will correct this.

model_best.pth.tar is the model with the best AUROC on the validation set.
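
The training log earlier in this thread reports a validation AUROC after each epoch and re-saves model_best.pth.tar whenever a new best value is found. A minimal sketch of that selection rule follows, using only the AUROC values visible in the log excerpt (epochs 0, 1 and 24; the elided epochs are not filled in). The function name is hypothetical and the code is not taken from AtacWorks.

```python
def best_epoch_by_auroc(auroc_by_epoch):
    """Return (epoch, auroc) for the highest validation AUROC seen so far."""
    best_epoch, best_auroc = None, float("-inf")
    for epoch in sorted(auroc_by_epoch):
        auroc = auroc_by_epoch[epoch]
        if auroc > best_auroc:  # strictly better, so the earlier epoch wins a tie
            best_epoch, best_auroc = epoch, auroc
    return best_epoch, best_auroc


# AUROC values copied from the log excerpt in this thread.
logged = {0: 0.9592, 1: 0.9712, 24: 0.4045}
```

Among these three values the best epoch is 1. In the directory listing above, model_best.pth.tar carries the same timestamp (20:26) as epoch13_None, which suggests a later epoch in the elided part of the log scored higher.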

feefee20 commented 4 years ago

Thanks for your help!

feefee20 commented 4 years ago

At step 7 of tutorial 2, the run finished, but I got the following error. Could you look it over? Thanks.

-Wookyung

$atacworks/scripts/main.py infer --files NK.50_cells.h5 --sizes_file $atacworks/data/reference/hg19.auto.sizes --config configs/infer_config_single.yaml --config_mparams configs/model_structure.yaml

INFO:2020-05-14 05:54:10,419:AtacWorks-main] Checkng input files for compatibility
Building model: resnet ...
Loading model weights from ./models/model.pth.tar...
Finished loading.
INFO:2020-05-14 05:54:16,883:AtacWorks-model_utils] Compiling model in DistributedDataParallel
INFO:2020-05-14 05:54:16,900:AtacWorks-model_utils] Compiling model in DistributedDataParallel
INFO:2020-05-14 05:54:16,946:AtacWorks-model_utils] Compiling model in DistributedDataParallel
INFO:2020-05-14 05:54:16,958:AtacWorks-model_utils] Compiling model in DistributedDataParallel
Finished building.
Inference -------------------- [ 0/29]
Inference time taken: 120.019s (Load 3.207s,Prediction 116.565s)
INFO:2020-05-14 05:56:19,118:AtacWorks-main] Waiting for writer to finish...
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.6/shutil.py", line 550, in move
    os.rename(src, real_dst)
OSError: [Errno 18] Invalid cross-device link: '/tmp/246915.tmpdir/tmp41kg1xlw/0/00001' -> './inference_output_2020.05.14_05.54/NK_inferred.track.bedGraph'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/AtacWorks/scripts/main.py", line 275, in writer
    shutil.move(files[0], outfiles[channel])
  File "/usr/lib/python3.6/shutil.py", line 564, in move
    copy_function(src, real_dst)
  File "/usr/lib/python3.6/shutil.py", line 264, in copy2
    copystat(src, dst, follow_symlinks=follow_symlinks)
  File "/usr/lib/python3.6/shutil.py", line 229, in copystat
    _copyxattr(src, dst, follow_symlinks=follow)
  File "/usr/lib/python3.6/shutil.py", line 165, in _copyxattr
    os.setxattr(dst, name, value, follow_symlinks=follow_symlinks)
PermissionError: [Errno 13] Permission denied: './inference_output_2020.05.14_05.54/NK_inferred.track.bedGraph'
Saving config file to ./infer_config.yaml..

ntadimeti commented 4 years ago

@wookyung Looks like there are some permission issues and the program is unable to write to the destination "./inference_output_2020.05.14_05.54". Can you run the following and show us the output?

ls -l ./inference_output_2020.05.14_05.54

By the way, thank you very much for going through this and pointing out the issues. We will make sure to add tests to prevent this from happening in the future. We appreciate your feedback a lot.
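
Reading the traceback posted above: os.rename fails with Errno 18 because the temporary file and the output folder are on different filesystems, shutil.move then falls back to shutil.copy2, and the PermissionError is raised while copy2 copies extended attributes (os.setxattr), which happens after the file contents have been copied. One possible workaround, offered here as an assumption rather than the fix the maintainers adopted, is to pass copy_function=shutil.copyfile so that only the contents are copied. The helper name below is hypothetical.

```python
import os
import shutil
import tempfile


def move_contents_only(src, dst):
    """Move src to dst; on a cross-device move, copy file contents without metadata."""
    # shutil.move tries os.rename first and only calls copy_function when the
    # rename fails (for example across filesystems). shutil.copyfile copies the
    # data but not permissions, timestamps or extended attributes.
    return shutil.move(src, dst, copy_function=shutil.copyfile)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    source = os.path.join(workdir, "00001")
    with open(source, "w") as handle:
        handle.write("chr1 7234 7828 1.0\n")
    print(move_contents_only(source, os.path.join(workdir, "NK_inferred.track.bedGraph")))
```

The trade-off is that the moved file gets default permissions and timestamps instead of those of the temporary file.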

feefee20 commented 4 years ago

Hi @ntadimeti and @avantikalal,

Thanks again for your help. Here are the results from $atacworks/scripts/main.py infer --files NK.50_cells.h5 --sizes_file $atacworks/data/reference/hg19.auto.sizes --config configs/infer_config_single.yaml --config_mparams configs/model_structure.yaml.

Actually, I got a folder named inference_output_2020.05.14_05.54 that contains only 'NK_inferred.track.bedGraph'. Would this help to figure out what's happening? Thanks.

Best, Wookyung

[WOOKYUNG@MacBook-Pro-4 ATAC_2019]$ ls -lt tutorial2
total 352384
drwx------ 1 WOOKYUNG staff 16384 May 14 01:19 inference_output_2020.05.14_05.54
-rwx------ 1 WOOKYUNG staff 667 May 14 01:18 infer_config.yaml
drwx------ 1 WOOKYUNG staff 16384 May 13 20:44 configs
-rwx------ 1 WOOKYUNG staff 169312687 May 11 22:53 NK.50_cells.h5
drwx------ 1 WOOKYUNG staff 16384 May 11 22:46 intervals
drwx------ 1 WOOKYUNG staff 16384 May 11 22:41 models
-rwx------ 1 WOOKYUNG staff 8906319 Feb 18 00:08 dsc.1.NK.50.cutsites.smoothed.200.bw

[WOOKYUNG@MacBook-Pro-4 ATAC_2019]$ ls -lt tutorial2/inference_output_2020.05.14_05.54
total 2156544
-rwx------ 1 WOOKYUNG staff 1103228048 May 14 01:18 NK_inferred.track.bedGraph

[WOOKYUNG@MacBook-Pro-4 ATAC_2019]$ head tutorial2/inference_output_2020.05.14_05.54/NK_inferred.track.bedGraph
chr1 7234 7828 1.0
chr1 7834 7836 1.0
chr1 7842 7843 1.0
chr1 7850 7930 1.0
chr1 7930 7932 2.0
chr1 7932 8008 1.0
chr1 8009 8013 1.0
chr1 8052 8053 1.0
chr1 8060 8064 1.0
chr1 8065 8066 1.0

[WOOKYUNG@MacBook-Pro-4 ATAC_2019]$ cat infer_config.yaml
cat: infer_config.yaml: No such file or directory

[WOOKYUNG@MacBook-Pro-4 ATAC_2019]$ cat tutorial2/infer_config.yaml
batch_size: 1
batches_per_worker: 16
bs: 512
cla_rounding: 3
config: configs/infer_config_single.yaml
debug: false
deletebg: false
dist_backend: gloo
dist_url: tcp://127.0.0.1:4321
distributed: true
exp_dir: ./inference_output_2020.05.14_05.54
files:
gen_bigwig: true
gpu: 0
infer_threshold: 0.5
interval_size: 60000
intervals_file: ./intervals/hg19.50000.genome_intervals.bed
label: inference_output
layers: null
mode: infer
num_workers: 4
out_home: ./
pad: 5000
print_freq: 50
reg_rounding: 0
result_fname: inferred
sizes_file: /AtacWorks/data/reference/hg19.auto.sizes
task: both
transform: None
weights_path: ./models/model.pth.tar
world_size: 4

ntadimeti commented 4 years ago

Hi @wookyung, the permissions on the inference_output_2020.05.14_05.54 folder grant only admin rights. By default os.makedirs should create a folder with 777 permissions, but on some systems the mode is ignored. Here's a detailed explanation: https://docs.python.org/3/library/os.html

For now, you could add an explicit os.chmod call right after os.makedirs() is called; that should resolve the issue you are facing: https://docs.python.org/3/library/os.html#os.chmod

I will look into it and work on a more permanent fix. I will also figure out whether there's a more straightforward way for you to fix this.

Thanks for your patience.
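
A minimal sketch of the suggestion above, assuming the output folder is created with os.makedirs. The function name and the default mode are illustrative and are not taken from AtacWorks.

```python
import os
import stat
import tempfile


def make_output_dir(path, mode=0o777):
    """Create `path` (and any parents), then set its permission bits explicitly."""
    # The mode given to os.makedirs is filtered by the process umask and, per
    # the Python docs, ignored on some systems, so it is not relied on here.
    os.makedirs(path, exist_ok=True)
    os.chmod(path, mode)
    # Report the permission bits that are actually in effect afterwards.
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "inference_output_example")
    print(oct(make_output_dir(target)))
```

Returning the resulting mode makes it easy to see whether the filesystem honoured the request, which is worth checking on network or container-mounted storage.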

feefee20 commented 4 years ago

Thanks for figuring it out.

1) I tried giving the parent directory, where the output folder is generated, 777 permissions with 'chmod 777 tutorial2', but I still had the same issue: I got only a bedGraph file, as shown below:

kimw@compute1-exec-210:/data/kimw/ATAC/ATAC_2019/tutorial2$ $atacworks/scripts/main.py infer --files NK.50_cells.h5 --sizes_file $atacworks/data/reference/hg19.auto.sizes --config configs/infer_config_single.yaml --config_mparams configs/model_structure.yaml

INFO:2020-05-15 02:51:39,307:AtacWorks-main] Checkng input files for compatibility
Building model: resnet ...
Loading model weights from ./models/model.pth.tar...
INFO:2020-05-15 02:51:47,005:AtacWorks-model_utils] Compiling model in DistributedDataParallel
Finished loading.
INFO:2020-05-15 02:51:47,926:AtacWorks-model_utils] Compiling model in DistributedDataParallel
INFO:2020-05-15 02:51:47,981:AtacWorks-model_utils] Compiling model in DistributedDataParallel
INFO:2020-05-15 02:51:48,033:AtacWorks-model_utils] Compiling model in DistributedDataParallel
Finished building.
Inference -------------------- [ 0/29]
Inference time taken: 120.692s (Load 3.292s,Prediction 117.140s)
INFO:2020-05-15 02:53:50,337:AtacWorks-main] Waiting for writer to finish...
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.6/shutil.py", line 550, in move
    os.rename(src, real_dst)
OSError: [Errno 18] Invalid cross-device link: '/tmp/247693.tmpdir/tmptkst0za1/0/00001' -> './inference_output_2020.05.15_02.51/NK_inferred.track.bedGraph'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/AtacWorks/scripts/main.py", line 275, in writer
    shutil.move(files[0], outfiles[channel])
  File "/usr/lib/python3.6/shutil.py", line 564, in move
    copy_function(src, real_dst)
  File "/usr/lib/python3.6/shutil.py", line 264, in copy2
    copystat(src, dst, follow_symlinks=follow_symlinks)
  File "/usr/lib/python3.6/shutil.py", line 229, in copystat
    _copyxattr(src, dst, follow_symlinks=follow)
  File "/usr/lib/python3.6/shutil.py", line 165, in _copyxattr
    os.setxattr(dst, name, value, follow_symlinks=follow_symlinks)
PermissionError: [Errno 13] Permission denied: './inference_output_2020.05.15_02.51/NK_inferred.track.bedGraph'

kimw@compute1-exec-210:/data/kimw/ATAC/ATAC_2019/tutorial2$ ls -lt
total 165456
d---------. 2 kimw domain users 8192 May 15 03:16 inference_output_2020.05.15_02.51
lrwxrwxrwx. 1 kimw domain users 69 May 15 02:51 inference_output_latest -> /data/kimw/ATAC/ATAC_2019/tutorial2/inference_output_2020.05.15_02.51
----------. 1 kimw domain users 667 May 14 06:18 infer_config.yaml
d---------. 2 kimw domain users 8192 May 14 01:44 configs
----------. 1 kimw domain users 169312687 May 12 03:53 NK.50_cells.h5
d---------. 2 kimw domain users 8192 May 12 03:46 intervals
d---------. 2 kimw domain users 8192 May 12 03:41 models
----------. 1 kimw domain users 8906319 Feb 18 06:08 dsc.1.NK.50.cutsites.smoothed.200.bw

2) For the second issue, I set up multiple GPUs (32 GB each) for AtacWorks, as shown below:

kimw@compute1-exec-210:/data/kimw/ATAC/ATAC_2019/tutorial2$ nvidia-smi
Fri May 15 03:32:22 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:18:00.0 Off |                  Off |
| N/A   29C    P0    41W / 300W |      0MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:3B:00.0 Off |                  Off |
| N/A   28C    P0    41W / 300W |      0MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:86:00.0 Off |                  Off |
| N/A   28C    P0    40W / 300W |      0MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:AF:00.0 Off |                  Off |
| N/A   31C    P0    40W / 300W |      0MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

However, with the above setup, the command with gpu: "None" in infer_config.yaml didn't work. As you recommended, I tried gpu: 0 and found that the command at least ran, even though I got only a bedGraph file. It looks like AtacWorks doesn't use multiple GPUs in my environment? Could this issue be related to question 1)? Just FYI.

Thanks.

Best,

Wookyung

ntadimeti commented 4 years ago

@wookyung unfortunately, the permissions issue sounds like it's unique to your setup. Could you verify that the jobs you are running also have the proper permissions to write to the destination?
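
One quick way to run such a check from inside the job is to try creating a temporary file in the destination folder. This is only a sketch with a hypothetical function name. It tests file creation alone, whereas the traceback in this thread fails later, in os.setxattr, which this check does not exercise.

```python
import os
import tempfile


def can_write_to(directory):
    """Return True if a file can be created (and removed) inside `directory`."""
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers a missing directory as well as permission errors.
        return False


if __name__ == "__main__":
    print(can_write_to(os.getcwd()))
```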

Regarding multi-GPU inference: AtacWorks is capable of it. You can either add --distributed to your inference command or set distributed: True in configs/infer_config.yaml. If that flag is not passed, AtacWorks runs on a single GPU.

Hope this helps.

ntadimeti commented 4 years ago

@wookyung, we identified a bug where custom config files are not being picked up. This explains a lot of the problems that you are seeing as well. I am creating an issue for it; feel free to track it. You can check out the dev-0.3.0 branch once these bug fixes are merged. Thanks for your patience.
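
One way to check whether a custom config was picked up is to compare the values you intended against the infer_config.yaml that AtacWorks saves after a run, such as the dump posted earlier in this thread. The sketch below uses a deliberately simple reader for flat "key: value" lines instead of a full YAML parser. The function names are hypothetical, and the "requested" values are an assumed example, since the contents of infer_config_single.yaml are not shown in the thread.

```python
def read_flat_config(text):
    """Parse flat 'key: value' lines into a dict of strings (no nesting, no lists)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so values like tcp://host:port survive.
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip()
    return settings


def differing_keys(requested, effective):
    """Map each requested key whose effective value differs to (requested, effective)."""
    return {
        key: (value, effective.get(key))
        for key, value in requested.items()
        if effective.get(key) != value
    }


# A few lines copied from the saved config shown earlier in this thread.
saved = read_flat_config(
    "dist_url: tcp://127.0.0.1:4321\ndistributed: true\ngpu: 0\nworld_size: 4\n"
)
# Assumed expectations for a single-GPU run (hypothetical, for illustration only).
requested = {"gpu": "0", "distributed": "false"}
mismatches = differing_keys(requested, saved)
```

Under that assumption, `mismatches` reports `distributed` as the only setting that differs, which is the kind of discrepancy a config file that was not picked up would produce.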

feefee20 commented 4 years ago

Thanks for figuring out the issue. Let me know when the fix is done and I will try it.

Best, Wookyung

ntadimeti commented 4 years ago

In the meantime, if you'd like to try, you should be able to edit the default config files under $atacworks/configs/ folder and scripts should run as expected.

ntadimeti commented 4 years ago

Closing this, the open issue #155 should address these problems.

feefee20 commented 4 years ago

Hello @ntadimeti,

Thanks for updating the script. I tried the Docker version again (claraomics/atacworks; that is the one you recently updated, right?), but I'm sorry to say I got the errors again. I am also not sure whether it is a system issue or a command issue (if it seems to be a system issue, please let me know and I will ask the server administrator).

Even though I requested 4 GPUs and confirmed with 'nvidia-smi' that they are set up properly in my environment, multiple GPUs with --gpu "None" (as set in infer_config.yaml) still didn't work, failing with the error: main.py infer: error: argument --gpu: invalid int value: 'GPU'. In my environment, main.py appeared to work with --gpu 0 (I still don't know why our multiple GPUs don't work with your command), but partway through the run the following errors came up. Please let me know if you have any ideas or comments for me. Thanks. -Wookyung

export LSF_DOCKER_VOLUMES='/storage1/fs1/yooa/Active/Wookyung:/data/kimw/'
export LSF_DOCKER_NETWORK=host
export LSF_DOCKER_IPC=host
export LSF_DOCKER_SHM_SIZE=3g
bsub -G compute-yooa -Is -q general-interactive -gpu "num=4:gmodel=TeslaV100_SXM2_32GB" -R 'rusage[mem=124GB]' -M 124GB -a 'docker(claraomics/atacworks)' /bin/bash

kimw@compute1-exec-202:/data/kimw/ATAC/ATAC_2019/tutorial2$ python $atacworks/scripts/main.py infer --files NK.50_cells.h5 --sizes_file $atacworks/data/reference/hg19.auto.sizes --config configs/infer_config.yaml --config_mparams configs/model_structure.yaml

INFO:2020-05-28 18:14:16,816:AtacWorks-main] Checkng input files for compatibility
Building model: resnet ...
Loading model weights from ./models/model.pth.tar...
INFO:2020-05-28 18:14:24,146:AtacWorks-model_utils] Compiling model in DistributedDataParallel
Finished loading.
INFO:2020-05-28 18:14:24,261:AtacWorks-model_utils] Compiling model in DistributedDataParallel
INFO:2020-05-28 18:14:24,276:AtacWorks-model_utils] Compiling model in DistributedDataParallel
INFO:2020-05-28 18:14:24,311:AtacWorks-model_utils] Compiling model in DistributedDataParallel
Finished building.
Inference -------------------- [ 0/29]
Inference time taken: 121.639s (Load 4.267s,Prediction 117.123s)
INFO:2020-05-28 18:16:29,393:AtacWorks-main] Waiting for writer to finish...
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.6/shutil.py", line 550, in move
    os.rename(src, real_dst)
OSError: [Errno 18] Invalid cross-device link: '/tmp/271740.tmpdir/tmpuh2auv7z/0/00001' -> './inference_output_2020.05.28_18.14/NK_inferred.track.bedGraph'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/AtacWorks/scripts/main.py", line 275, in writer
    shutil.move(files[0], outfiles[channel])
  File "/usr/lib/python3.6/shutil.py", line 564, in move
    copy_function(src, real_dst)
  File "/usr/lib/python3.6/shutil.py", line 264, in copy2
    copystat(src, dst, follow_symlinks=follow_symlinks)
  File "/usr/lib/python3.6/shutil.py", line 229, in copystat
    _copyxattr(src, dst, follow_symlinks=follow)
  File "/usr/lib/python3.6/shutil.py", line 165, in _copyxattr
    os.setxattr(dst, name, value, follow_symlinks=follow_symlinks)
PermissionError: [Errno 13] Permission denied: './inference_output_2020.05.28_18.14/NK_inferred.track.bedGraph'
Saving config file to ./infer_config.yaml...

ntadimeti commented 4 years ago

@wookyung, we haven't uploaded the new 0.2.3 release docker container to Docker Hub yet. The image there is still a previous snapshot of AtacWorks. I will upload the new container and post an update on this thread.

You can also use the Dockerfile checked into the repository to build the latest snapshot of AtacWorks at any time.

feefee20 commented 4 years ago

Thanks. I will try it when you upload the recent version. -Wookyung

ntadimeti commented 4 years ago

@wookyung Updated the latest docker image. Please try again. I will also try it on my end and see if I can reproduce your error. Use claraomics/atacworks:latest or claraomics/atacworks:v0.2.3.

ntadimeti commented 4 years ago

@wookyung: the current docker image clones the dev-v0.3.0 branch by default. Please run git checkout master before running any other commands. We will fix this soon.

ntadimeti commented 4 years ago

Fixed the branch issue. You should now be able to run on master branch directly

feefee20 commented 4 years ago

Thanks. Which one? claraomics/atacworks: v0.3.0?

ntadimeti commented 4 years ago

claraomics/atacworks:latest should work just fine.

feefee20 commented 4 years ago

Could you confirm the version of that? I tried claraomics/atacworks: v0.3.0, claraomics/atacworks: v0.3, claraomics/atacworks: dev-v0.3.0 etc. but it didn't work.

ntadimeti commented 4 years ago

You don't need a version. latest is the tag name. Try claraomics/atacworks:latest as is. No need to replace latest with anything else.

feefee20 commented 4 years ago

Thanks for confirming. I used your tutorial2 test files and the latest version (the command to check the version, git checkout master, didn't work on my system; it failed with 'fatal: not a git repository (or any of the parent directories): .git'). I ran the following commands and got the same errors. To summarize: 1) atacworks doesn't work on our multi-GPU system with --gpu "None", but it does work after changing --gpu "None" to --gpu 0; 2) main.py from the latest Docker image wrote the bedGraph file (-rw-r--r--. 1 kimw domain users 1103228048 May 29 20:49 NK_inferred.track.bedGraph), but there was an error involving the temp folder (please see the traceback below). Would you please fix it?

BTW, our system has had a wildcard () issue before, where the wildcard in a script is treated as just a literal character. I don't know whether the current issue is related to that. Just FYI.

Thanks. Regards

[kimw@compute1-client-1 ~]$ cat start_atacworks.sh

export LSF_DOCKER_VOLUMES='/storage1/fs1/yooa/Active/Wookyung:/data/kimw/'

export LSF_DOCKER_NETWORK=host

export LSF_DOCKER_IPC=host

export LSF_DOCKER_SHM_SIZE=3g

bsub -G compute-yooa -Is -q general-interactive -gpu "num=4:gmodel=TeslaV100_SXM2_32GB" -R 'rusage[mem=124GB]' -M 124GB -a 'docker(claraomics/atacworks:latest)' /bin/bash

[kimw@compute1-client-1 ~]$ bash start_atacworks.sh

Job <272612> is submitted to queue .

<<Waiting for dispatch ...>>

<>

latest: Pulling from claraomics/atacworks

Digest: sha256:fcc1f662e3d2e51d476d60ed551146a2de2b50dab7bebccc148f2d0a056d8584

Status: Image is up to date for claraomics/atacworks:latest

docker.io/claraomics/atacworks:latest

kimw@compute1-exec-202:~$ cd /data/kimw/ATAC/ATAC_2019/tutorial2

kimw@compute1-exec-202:/data/kimw/ATAC/ATAC_2019/tutorial2$ atacworks=/AtacWorks

kimw@compute1-exec-202:/data/kimw/ATAC/ATAC_2019/tutorial2$ python $atacworks/scripts/main.py infer --files NK.50_cells.h5 --sizes_file $atacworks/data/reference/hg19.auto.sizes --config configs/infer_config.yaml --config_mparams configs/model_structure.yaml

INFO:2020-05-29 20:24:04,658:AtacWorks-main] Checkng input files for compatibility

Building model: resnet ...

Loading model weights from ./models/model.pth.tar...

INFO:2020-05-29 20:24:11,842:AtacWorks-model_utils] Compiling model in DistributedDataParallel

INFO:2020-05-29 20:24:11,862:AtacWorks-model_utils] Compiling model in DistributedDataParallel

Finished loading.

INFO:2020-05-29 20:24:11,911:AtacWorks-model_utils] Compiling model in DistributedDataParallel

INFO:2020-05-29 20:24:11,985:AtacWorks-model_utils] Compiling model in DistributedDataParallel

Finished building.

Inference -------------------- [ 0/29]

Inference time taken: 120.644s (Load 3.528s,Prediction 116.857s)

INFO:2020-05-29 20:26:15,418:AtacWorks-main] Waiting for writer to finish...

Process Process-2:

Traceback (most recent call last):

File "/usr/lib/python3.6/shutil.py", line 550, in move

os.rename(src, real_dst)

OSError: [Errno 18] Invalid cross-device link: '/tmp/272612.tmpdir/tmph0cuqndq/0/00001' -> './inference_output_2020.05.29_20.24/NK_inferred.track.bedGraph'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap

self.run()

File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run

self._target(*self._args, **self._kwargs)

File "/AtacWorks/scripts/main.py", line 271, in writer

shutil.move(files[0], outfiles[channel])

File "/usr/lib/python3.6/shutil.py", line 564, in move

copy_function(src, real_dst)

File "/usr/lib/python3.6/shutil.py", line 264, in copy2

copystat(src, dst, follow_symlinks=follow_symlinks)

File "/usr/lib/python3.6/shutil.py", line 229, in copystat

_copyxattr(src, dst, follow_symlinks=follow)

File "/usr/lib/python3.6/shutil.py", line 165, in _copyxattr

os.setxattr(dst, name, value, follow_symlinks=follow_symlinks)

PermissionError: [Errno 13] Permission denied: './inference_output_2020.05.29_20.24/NK_inferred.track.bedGraph'

ntadimeti commented 4 years ago

Hi wookyung, are you running the docker container interactively? I'd love to debug this in a live session with you. I have been able to run both tutorials without problems using the uploaded docker image claraomics/atacworks:latest, so I wonder if it's a configuration problem on your side. Let's try commands one at a time and resolve the problems as they come up. I will paste a series of commands; it would be great if you could report back whether they work as expected. Thank you.

ntadimeti commented 4 years ago

@wookyung The "not a git repo" error you are seeing is likely because you ran the command in the wrong directory. You have to run it in the /AtacWorks dir.

Here are the commands I use to run the docker image interactively:

# Run docker interactively
docker run --gpus all --shm-size 2G -it claraomics/atacworks:latest
# enter AtacWorks dir
cd /AtacWorks
# Check the branch -- should be *master
git branch

Can you report whether these commands run fine for you? What is the output of git branch?

feefee20 commented 4 years ago

Hi @ntadimeti,

Sorry for the late reply. Here's the result of the commands you recommended. I am still getting the permission error. Please let me know if you have any ideas. I would be available for a meeting with you guys on Friday. We normally use Zoom. I can invite you if you want. Thanks! -Wookyung

[kimw@compute1-client-1 ~]$ cat start_atacworks.sh

export LSF_DOCKER_VOLUMES='/storage1/fs1/yooa/Active/Wookyung:/data/kimw/'

export LSF_DOCKER_NETWORK=host

export LSF_DOCKER_IPC=host

export LSF_DOCKER_SHM_SIZE=4g

bsub -G compute-yooa -Is -q general-interactive -gpu "num=4:gmodel=TeslaV100_SXM2_32GB" -R 'rusage[mem=256GB]' -M 256GB -a 'docker(claraomics/atacworks:latest)' /bin/bash

[kimw@compute1-client-1 ~]$ bash start_atacworks.sh

Job <287107> is submitted to queue .

<<Waiting for dispatch ...>>

<>

latest: Pulling from claraomics/atacworks

7ddbc47eeb70: Already exists

c1bbdc448b72: Already exists

8c3b70e39044: Already exists

45d437916d57: Already exists

d8f1569ddae6: Already exists

85386706b020: Already exists

ee9b457b77d0: Already exists

be4f3343ecd3: Pull complete

51f6bbaddf34: Pull complete

02268f40466b: Pull complete

eba2dd78a0c3: Pull complete

f7c0d1730a36: Pull complete

68a06f16a7c2: Pull complete

133208334475: Pull complete

1e5b7c146bdc: Pull complete

1b28df29435e: Pull complete

Digest: sha256:fcc1f662e3d2e51d476d60ed551146a2de2b50dab7bebccc148f2d0a056d8584

Status: Downloaded newer image for claraomics/atacworks:latest

docker.io/claraomics/atacworks:latest

kimw@compute1-exec-219:~$ cd /AtacWorks

kimw@compute1-exec-219:/AtacWorks$ git branch

kimw@compute1-exec-219:/AtacWorks$ git branch master

fatal: A branch named 'master' already exists.

ntadimeti commented 4 years ago

Thanks @wookyung. You have the right branch. Could you please do a fresh run of ALL steps of tutorial2 and confirm that at step 7 you see the message "Compiling model in DistributedDataParallel" N times, where N is the number of GPUs on your system? You should no longer see the --gpu: invalid int value: None error. Please report back on this. Here's the tutorial2 you can follow: https://github.com/clara-parabricks/AtacWorks/blob/master/tutorials/tutorial2.md

Once you confirm this, we can look into the next error you are facing, the issue with tmp dirs.
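For anyone who saves the inference log to a file, the check above (one "Compiling model in DistributedDataParallel" line per GPU) can be sketched in Python. This is only an illustration: `count_ddp_messages` is a hypothetical helper, not part of AtacWorks, and the expected GPU count is something you supply yourself.

```python
def count_ddp_messages(log_text, marker="Compiling model in DistributedDataParallel"):
    """Count log lines that contain the DistributedDataParallel start-up message."""
    return sum(1 for line in log_text.splitlines() if marker in line)


def matches_gpu_count(log_text, expected_gpus):
    """Return True when the message appears exactly once per expected GPU."""
    return count_ddp_messages(log_text) == expected_gpus
```

With the logs posted in this thread, `matches_gpu_count(log_text, 4)` would be the relevant call, since four GPUs were requested.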

feefee20 commented 4 years ago

Hi @ntadimeti,

Thanks for your reply. I downloaded the required files again and interactively followed all steps of tutorial2, and finally got the following errors (please see the errors at the bottom of this message). The good news is that I no longer see '--gpu: invalid int value: None'; the bad news is that I don't see 'Distributing to GPUS' either. Does that mean my GPUs were not set up properly? The nvidia-smi command gave me the information below. Thanks. -Wookyung

kimw@compute1-exec-210:/data/kimw/ATAC/ATAC_2019/tutorial2$ nvidia-smi

Fri Jun  5 03:51:08 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:18:00.0 Off |                  Off |
| N/A   29C    P0    41W / 300W |      0MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:3B:00.0 Off |                  Off |
| N/A   28C    P0    40W / 300W |      0MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:86:00.0 Off |                  Off |
| N/A   27C    P0    40W / 300W |      0MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:AF:00.0 Off |                  Off |
| N/A   31C    P0    40W / 300W |      0MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

kimw@compute1-exec-210:/data/kimw/ATAC/ATAC_2019/tutorial2$ python $atacworks/scripts/main.py infer \

--files NK.50_cells.h5 \

--sizes_file $atacworks/data/reference/hg19.auto.sizes \

--config configs/infer_config.yaml \

--config_mparams configs/model_structure.yaml \

INFO:2020-06-05 03:06:29,318:AtacWorks-main] Checkng input files for compatibility

Building model: resnet ...

Loading model weights from ./models/model.pth.tar...

INFO:2020-06-05 03:06:36,883:AtacWorks-model_utils] Compiling model in DistributedDataParallel

INFO:2020-06-05 03:06:36,904:AtacWorks-model_utils] Compiling model in DistributedDataParallel

Finished loading.

INFO:2020-06-05 03:06:36,927:AtacWorks-model_utils] Compiling model in DistributedDataParallel

INFO:2020-06-05 03:06:36,985:AtacWorks-model_utils] Compiling model in DistributedDataParallel

Finished building.

Inference -------------------- [ 0/29]

Inference time taken: 120.979s (Load 4.154s,Prediction 116.588s)

INFO:2020-06-05 03:08:39,432:AtacWorks-main] Waiting for writer to finish...

Process Process-2:

Traceback (most recent call last):

File "/usr/lib/python3.6/shutil.py", line 550, in move

os.rename(src, real_dst)

OSError: [Errno 18] Invalid cross-device link: '/tmp/289215.tmpdir/tmp4szaz1hb/0/00001' -> './inference_output_2020.06.05_03.06/NK_inferred.track.bedGraph'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap

self.run()

File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run

self._target(*self._args, **self._kwargs)

File "/AtacWorks/scripts/main.py", line 271, in writer

shutil.move(files[0], outfiles[channel])

File "/usr/lib/python3.6/shutil.py", line 564, in move

copy_function(src, real_dst)

File "/usr/lib/python3.6/shutil.py", line 264, in copy2

copystat(src, dst, follow_symlinks=follow_symlinks)

File "/usr/lib/python3.6/shutil.py", line 229, in copystat

_copyxattr(src, dst, follow_symlinks=follow)

File "/usr/lib/python3.6/shutil.py", line 165, in _copyxattr

os.setxattr(dst, name, value, follow_symlinks=follow_symlinks)

PermissionError: [Errno 13] Permission denied: './inference_output_2020.06.05_03.06/NK_inferred.track.bedGraph'

Saving config file to ./infer_config.yaml...

ntadimeti commented 4 years ago

@wookyung -- That was my mistake earlier. The program does not output "Distributing to N GPUs". What you're seeing is the correct behavior, so you are now able to launch a multi-GPU inference run. Yay!

Now, if you want to run on a single GPU, you will have to make the following config changes inside the tutorial2 directory:

vi configs/infer_config.yaml

Replace gpu: "None" with gpu: 0, where 0 is the GPU ID. If you want to use a different GPU, you can replace 0 with 1, 2, or 3 (assuming you have 4 GPUs). Also replace distributed: True with distributed: False.
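For background on the original error: "invalid int value: 'None'" is the message Python's argparse prints when an option declared with type=int receives the string None. A minimal sketch of a converter that accepts either form is below. It assumes nothing about how AtacWorks actually parses its options; `gpu_id` and `build_parser` are hypothetical names used only for illustration.

```python
import argparse


def gpu_id(value):
    """Hypothetical converter: map the string 'None' to None, otherwise parse an int."""
    if value is None or str(value).strip().lower() == "none":
        return None
    try:
        return int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"invalid GPU id: {value!r}")


def build_parser():
    # Illustrative parser only; the option name mirrors the config key in this thread.
    parser = argparse.ArgumentParser(prog="infer-sketch")
    parser.add_argument("--gpu", type=gpu_id, default=None)
    return parser
```

With this converter, `build_parser().parse_args(["--gpu", "None"])` yields `gpu=None` instead of exiting with an error, and `--gpu 0` still parses to the integer 0.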

ntadimeti commented 4 years ago

Now, let's address your final error, "invalid cross-device link". The most likely cause is that the source and destination are on different file systems. To verify this, can you run the following commands inside your interactive docker container and paste the output of each here?

df /tmp/

df /data/kimw/

Once I see the output, we can confirm the suspicion. I have ideas on how to fix it; I just need to make sure we are fixing the right problem.
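The same check can also be done from Python, since os.rename fails with Errno 18 exactly when the two paths sit on different devices. A small sketch, where `same_filesystem` is a hypothetical helper and not something shipped with AtacWorks:

```python
import os


def same_filesystem(path_a, path_b):
    """Return True if both existing paths live on the same device (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

For the setup in this thread the call would be `same_filesystem("/tmp", "/data/kimw")`; a result of False would point to the same cause as differing df output.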

feefee20 commented 4 years ago

Thanks! I'm so glad one of the errors is fixed, thanks to you. To be clear, if I want to use 4 GPUs, do I have to replace "gpu: None" with "gpu: 0,1,2,3" or "gpu: 3", and set "distributed: True" as well because I am using multiple GPUs? Thanks.

feefee20 commented 4 years ago

Sorry, to clarify: I requested 4 GPUs, and I can see GPU IDs 0 through 3.

ntadimeti commented 4 years ago

@wookyung

Short answer: No, you do not need to set gpu: 3 or gpu: 0,1,2,3 for a multi-GPU run. You can leave gpu: None and set distributed: True.

Long answer: the gpu parameter in the config file currently applies ONLY to the single-GPU case. It is irrelevant for the multi-GPU case.

If you set distributed: True, atacworks inference will run on ALL the GPUs available on the system. We don't currently support choosing a subset of GPUs in the multi-GPU setting through the config. The only way to use a subset is to set the CUDA_VISIBLE_DEVICES environment variable to the IDs of the GPUs you want to use (for example, 0,1).

For the single-GPU setting, you can choose which GPU to run inference on by setting gpu to the ID of one of the GPUs on your system.
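As a rough illustration of the CUDA_VISIBLE_DEVICES route, the sketch below builds an environment that exposes only selected GPUs to a child process. `env_with_visible_gpus` is a hypothetical helper written for this example; the variable must be set before the process initializes CUDA, which is why it is passed to a fresh subprocess here rather than changed mid-run.

```python
import os


def env_with_visible_gpus(gpu_ids, base_env=None):
    """Return a copy of the environment that exposes only the given GPU ids."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA reads this as a comma-separated list of device indices.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env
```

A launcher could then pass the result as `env=` to `subprocess.run` when starting the inference command, for example `env_with_visible_gpus([0, 1])` to use the first two GPUs with distributed: True.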

Hope this clarifies.

I understand completely how this is confusing. I will create an issue for this and make it more intuitive and clear for future customers. We value your feedback a lot, thanks for working very closely with us.

feefee20 commented 4 years ago

Hi @ntadimeti,

Thanks for confirming. Here's result with the commands that you suggested. Let me know if you have good ideas. Thanks!

kimw@compute1-exec-215:/data/kimw/ATAC/ATAC_2019/tutorial2$ df /tmp/

Filesystem 1K-blocks Used Available Use% Mounted on

/dev/mapper/vg_compute1--exec--215--pxe-root 232150272 57104168 175046104 25% /tmp

kimw@compute1-exec-215:/data/kimw/ATAC/ATAC_2019/tutorial2$ df /data/kimw/

Filesystem 1K-blocks Used Available Use% Mounted on

cache1-fs1 107374182400 7015333888 100358848512 7% /data/kimw

ntadimeti commented 4 years ago

@wookyung

My suspicion is confirmed. The /tmp and /data/ dirs are on different file systems. I have a PR that should address this: #166 (https://github.com/clara-parabricks/AtacWorks/pull/166). Please follow the commands below and report back if it fixes the invalid cross-device link error.

Start the interactive docker as always and run the commands inside the docker container.

cd /AtacWorks
git remote add ntadimeti https://github.com/ntadimeti/AtacWorks.git
git fetch ntadimeti
git checkout -b crossdevice_link ntadimeti:ntadimeti/invalid_cross_device_link

After the above steps, you will be on the branch where I've made the changes that should get rid of that error.

Follow tutorial2 steps and please report back the output you are seeing.
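For readers following the traceback: shutil.move first tries os.rename, and when that fails across file systems it falls back to copy2, which also copies metadata and extended attributes; that metadata step is where the PermissionError is raised. The sketch below shows two generic ways around this in Python. It is an assumption-laden illustration, not a description of what the PR mentioned above actually changes, and both function names are hypothetical.

```python
import os
import shutil
import tempfile


def move_without_metadata(src, dst):
    """Move src to dst; on a cross-device move, copy file contents only."""
    # shutil.move calls copy_function only when os.rename fails (for example
    # across file systems). copyfile copies data only, so no xattr or
    # permission metadata is written to the destination.
    return shutil.move(src, dst, copy_function=shutil.copyfile)


def make_scratch_dir(output_dir):
    """Create the scratch directory inside output_dir so renames stay on one file system."""
    os.makedirs(output_dir, exist_ok=True)
    return tempfile.mkdtemp(dir=output_dir)
```

The first option keeps /tmp as scratch space and only changes how the final move is done; the second avoids the cross-device move altogether, at the cost of writing intermediate files to the output file system.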

feefee20 commented 4 years ago

I appreciate your help. I tried your commands. I am sorry but I got the following errors.

[kimw@compute1-client-1 ~]$ cat start_atacworks.sh

export LSF_DOCKER_VOLUMES='/storage1/fs1/yooa/Active/Wookyung:/data/kimw/'

export LSF_DOCKER_NETWORK=host

export LSF_DOCKER_IPC=host

export LSF_DOCKER_SHM_SIZE=4g

bsub -G compute-yooa -Is -q general-interactive -gpu "num=4:gmodel=TeslaV100_SXM2_32GB" -R 'rusage[mem=256GB]' -M 256GB -a 'docker(claraomics/atacworks:latest)' /bin/bash

[kimw@compute1-client-1 ~]$ bash start_atacworks.sh

Job <289969> is submitted to queue .

<<Waiting for dispatch ...>>

<>

latest: Pulling from claraomics/atacworks

7ddbc47eeb70: Already exists

c1bbdc448b72: Already exists

8c3b70e39044: Already exists

45d437916d57: Already exists

d8f1569ddae6: Already exists

85386706b020: Already exists

ee9b457b77d0: Already exists

be4f3343ecd3: Already exists

51f6bbaddf34: Already exists

02268f40466b: Pull complete

eba2dd78a0c3: Pull complete

f7c0d1730a36: Pull complete

68a06f16a7c2: Pull complete

133208334475: Pull complete

1e5b7c146bdc: Pull complete

1b28df29435e: Pull complete

Digest: sha256:fcc1f662e3d2e51d476d60ed551146a2de2b50dab7bebccc148f2d0a056d8584

Status: Downloaded newer image for claraomics/atacworks:latest

docker.io/claraomics/atacworks:latest

kimw@compute1-exec-203:~$ cd /AtacWorks

kimw@compute1-exec-203:/AtacWorks$ git remote add ntadimeti https://github.com/ntadimeti/AtacWorks.git

error: could not lock config file .git/config: Permission denied

fatal: could not set 'remote.ntadimeti.url' to ' https://github.com/ntadimeti/AtacWorks.git'

kimw@compute1-exec-203:/AtacWorks$ git fetch ntadimeti

error: cannot open .git/FETCH_HEAD: Permission denied

kimw@compute1-exec-203:/AtacWorks$ git checkout -b crossdevice_link ntadimeti:ntadimeti/invalid_cross_device_link

fatal: 'ntadimeti:ntadimeti/invalid_cross_device_link' is not a commit and a branch 'crossdevice_link' cannot be created from it

ntadimeti commented 4 years ago

@wookyung Looks like you need root permissions to change those files. To make it even easier for you, I will push another docker image with the needed changes. I will let you know once the docker image is pushed.

feefee20 commented 4 years ago

Thank you so much. I will look forward to hearing from you.

Best, Wookyung

ntadimeti commented 4 years ago

@wookyung: Could you try pulling the docker image claraomics/atacworks:test and following tutorial2? If my fix works, you shouldn't see the invalid cross-device link error anymore.

feefee20 commented 4 years ago

Hello,

I didn't notice that you had emailed me. Sorry about that. I will get back to you once I have tried it. Thanks!

-Wookyung

feefee20 commented 4 years ago

Hello,

I wanted to try tutorial2 with the updated docker image claraomics/atacworks:test. Is there a problem with the link (https://github.com/clara-parabricks/AtacWorks/blob/dev-v0.3.0/tutorials/tutorial2.md)? I couldn't access it. Thanks.

Wookyung

ntadimeti commented 4 years ago

Hi wookyung, please use this link for the tutorials in master branch : https://github.com/clara-parabricks/AtacWorks/blob/master/tutorials/tutorial2.md

The link you posted is for the branch dev-v0.3.0. It's a branch that's under rapid development and it's not stable for users. :)

feefee20 commented 4 years ago

Hi @ntadimeti,

I'm glad to share some great news: your test version works for me!

1) At Step 7, where I always had an issue before, the problem no longer occurs and I got the expected outputs.

kimw@compute1-exec-219:/data/kimw/ATAC/ATAC_2019/tutorial2$ python $atacworks/scripts/main.py infer \
    --files NK.50_cells.h5 \
    --sizes_file $atacworks/data/reference/hg19.auto.sizes \
    --config configs/infer_config.yaml \
    --config_mparams configs/model_structure.yaml
INFO:2020-06-19 02:19:38,455:AtacWorks-main] Checkng input files for compatibility
Building model: resnet ...
Loading model weights from ./models/model.pth.tar...
Finished loading.
INFO:2020-06-19 02:19:46,147:AtacWorks-model_utils] Compiling model in DistributedDataParallel
INFO:2020-06-19 02:19:47,096:AtacWorks-model_utils] Compiling model in DistributedDataParallel
INFO:2020-06-19 02:19:47,116:AtacWorks-model_utils] Compiling model in DistributedDataParallel
INFO:2020-06-19 02:19:52,335:AtacWorks-model_utils] Compiling model in DistributedDataParallel
Finished building.
Inference -------------------- [ 0/29]
Inference time taken: 121.083s (Load 3.939s,Prediction 116.911s)
INFO:2020-06-19 02:21:56,638:AtacWorks-main] Waiting for writer to finish...
Writing the output to bigwig files
Saving config file to ./infer_config.yaml...

kimw@compute1-exec-219:/data/kimw/ATAC/ATAC_2019/tutorial2$ ls -lt inference_output_2020.06.19_02.19
total 1257984
----------. 1 kimw domain users 2189064 Jun 19 02:46 NK_inferred.peaks.bw
-rw-r--r--. 1 kimw domain users 5854840 Jun 19 02:46 NK_inferred.peaks.bedGraph
----------. 1 kimw domain users 175573946 Jun 19 02:46 NK_inferred.track.bw
-rw-r--r--. 1 kimw domain users 1103212562 Jun 19 02:45 NK_inferred.track.bedGraph

2) I also tried Step 8 to get the peaks called and here it is:

kimw@compute1-exec-217:/data/kimw/ATAC/ATAC_2019/tutorial2/inference_output_latest$ python $atacworks/scripts/peaksummary.py \
    --peakbw NK_inferred.peaks.bw \
    --trackbw NK_inferred.track.bw \
    --prefix NK_inferred.peak_calls \
    --out_dir . \
    --minlen 20
INFO:2020-06-19 03:54:30,921:AtacWorks-peaksummary] Writing peaks to bedGraph file ./NK_inferred.peak_calls.bedGraph
INFO:2020-06-19 03:54:31,193:AtacWorks-peaksummary] Reading peaks
INFO:2020-06-19 03:54:31,260:AtacWorks-peaksummary] Calculating peak statistics
INFO:2020-06-19 03:55:30,692:AtacWorks-peaksummary] reduced number of peaks from 225182 to 26575.
INFO:2020-06-19 03:55:30,694:AtacWorks-peaksummary] Writing peaks to BED file ./NK_inferred.peak_calls.bed
INFO:2020-06-19 03:55:30,873:AtacWorks-peaksummary] Deleting bedGraph file

kimw@compute1-exec-217:/data/kimw/ATAC/ATAC_2019/tutorial2/inference_output_latest$ ls -l
total 1260032
----------. 1 kimw domain users 1741069 Jun 19 03:55 NK_inferred.peak_calls.bed
-rw-r--r--. 1 kimw domain users 5854840 Jun 19 02:46 NK_inferred.peaks.bedGraph
----------. 1 kimw domain users 2189064 Jun 19 02:46 NK_inferred.peaks.bw
-rw-r--r--. 1 kimw domain users 1103212562 Jun 19 02:45 NK_inferred.track.bedGraph
----------. 1 kimw domain users 175573946 Jun 19 02:46 NK_inferred.track.bw
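
The log above reports a reduction from 225182 to 26575 peaks after running with --minlen 20. As a rough illustration only, and not the actual peaksummary.py implementation, a minimum-length filter over BED-style intervals could be sketched as follows. The helper name `filter_by_min_length` is hypothetical, and it is an assumption that peak length is measured as end minus start and that this filter accounts for the reduction.

```python
from typing import List, Tuple

# (chrom, start, end) in BED-style half-open coordinates
Interval = Tuple[str, int, int]


def filter_by_min_length(peaks: List[Interval], minlen: int) -> List[Interval]:
    """Keep only peaks spanning at least `minlen` bases (hypothetical helper)."""
    return [(chrom, start, end) for (chrom, start, end) in peaks
            if end - start >= minlen]


# Toy data, not taken from the thread.
peaks = [("chr1", 100, 105), ("chr1", 200, 260), ("chr2", 10, 30)]
kept = filter_by_min_length(peaks, minlen=20)
print(len(peaks), "->", len(kept))
```

With the toy input, the 5-base peak is dropped and the two peaks of 20 bases or more are kept.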

2-1) FYI, there was a minor issue with the input file paths in Step 8 when I used your original command, shown below.

python $atacworks/scripts/peaksummary.py \
    --peakbw inference_output_latest/NK_inferred.peaks.bw \
    --trackbw inference_output_latest/NK_inferred.track.bw \
    --prefix inference_output_latest/NK_inferred.peak_calls \
    --out_dir inference_output_latest \
    --minlen 20

Traceback (most recent call last):
  File "/AtacWorks/scripts/peaksummary.py", line 95, in <module>
    peaks = read_intervals(out_bg_path)
  File "/usr/local/lib/python3.6/dist-packages/claragenomics/io/bedio.py", line 32, in read_intervals
    skiprows=skip)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 685, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 457, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 1135, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 1906, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 380, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 687, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'inference_output_latest/inference_output_latest/NK_inferred.peak_calls.bedGraph' does not exist: b'inference_output_latest/inference_output_latest/NK_inferred.peak_calls.bedGraph'

kimw@compute1-exec-217:/data/kimw/ATAC/ATAC_2019/tutorial2$ find inference_output_latest/inference_output_latest/NK_inferred.peak_calls.bedGraph
find: 'inference_output_latest/inference_output_latest/NK_inferred.peak_calls.bedGraph': No such file or directory
kimw@compute1-exec-217:/data/kimw/ATAC/ATAC_2019/tutorial2$ find /inference_output_latest/NK_inferred.peak_calls.bedGraph
find: '/inference_output_latest/NK_inferred.peak_calls.bedGraph': No such file or directory

Corrected:

kimw@compute1-exec-217:/data/kimw/ATAC/ATAC_2019/tutorial2$ cd inference_output_latest
kimw@compute1-exec-217:/data/kimw/ATAC/ATAC_2019/tutorial2/inference_output_latest$ python $atacworks/scripts/peaksummary.py \
    --peakbw NK_inferred.peaks.bw \
    --trackbw NK_inferred.track.bw \
    --prefix NK_inferred.peak_calls \
    --out_dir . \
    --minlen 20
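
The doubled directory in the FileNotFoundError (inference_output_latest/inference_output_latest/...) is consistent with the script joining --out_dir and --prefix into one path. That is an assumption based on the error message, not something confirmed from the peaksummary.py source. A minimal sketch of how such a join produces the doubled path:

```python
import posixpath

out_dir = "inference_output_latest"

# Prefix that already contains the directory, as in the failing command.
bad_prefix = "inference_output_latest/NK_inferred.peak_calls"
bad_path = posixpath.join(out_dir, bad_prefix + ".bedGraph")

# Bare prefix, as in the corrected command (there with --out_dir set to ".").
good_prefix = "NK_inferred.peak_calls"
good_path = posixpath.join(out_dir, good_prefix + ".bedGraph")

print(bad_path)
print(good_path)
```

If the assumption holds, passing a bare --prefix together with --out_dir inference_output_latest might also avoid the error without changing directories, though that variant was not tried in this thread.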

The next step is to try your tool with my own data. It was a long journey. You were great! I really appreciate your kind guidance and help.

3) Lastly, I have one more question. My final goal is to find differential peaks between two sample conditions, including the improved samples. Is there a way or a tool to do differential analysis with your output files? Normally, to get DEGs or differential ChIP or ATAC peaks, we use edgeR or DESeq2 starting from BAM files, and these tools require raw read counts. Your output, however, is bw or bedGraph, and I don't know of any tools that work from those formats. Do you have any ideas?

Thanks for your help again!

Best, Wookyung

feefee20 commented 4 years ago

Hi @ntadimeti,

I have some quick questions. Since my sequencing samples are aligned to hg38, I need to prepare the hg38-specific files, such as an intervals file. I created the hg38 intervals file and the .h5 file with the following commands.

kimw@compute1-exec-208:/data/kimw/ATAC/ATAC_2019/atacworks/tutorial2$ python $atacworks/scripts/get_intervals.py \
    --sizes $atacworks/data/reference/hg38.auto.sizes \
    --intervalsize 50000 \
    --out_dir intervals \
    --prefix hg38.50000 \
    --wg
INFO:2020-06-21 02:13:01,998:AtacWorks-intervals] Generating intervals tiling across all chromosomes in sizes file: /AtacWorks/data/reference/hg38.auto.sizes
INFO:2020-06-21 02:13:02,192:AtacWorks-intervals] Done

kimw@compute1-exec-217:/data/kimw/ATAC/ATAC_2019/atacworks/tutorial2/intervals$ ls -l
total 1536
----------. 1 kimw domain users 1372419 Jun 21 02:13 hg38.50000.genome_intervals.bed

kimw@compute1-exec-217:/data/kimw/ATAC/ATAC_2019/atacworks/tutorial2/intervals$ head hg38.50000.genome_intervals.bed
chr1 0 50000
chr1 50000 100000
chr1 100000 150000
chr1 150000 200000
chr1 200000 250000
chr1 250000 300000
chr1 300000 350000
chr1 350000 400000
chr1 400000 450000
chr1 450000 500000
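
For readers unfamiliar with what "tiling across all chromosomes" produces, here is a minimal sketch of fixed-size tiling that yields rows shaped like the head output above. It is not the get_intervals.py implementation: the function name `tile_genome` and the toy chromosome sizes are hypothetical, and dropping a trailing window shorter than the interval size is an assumption.

```python
from typing import Dict, Iterator, Tuple


def tile_genome(sizes: Dict[str, int],
                interval_size: int) -> Iterator[Tuple[str, int, int]]:
    """Yield non-overlapping (chrom, start, end) windows of a fixed size."""
    for chrom, length in sizes.items():
        # Assumption: a final window shorter than interval_size is dropped.
        for start in range(0, length - interval_size + 1, interval_size):
            yield chrom, start, start + interval_size


# Toy sizes, not real chromosome lengths.
toy_sizes = {"chrA": 120_000, "chrB": 50_000}
intervals = list(tile_genome(toy_sizes, 50_000))
for chrom, start, end in intervals:
    print(chrom, start, end, sep="\t")
```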

kimw@compute1-exec-208:/data/kimw/ATAC/ATAC_2019/atacworks/tutorial2$ python $atacworks/scripts/bw2h5.py \
       --noisybw ATAC-seq_step3.2_normalized_per_10M_HN201_S1_R1_001.bigWig \
       --intervals intervals/hg38.50000.genome_intervals.bed \
       --out_dir ./ \
       --prefix test_201 \
       --pad 5000 \
       --nolabel
INFO:2020-06-21 02:14:57,123:AtacWorks-bw2h5] Reading intervals
INFO:2020-06-21 02:14:57,161:AtacWorks-bw2h5] Read 57487 intervals
INFO:2020-06-21 02:14:57,161:AtacWorks-bw2h5] Writing data in 58 batches.
INFO:2020-06-21 02:14:57,162:AtacWorks-bw2h5] Extracting data for each batch and writing to h5 file
INFO:2020-06-21 02:14:57,162:AtacWorks-bw2h5] batch 0 of 58
INFO:2020-06-21 02:15:54,324:AtacWorks-bw2h5] batch 10 of 58
INFO:2020-06-21 02:16:49,879:AtacWorks-bw2h5] batch 20 of 58
INFO:2020-06-21 02:17:47,036:AtacWorks-bw2h5] batch 30 of 58
INFO:2020-06-21 02:18:43,124:AtacWorks-bw2h5] batch 40 of 58
INFO:2020-06-21 02:19:37,464:AtacWorks-bw2h5] batch 50 of 58
INFO:2020-06-21 02:20:19,120:AtacWorks-bw2h5] Done! Saved to ./test_201.h5

kimw@compute1-exec-217:/data/kimw/ATAC/ATAC_2019/atacworks/tutorial2$ ls -l
----------. 1 kimw domain users 577787199 Jun 21 02:20 test_201.h5
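
As a quick sanity check on the log above, 57487 intervals written in 58 batches is what a batch size of 1000 intervals would give. The batch size is inferred from those two numbers and is an assumption (other sizes close to 1000 would also give 58 batches), and `batch_bounds` is a hypothetical helper rather than part of bw2h5.py.

```python
import math
from typing import List, Tuple

n_intervals = 57487  # from "Read 57487 intervals"
n_batches = 58       # from "Writing data in 58 batches."
batch_size = 1000    # assumption, inferred from the two numbers above


def batch_bounds(n_items: int, size: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges covering n_items."""
    return [(i, min(i + size, n_items)) for i in range(0, n_items, size)]


bounds = batch_bounds(n_intervals, batch_size)
print(math.ceil(n_intervals / batch_size), len(bounds), bounds[-1])
```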

However, when I ran main.py infer, I got the errors shown below.

1) For one of the errors, it looks like I have to modify 'infer_config.yaml': hg19.50000.genome_intervals.bed --> hg38.50000.genome_intervals.bed. Unfortunately, I couldn't edit it in 'claraomics/atacworks:test'. Could you change the Dockerfile so that I can edit the .yaml file inside your Docker image? Or could you provide the Dockerfile directly?

2) For the other errors, do you have any ideas?

kimw@compute1-exec-217:/data/kimw/ATAC/ATAC_2019/atacworks/tutorial2$ python $atacworks/scripts/main.py infer \
    --files test_201.h5 \
    --sizes_file $atacworks/data/reference/hg38.auto.sizes \
    --config configs/infer_config.yaml \
    --config_mparams configs/model_structure.yaml
INFO:2020-06-21 02:45:09,969:AtacWorks-main] Checkng input files for compatibility
Traceback (most recent call last):
  File "/AtacWorks/scripts/main.py", line 439, in <module>
    main()
  File "/AtacWorks/scripts/main.py", line 370, in main
    intervals = read_intervals(args.intervals_file)
  File "/usr/local/lib/python3.6/dist-packages/claragenomics/io/bedio.py", line 32, in read_intervals
    skiprows=skip)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 685, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 457, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 1135, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 1906, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 380, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 687, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'./intervals/hg19.50000.genome_intervals.bed' does not exist: b'./intervals/hg19.50000.genome_intervals.bed'
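
The traceback shows main.py reading args.intervals_file, which still points at the hg19 intervals file. If the config inside the image cannot be edited in place, one possible workaround is to write a modified copy to a writable directory and pass that copy to --config. The sketch below is only an illustration: the helper `retarget_intervals` is hypothetical, the key name `intervals_file` is inferred from the traceback, and the one-line YAML content is a made-up stand-in for the real infer_config.yaml.

```python
import tempfile
from pathlib import Path


def retarget_intervals(src: Path, dst: Path, old: str, new: str) -> int:
    """Copy src to dst, replacing every occurrence of `old` with `new`.

    Returns the number of occurrences replaced (hypothetical helper).
    """
    text = src.read_text()
    count = text.count(old)
    dst.write_text(text.replace(old, new))
    return count


# Demo on a throwaway file; the YAML line below is a hypothetical stand-in.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "infer_config.yaml"
    dst = Path(tmp) / "infer_config.hg38.yaml"
    src.write_text("intervals_file: ./intervals/hg19.50000.genome_intervals.bed\n")
    n = retarget_intervals(src, dst,
                           "hg19.50000.genome_intervals.bed",
                           "hg38.50000.genome_intervals.bed")
    result = dst.read_text()

print(n, result.strip())
```

Writing to a new file rather than editing in place leaves the original config untouched, which matters if the original is read-only.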

3) In tutorial2, you set the genome interval size to 50,000. Is that a common setting? For real data, what interval size would you recommend? 1000? 500?

Thanks for your answers!

Best, Wookyung

avantikalal commented 4 years ago

Hi @wookyung ,

For denoising genome-wide data we've found 50,000 to be a good interval size.

You should be able to edit your config file. I'll let @ntadimeti comment on that issue.

ntadimeti commented 4 years ago

@wookyung that's amazing that you're now able to run successfully. For your next adventure of trying your own data, let's use a new issue. This one is getting very long. :) We usually prefer to keep issues small and succinct so future users can easily benefit.