NVIDIA-Genomics-Research / AtacWorks

Deep learning based processing of Atac-seq data
https://clara-parabricks.github.io/AtacWorks/

Trying Atacworks on custom data #175

Closed ntadimeti closed 3 years ago

ntadimeti commented 4 years ago

Hi @ntadimeti,

I have a few quick questions. My sequencing samples are aligned to hg38, so I need to prepare the hg38-specific input files, such as an interval file. I generated the hg38 interval file and the .h5 file with the following commands.

python $atacworks/scripts/get_intervals.py \
     --sizes $atacworks/data/reference/hg38.auto.sizes \
     --intervalsize 50000 \
     --out_dir intervals \
     --prefix hg38.50000 \
     --wg
INFO:2020-06-21 02:13:01,998:AtacWorks-intervals] Generating intervals tiling across all chromosomes in sizes file: /AtacWorks/data/reference/hg38.auto.sizes
INFO:2020-06-21 02:13:02,192:AtacWorks-intervals] Done
kimw@compute1-exec-217:/data/kimw/ATAC/ATAC_2019/atacworks/tutorial2/intervals$ ls -l
total 1536
----------. 1 kimw domain users 1372419 Jun 21 02:13 hg38.50000.genome_intervals.bed
kimw@compute1-exec-217:/data/kimw/ATAC/ATAC_2019/atacworks/tutorial2/intervals$ head hg38.50000.genome_intervals.bed
chr1 0 50000
chr1 50000 100000
chr1 100000 150000
chr1 150000 200000
chr1 200000 250000
chr1 250000 300000
chr1 300000 350000
chr1 350000 400000
chr1 400000 450000
chr1 450000 500000
kimw@compute1-exec-208:/data/kimw/ATAC/ATAC_2019/atacworks/tutorial2$ python $atacworks/scripts/bw2h5.py \
            --noisybw ATAC-seq_step3.2_normalized_per_10M_HN201_S1_R1_001.bigWig \
            --intervals intervals/hg38.50000.genome_intervals.bed \
            --out_dir ./ \
            --prefix test_201 \
            --pad 5000 \
            --nolabel
INFO:2020-06-21 02:14:57,123:AtacWorks-bw2h5] Reading intervals
INFO:2020-06-21 02:14:57,161:AtacWorks-bw2h5] Read 57487 intervals
INFO:2020-06-21 02:14:57,161:AtacWorks-bw2h5] Writing data in 58 batches.
INFO:2020-06-21 02:14:57,162:AtacWorks-bw2h5] Extracting data for each batch and writing to h5 file
INFO:2020-06-21 02:14:57,162:AtacWorks-bw2h5] batch 0 of 58
INFO:2020-06-21 02:15:54,324:AtacWorks-bw2h5] batch 10 of 58
INFO:2020-06-21 02:16:49,879:AtacWorks-bw2h5] batch 20 of 58
INFO:2020-06-21 02:17:47,036:AtacWorks-bw2h5] batch 30 of 58
INFO:2020-06-21 02:18:43,124:AtacWorks-bw2h5] batch 40 of 58
INFO:2020-06-21 02:19:37,464:AtacWorks-bw2h5] batch 50 of 58
INFO:2020-06-21 02:20:19,120:AtacWorks-bw2h5] Done! Saved to ./test_201.h5
kimw@compute1-exec-217:/data/kimw/ATAC/ATAC_2019/atacworks/tutorial2$ ls -l
----------. 1 kimw domain users 577787199 Jun 21 02:20 test_201.h5
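
For readers following the two steps above, here is a minimal Python sketch of what whole-genome tiling and batching amount to. It is not the code of get_intervals.py or bw2h5.py: the function name tile_chromosomes and the toy chromosome sizes are made up for illustration, dropping a trailing partial tile is an assumption (it appears consistent with the 57487 intervals reported in the log, but the thread does not confirm it), and the batch size of 1000 is inferred from "57487 intervals" being written in "58 batches".

```python
import math


def tile_chromosomes(sizes, interval_size):
    """Yield (chrom, start, end) for full-length, non-overlapping tiles.

    Assumption: a trailing tile shorter than interval_size is dropped.
    """
    for chrom, length in sizes.items():
        for start in range(0, length - interval_size + 1, interval_size):
            yield chrom, start, start + interval_size


# Toy sizes for illustration only; real values come from hg38.auto.sizes.
toy_sizes = {"chrA": 120000, "chrB": 50000}
tiles = list(tile_chromosomes(toy_sizes, 50000))
for chrom, start, end in tiles:
    print(chrom, start, end)

# Batch arithmetic from the bw2h5 log, assuming 1000 intervals per batch.
n_batches = math.ceil(57487 / 1000)
print(n_batches)
```

The printed rows have the same three-column layout as the head of hg38.50000.genome_intervals.bed shown above.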

However, when I ran main.py infer, I got the following errors.

1) For one of the errors, it looks like I have to modify 'infer_config.yaml': hg19.50000.genome_intervals.bed --> hg38.50000.genome_intervals.bed. Unfortunately, I couldn't edit it in the 'claraomics/atacworks:test' image. Could you please change the Dockerfile so that I can edit the .yaml file inside your Docker container? Or could you provide the Dockerfile directly?

2) For the other errors, do you have any ideas?

kimw@compute1-exec-217:/data/kimw/ATAC/ATAC_2019/atacworks/tutorial2$ python $atacworks/scripts/main.py infer \
     --files test_201.h5 \
     --sizes_file $atacworks/data/reference/hg38.auto.sizes \
     --config configs/infer_config.yaml \
     --config_mparams configs/model_structure.yaml
INFO:2020-06-21 02:45:09,969:AtacWorks-main] Checkng input files for compatibility
Traceback (most recent call last):
  File "/AtacWorks/scripts/main.py", line 439, in <module>
    main()
  File "/AtacWorks/scripts/main.py", line 370, in main
    intervals = read_intervals(args.intervals_file)
  File "/usr/local/lib/python3.6/dist-packages/claragenomics/io/bedio.py", line 32, in read_intervals
    skiprows=skip)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 685, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 457, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 1135, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 1906, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 380, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 687, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'./intervals/hg19.50000.genome_intervals.bed' does not exist: b'./intervals/hg19.50000.genome_intervals.bed'
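
The traceback ends in a FileNotFoundError for ./intervals/hg19.50000.genome_intervals.bed, while the file generated earlier is intervals/hg38.50000.genome_intervals.bed. A small pre-flight check can catch this kind of mismatch before inference starts. The sketch below is not part of AtacWorks: missing_bed_paths is a hypothetical helper, and the key name intervals_file in the example line is an assumption based on args.intervals_file in the traceback.

```python
import os
import re


def missing_bed_paths(config_text, base_dir="."):
    """Return .bed paths mentioned in config_text that do not exist under base_dir."""
    # Match whitespace-free tokens ending in ".bed"; quotes are excluded
    # so that quoted YAML values are handled too.
    candidates = re.findall(r"[^\s\"']+\.bed\b", config_text)
    return [p for p in candidates
            if not os.path.exists(os.path.join(base_dir, p))]


# Hypothetical config line; the real key name in infer_config.yaml may differ.
example = 'intervals_file: "./intervals/hg19.50000.genome_intervals.bed"\n'
print(missing_bed_paths(example))
```

Run from the tutorial2 directory against the text of configs/infer_config.yaml, this would list the hg19 path as missing as long as only the hg38 interval file exists there.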

3) In tutorial2, you set the genome interval size to 50,000. Is that a common setting? What interval size would you recommend for real data? 1000? 500?

Thanks for your answers!

Best, Wookyung

ntadimeti commented 4 years ago

@wookyung For this error:

For one of the errors, it looks like I have to modify 'infer_config.yaml':
hg19.50000.genome_intervals.bed --> hg38.50000.genome_intervals.bed.
Unfortunately, I couldn't edit it in 'claraomics/atacworks:test'. Would you
please change the dockerfile so that I can edit .yaml file under your
docker? Or can you directly provide the dockerfile?

Are you able to find the config file at /data/kimw/ATAC/ATAC_2019/atacworks/tutorial2/configs/infer_config.yaml? If so, can you open it with vim, e.g. vi /data/kimw/ATAC/ATAC_2019/atacworks/tutorial2/configs/infer_config.yaml? And if you can open it, can you edit it?

ntadimeti commented 4 years ago

Let's solve one problem at a time, so we can make better progress.

feefee20 commented 4 years ago

Hi @ntadimeti,

I am happy to finally be able to try AtacWorks with my data. I appreciate your help! As for editing the config files, the vim/vi commands didn't work inside the Docker container. Could you make vim or another editing command available there? Thanks!

Best, Wookyung


ntadimeti commented 4 years ago

@wookyung You can install vim inside your Docker container using apt install vim. You will then be able to open the file with the vi command. Please let me know whether the installation works for you.
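
If no editor can be installed, the path can also be changed without one, since Python is already available in the container. The following is only a sketch: retarget_config is a hypothetical helper, the key name in the demo file is an assumption, and it presumes the user can write a new file somewhere (for example next to the original config) and then pass that copy to --config.

```python
import tempfile
from pathlib import Path


def retarget_config(src, dst,
                    old="hg19.50000.genome_intervals.bed",
                    new="hg38.50000.genome_intervals.bed"):
    """Write a copy of src to dst with every occurrence of old replaced by new."""
    text = Path(src).read_text()
    if old not in text:
        raise ValueError(f"{old!r} not found in {src}")
    Path(dst).write_text(text.replace(old, new))
    return Path(dst)


# Self-contained demo on a temporary file with a hypothetical key name.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "infer_config.yaml"
    src.write_text("intervals_file: ./intervals/hg19.50000.genome_intervals.bed\n")
    dst = retarget_config(src, Path(tmp) / "infer_config.hg38.yaml")
    print(dst.read_text())
```

On the real data this would be called as retarget_config("configs/infer_config.yaml", "configs/infer_config.hg38.yaml"), with the new file name passed to --config. A plain text replacement is used instead of a YAML parser so that comments and the layout of the file stay untouched.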

feefee20 commented 4 years ago

Hi,

In my Docker container running 'claraomics/atacworks:test', I couldn't install vim because of a 'permission denied' error; 'apt install' doesn't work there. I know it is weird and annoying. Normally I add vim to the Dockerfile so that it is already installed when the container starts. Would that work for 'claraomics/atacworks:test', or do you have another idea? Thanks.

Wookyung


ntadimeti commented 4 years ago

@wookyung There are two potential solutions. One is to modify the Dockerfile and build a new image. The other is to run the Docker container with root permissions.

Is the second option possible in your case? Here's how you can do that:

  1. Launch the Docker container with the docker run command, adding --name test to it.
  2. In a new terminal, run the following command: docker exec --user="root" -it test /bin/bash. This opens an interactive shell inside the container with root permissions.

If the above doesn't work for you, you can clone the AtacWorks repository from here: https://github.com/clara-parabricks/AtacWorks/tree/master. You will find a Dockerfile in the top-level directory. Feel free to modify it and build your own images to work with.

Let me know which option worked for you. Sorry for the delayed response.

feefee20 commented 4 years ago

Hi @ntadimeti,

I'm sorry for the late reply; I was busy with other things for a while. I noticed that the tool has made big progress. This is very COOL! It has become much easier for a user like me to use. Thanks! I have already tried your new version with both your tutorial file and my own data, and I hope I can give you some good feedback, along with a question.

As you know, I am using the Docker version. There are two images that I know of: a) claraomics/atacworks (presumably the latest by default), described at https://clara-parabricks.github.io/AtacWorks/Dockerfile.html; and b) the one you modified for me before (and, I think, updated since), claraomics/atacworks:test.

1. When I tried a), I couldn't find the /AtacWorks folder. However, I remembered that there was a test version I could try. The test version, b) claraomics/atacworks:test, worked well with both your new tutorial2 and my data.

2. For my data, I finally got the corrected bw and bedGraph outputs from the original input file shown below (yellow-highlighted). The output files, especially infer.track.bw and infer.track.bedGraph, are quite large, and I'm wondering whether that is normal. When I compared infer.track.bw to the original bw in IGV (please see the corrected infer.track.bw and the bed file in the attached screenshot), the signal looks corrected, although I haven't compared other samples yet. If this size is normal for an output bw, it is quite heavy for visualization. Is there a way to get a smaller bw file for visualization and for uploading to GEO?

Command run:

atacworks denoise \
     --noisybw ATAC_seq_step3.2_normalized_per_10M_H501_S4_R1_001.bw \
     --genome $atacworks/data/reference/hg38.auto.sizes \
     --weights_path /data/kimw/ATAC/ATAC_2019/atacworks/models/model.pth.tar \
     --out_home "./" \
     --exp_name "atacworks_denoise" \
     --distributed \
     --num_workers 0

d---------. 2 kimw domain users 8192 Aug 5 22:12 intervals
d---------. 2 kimw domain users 8192 Aug 5 22:12 bw2h5
----------. 1 kimw domain users 677124524 Dec 24 2019 s3.2_10M_H501_S4_R1_001.bw
----------. 1 kimw domain users 44884372923 Aug 6 02:18 ATAC_seq_step3_infer.track.bedGraph
----------. 1 kimw domain users 8833899461 Aug 6 03:18 ATAC_seq_step3_infer.track.bw
----------. 1 kimw domain users 447589964 Aug 6 03:18 ATAC_seq_step3_infer.peaks.bedGraph
----------. 1 kimw domain users 84201999 Aug 6 03:19 ATAC_seq_step3_infer.peaks.bw
----------. 1 kimw domain users 39387398 Aug 6 11:36 step3.2_10M_H50l_peak_calls.bed

3. After correcting my under-estimated samples with AtacWorks, my final goal is a differential analysis comparing two groups. The samples that need correction all belong to one of these groups. Differential analysis requires normalization, so it looks like all of my samples should be corrected or properly fit to your model. I'd like to hear your opinion.

Thanks in advance.

-Wookyung


ntadimeti commented 4 years ago

Hello! Thanks for testing out our new version, @wookyung. You can read all about the atacworks Docker images here: https://clara-parabricks.github.io/AtacWorks/Dockerfile.html

In short, we now have two Docker images:

1) claraomics/atacworks:latest -- This contains pre-installed atacworks (pip install atacworks) but no source code. It is most useful for running the atacworks train and denoise commands.

2) claraomics/atacworks:source-latest -- This contains the atacworks source code. Use this image for running the tutorials, etc. It should work just as well as claraomics/atacworks:test.

Congratulations on being able to run atacworks on your data. Thanks for being patient with us.

Regarding the file size and experiment specific questions, I'll let @avantikalal get back to you.

avantikalal commented 4 years ago

@feefee20 In my experiments, I typically see track.bw outputs of 300-400 MB, so this does seem quite large. Could you share the chromosome sizes file that you used for inference, and also the top few lines of the track.bedGraph output file?
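
For anyone wanting to look at the top of a large track.bedGraph before posting it, here is a minimal sketch that summarizes the first rows instead of pasting them raw. summarize_bedgraph_head is a hypothetical helper, not an AtacWorks function. A mean span close to 1 bp, or many rows that could be merged with the row before them, would suggest the track is stored at per-base resolution; whether that explains the file sizes in this issue is not established in the thread.

```python
from itertools import islice


def summarize_bedgraph_head(lines, n=1000):
    """Summarize up to n data rows of a bedGraph (hypothetical helper)."""
    data = (ln for ln in lines
            if ln.strip() and not ln.startswith(("track", "#", "browser")))
    spans, mergeable, prev = [], 0, None
    for line in islice(data, n):
        chrom, start, end, value = line.split()[:4]
        spans.append(int(end) - int(start))
        # A row is "mergeable" if it starts where the previous row ended
        # on the same chromosome and carries the same value string.
        if prev == (chrom, start, value):
            mergeable += 1
        prev = (chrom, end, value)
    mean_span = sum(spans) / len(spans) if spans else 0.0
    return {"rows": len(spans), "mean_span": mean_span, "mergeable_rows": mergeable}


# Toy rows for illustration only, not taken from the real output file.
toy = ["chr1\t0\t1\t0.5\n", "chr1\t1\t2\t0.5\n", "chr1\t2\t10\t0.0\n"]
print(summarize_bedgraph_head(toy))
```

On the real file, the function can be passed an open file handle, e.g. with open("ATAC_seq_step3_infer.track.bedGraph") as fh: print(summarize_bedgraph_head(fh)), which reads only the first n data rows rather than the whole 44 GB file.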

ntadimeti commented 3 years ago

Closing this issue since it is inactive. @feefee20, please feel free to reopen it or open a new issue if you need help at any time.