MiraldiLab / maxATAC

Transcription Factor Binding Prediction from ATAC-seq and scATAC-seq with Deep Neural Networks
Apache License 2.0

variants_bed invalid int value #93

Closed (hrayjones closed this issue 1 year ago)

hrayjones commented 2 years ago

Hello,

I am trying to run maxatac variants and I have the following error:

maxatac variants: error: argument -variants_bed/--variants_bed: invalid int value: 'variants_1based.bed'

My variants_bed file looks like this:

head variants_1based.bed
chr1 758351 758351 G
chr1 770502 770502 A
chr1 770988 770988 G
chr1 785001 785001 T
chr1 785910 785910 C
chr1 787290 787290 C

I have tried supplying the SNPs in this bed file as 0-based or 1-based, but neither worked. I have also tried removing "chr". Please can you advise me where I might be going wrong?
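For reference, the general BED convention is a 0-based start and an exclusive end, so a single-base SNP at 1-based position p is usually written as the interval (p - 1, p). The sketch below shows only that conversion. The helper name snp_to_bed_interval is made up for illustration, and whether maxatac variants expects exactly this layout is not confirmed anywhere in this thread.

```python
def snp_to_bed_interval(chrom, pos_1based):
    """Convert a 1-based SNP position to a 0-based, half-open BED interval.

    Hypothetical helper for illustration; not part of maxATAC.
    """
    if pos_1based < 1:
        raise ValueError("1-based positions start at 1")
    # BED start is 0-based and inclusive, BED end is exclusive
    return (chrom, pos_1based - 1, pos_1based)


first_snp = snp_to_bed_interval("chr1", 758351)
print(first_snp)  # ('chr1', 758350, 758351)
```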

Many thanks in advance! Helen

tacazares commented 2 years ago

Hello Helen,

Thanks so much for trying our method and posting this issue! I am sorry for the frustration this error has caused. It looks like the argument parser for this function was set up incorrectly: the parser tries to convert the variants_bed value to an integer, which is why a file path is rejected. It is not an issue with your input.

https://github.com/MiraldiLab/maxATAC/blob/bcc2eb68871ba1db27d3ebed2c20524311a86d02/maxatac/utilities/parser.py#L839

The fix will require me to change a few lines of code and test. I also noticed a few other arguments that had the wrong style assigned. I will work on it this evening and post an update for you. Thanks a bunch for pointing this out!
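The "invalid int value" message is what Python's argparse emits when an argument is declared with type=int and receives a non-numeric string. The snippet below is a minimal standalone sketch of that failure and of the obvious remedy (declaring the argument as a string). It is not the actual maxATAC parser code; build_parser is a hypothetical helper, and exit_on_error=False requires Python 3.9 or later.

```python
import argparse


def build_parser(path_type):
    # Hypothetical stand-in for the real parser; only the one argument is modeled.
    # exit_on_error=False makes argparse raise ArgumentError instead of exiting.
    parser = argparse.ArgumentParser(prog="maxatac variants", exit_on_error=False)
    parser.add_argument(
        "-variants_bed", "--variants_bed",
        dest="variants_bed",
        type=path_type,
        required=True,
    )
    return parser


argv = ["-variants_bed", "variants_1based.bed"]

# Declared as int: the file path cannot be converted, so parsing fails.
try:
    build_parser(int).parse_args(argv)
    error_message = None
except argparse.ArgumentError as err:
    error_message = str(err)

# Declared as str: the path is accepted unchanged.
fixed_args = build_parser(str).parse_args(argv)

print(error_message)
print(fixed_args.variants_bed)
```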

Tareian

tacazares commented 2 years ago

I was able to reproduce your error with my own data.

/Users/caz3so/opt/anaconda3/envs/maxatac/bin/python /Users/caz3so/workspaces/miraldiLab/maxATAC/maxatac/bin/maxatac variants -m /Users/caz3so/scratch/maxatac_test/predict_input/CTCF_20.h5 -s /Users/caz3so/scratch/databank/genome_inf/hg38/hg38.2bit -signal /Users/caz3so/scratch/maxatac_test/predict_input/GM12878__slop20bp_RP20M_minmax01.bw -variants_bed /Users/caz3so/scratch/20211012_maxATAC_atopicDerm_samples/data/variants/AD2_variants.bed -chroms chr20 -n CTCF_variants_test -o /Users/caz3so/scratch/maxatac_test/variants
                             _______       _____ 
                          /\|__   __|/\   / ____|
 _ __ ___   __ ___  __   /  \  | |  /  \ | |     
| '_ ` _ \ / _` \ \/ /  / /\ \ | | / /\ \| |     
| | | | | | (_| |>  <  / ____ \| |/ ____ \ |____ 
|_| |_| |_|\__,_/_/\_\/_/    \_\_/_/    \_\_____|

usage: maxatac variants [-h] -m MODEL -signal INPUT_BIGWIG [-o OUTPUT] -n NAME
                        [-s SEQUENCE] [-chroms CHROMOSOMES] -variants_bed
                        VARIANTS_BED [-roi ROI]
                        [--loglevel {fatal,error,warning,info,debug}]
                        [--blacklist BLACKLIST] [--chrom_sizes CHROM_SIZES]
                        [--step_size STEP_SIZE]
maxatac variants: error: argument -variants_bed/--variants_bed: invalid int value: '/Users/caz3so/scratch/20211012_maxATAC_atopicDerm_samples/data/variants/AD2_variants.bed'

Process finished with exit code 1

I updated the code to fix this problem.

/Users/caz3so/opt/anaconda3/envs/maxatac/bin/python /Users/caz3so/workspaces/miraldiLab/maxATAC/maxatac/bin/maxatac variants -m /Users/caz3so/scratch/maxatac_test/predict_input/CTCF_20.h5 -s /Users/caz3so/scratch/databank/genome_inf/hg38/hg38.2bit -signal /Users/caz3so/scratch/maxatac_test/predict_input/GM12878__slop20bp_RP20M_minmax01.bw -variants_bed /Users/caz3so/scratch/20211012_maxATAC_atopicDerm_samples/data/variants/AD2_variants.bed -chroms chr20 -n CTCF_variants_test -o /Users/caz3so/scratch/maxatac_test/variants
                             _______       _____ 
                          /\|__   __|/\   / ____|
 _ __ ___   __ ___  __   /  \  | |  /  \ | |     
| '_ ` _ \ / _` \ \/ /  / /\ \ | | / /\ \| |     
| | | | | | (_| |>  <  / ____ \| |/ ____ \ |____ 
|_| |_| |_|\__,_/_/\_\/_/    \_\_/_/    \_\_____|

[2022-04-19 23:01:26,816]
Create prediction regions
[2022-04-19 23:04:08,128]
Making sequence specific predictions for: /Users/caz3so/scratch/maxatac_test/predict_input/GM12878__slop20bp_RP20M_minmax01.bw 
Writing files with name: CTCF_variants_test 
Output bigwig: /Users/caz3so/scratch/maxatac_test/variants/CTCF_variants_test.bw 
Output bedgraph: /Users/caz3so/scratch/maxatac_test/variants/CTCF_variants_test.bg 
Prediction Windows: /Users/caz3so/scratch/maxatac_test/variants/CTCF_variants_test_windows.bed 
Sequence file: /Users/caz3so/scratch/databank/genome_inf/hg38/hg38.2bit 
MaxATAC model: /Users/caz3so/scratch/maxatac_test/predict_input/CTCF_20.h5 
Chromosome(s): ['chr20'] 
Variants Bed: /Users/caz3so/scratch/20211012_maxATAC_atopicDerm_samples/data/variants/AD2_variants.bed
[Errno 17] File exists: '/Users/caz3so/scratch/maxatac_test/variants'

Process finished with exit code 0

I found some additional problems that also needed to be corrected.

hrayjones commented 2 years ago

Hi Tareian,

Brilliant, thank you for your quick fix!

Helen

FaizRizvi commented 2 years ago

The PyPI package has been updated to 1.0.4: https://pypi.org/project/maxatac/1.0.4/

hrayjones commented 2 years ago

Hi Tareian,

I updated maxatac to 1.0.4 but had a different error:

[Screenshot 2022-04-22 at 11 02 28]

I think I have fixed it by adding a chrom_sizes parameter to def import_roi_bed (line 152 of variant_tools.py) and passing args.chrom_sizes in the call to import_roi_bed (line 27 of variants.py). It now seems to run OK.

Thank you for your help on this!

Helen

emiraldi commented 2 years ago

Thanks, Helen! Your reporting is very helpful to us!

FaizRizvi commented 2 years ago

Hi Helen,

Thank you for all your help! If you get a chance can you test the pypi package? (https://pypi.org/project/maxatac/1.0.5/). I have updated the changes on the pypi package.

Thank you, Faiz

hrayjones commented 2 years ago

Hi Faiz,

I have now tested the variants function of the pypi package version 1.0.5. It runs for me, and I get a resultant .bg file. However, I have not yet been able to generate the .bw file (with this version or previous versions). I've attached the error file here in case you can spot anything that is going wrong - I think it could be the runtime error? In the predict function I am able to obtain both the .bg and the .bw files.

Many thanks, Helen

Attachment: 5541708.pbs.ER.txt

FaizRizvi commented 2 years ago

Hi Helen,

Could you supply your exact inputs that caused this error to occur? I am noticing this in your error message: Chromosome(s): []

Best, Faiz

hrayjones commented 2 years ago

Hi Faiz,

Here was the command:

[Screenshot 2022-05-16 at 15 30 24]

In fact I've just noticed that I specified the chromosomes option twice... I wonder if that could have affected it?

The input bed files are here: chr22_variants.txt chr22_roi.txt

Thank you for looking into this! Helen

tacazares commented 2 years ago

Hello @hrayjones, it looks like the first issue is in the function that imports the ROI files. The function expects a 3-column BED file, but your input ROI.txt file has 4 columns, which caused the pybedtools package to import it incorrectly. I updated the function so that it imports only the first 3 columns of the ROI file.
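As a rough illustration of "import only the first 3 columns", here is a sketch in plain Python. It is not the maxATAC function and does not use pybedtools; read_bed3 and the example rows are made up for illustration.

```python
import csv
import io


def read_bed3(handle):
    """Return (chrom, start, end) tuples, ignoring any columns past the third.

    Hypothetical helper for illustration; not the maxATAC implementation.
    """
    regions = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        chrom, start, end = row[:3]  # extra columns (name, score, ...) are dropped
        regions.append((chrom, int(start), int(end)))
    return regions


# Made-up 4-column ROI input: the fourth column is discarded on import.
roi_text = (
    "chr22\t17000000\t17001024\tregion_a\n"
    "chr22\t18000000\t18001024\tregion_b\n"
)
regions = read_bed3(io.StringIO(roi_text))
print(regions)
```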

There is also an issue with the input ROI.txt file that you are using. I have not completely worked out the nuances of this function, but there is an issue right now with generating the prediction windows using specific regions. I will talk to Faiz to come up with a solution to this annoying problem. Thanks for working through this with us! Tareian

hrayjones commented 2 years ago

Hi @tacazares,

OK, that makes sense then - thanks! Glad to be able to help in some way.

Best, Helen

tacazares commented 1 year ago

I was benchmarking peak-centric predictions and noticed unexpectedly poor performance. I traced the error to how the bins were being created. The pybedtools windowmaker function produces windows that are not all the same size, and any window that is not exactly 1,024 bp wide is dropped. If a peak is not 1,024 bp wide, the window is cut to the peak size, so a peak shorter than 1,024 bp is discarded entirely. This will require updating the code for: https://github.com/MiraldiLab/maxATAC/blob/4ad3cdc3f4eab06ebbe490c5da70ba42a4d4d2a4/maxatac/utilities/prediction_tools.py#L81
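To make the failure mode concrete, here is a small plain-Python sketch. The tile function only mimics fixed-width tiling with a truncated last window; it is not the pybedtools call or the maxATAC code. The centered_window function is one possible remedy, shown purely as an assumption and not necessarily the fix that was adopted. All function names are hypothetical.

```python
WINDOW = 1024  # model input width in bp, as stated above


def tile(start, end, size=WINDOW):
    """Tile [start, end) into consecutive windows; the last one may be shorter."""
    return [(s, min(s + size, end)) for s in range(start, end, size)]


def keep_full(windows, size=WINDOW):
    """Drop every window that is not exactly `size` bp wide."""
    return [(s, e) for s, e in windows if e - s == size]


def centered_window(start, end, size=WINDOW):
    """Assumed remedy: one fixed-width window centered on the peak midpoint.

    Clipping at the chromosome end is not handled in this sketch.
    """
    mid = (start + end) // 2
    s = max(0, mid - size // 2)
    return (s, s + size)


short_peak = (10000, 10500)   # 500 bp peak
long_peak = (20000, 22500)    # 2,500 bp peak

# The short peak yields a single 500 bp window, which is then dropped.
short_kept = keep_full(tile(*short_peak))
# The long peak keeps two full windows and loses its 452 bp tail.
long_kept = keep_full(tile(*long_peak))
# Centering always yields a full-width window, even for the short peak.
short_centered = centered_window(*short_peak)

print(short_kept, long_kept, short_centered)
```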