broadinstitute / ABC-Enhancer-Gene-Prediction

Cell type specific enhancer-gene predictions using ABC model (Fulco, Nasser et al, Nature Genetics 2019)
MIT License
199 stars 59 forks

qnorm in config.yaml #192

Closed olechnwin closed 6 months ago

olechnwin commented 6 months ago

Hi,

I am trying to run this on my own dataset. I cloned this repo yesterday, and in config.yaml there is a qnorm file that wasn't mentioned in the methods.

How do I generate this file? I tried setting it to qnorm: "" but got this error: run.neighborhoods.py: error: argument --qnorm: expected one argument

Also, along the same lines, is it possible to run ABC without regions_blocklist, ubiquitous_genes, and qnorm? I am working with the Danio rerio genome.

Thank you!

atancoder commented 6 months ago

You don't generate that file. You can read more about it here: https://abc-enhancer-gene-prediction.readthedocs.io/en/latest/usage/methods.html#quantile-normalization-for-activity

If you wish to disable qnorm, you can set the use_qnorm config value to False

olechnwin commented 6 months ago

Thank you! But what should I specify under ref in config.yaml?

atancoder commented 6 months ago

You should leave that config variable as is

olechnwin commented 6 months ago

I followed your advice to leave the reference files in config.yaml as is and set use_qnorm to False:

### REFERENCE FILES
ref:
        chrom_sizes: "reference/danRer11/danRer11.chrom.sizes.tsv"
        regions_blocklist: "reference/danRer11/danRer11_blacklist.bed"
        ubiquitous_genes: "reference/UbiquitouslyExpressedGenes_zebrafish.txt"
        genes: "reference/danRer11/gene_list_danRer11.bed"
        genome_tss: "reference/danRer11/gene_list_danRer11_TSS500bp.bed"
        qnorm: "reference/EnhancersQNormRef.K562.txt"

# These parameters are used to run the Run Neighborhoods portion of the abc code
params_neighborhoods:
        use_qnorm: False

But I got these two error messages:

ERROR: Received illegal bin number 4294967295 from getBin call.
ERROR: Unable to add record to tree.
Running command: awk 'BEGIN {FS=OFS="\t"} (substr($1, length($1)) == "X" || substr($1, length($1)) == "Y") { $4 *= 2 } 1' /home/analysis/ABC-results-noQnorm/P3F/Peaks/macs2_peaks.narrowPeak.sorted.merged_sorted.bam.Counts.bed > /home/analysis/ABC-results-noQnorm/P3F/Peaks/macs2_peaks.narrowPeak.sorted.merged_sorted.bam.Counts.bed.tmp && mv /analysis/ABC-results-noQnorm/P3F/Peaks/macs2_peaks.narrowPeak.sorted.merged_sorted.bam.Counts.bed.tmp /analysis/ABC-results-noQnorm/P3F/Peaks/macs2_peaks.narrowPeak.sorted.merged_sorted.bam.Counts.bed
Running piped cmds: ['bedtools sort -i /home/analysis/ABC-results-noQnorm/P3F/Peaks/macs2_peaks.narrowPeak.sorted.merged_sorted.bam.Counts.bed -faidx reference/danRer11/danRer11.chrom.sizes.tsv', 'bedtools merge -i stdin -c 4 -o max', 'sort -nr -k 4', 'head -n 150000', 'bedtools intersect -b stdin -a /home//analysis/ABC-results-noQnorm/P3F/Peaks/macs2_peaks.narrowPeak.sorted -wa', 'awk \'{{print $1 "\\t" $2 + $10 "\\t" $2 + $10}}\'', 'bedtools slop -i stdin -b 250 -g reference/danRer11/danRer11.chrom.sizes.tsv', 'bedtools sort -i stdin -faidx reference/danRer11/danRer11.chrom.sizes.tsv', 'bedtools merge -i stdin', 'bedtools intersect -v -wa -a stdin -b reference/danRer11/danRer11_blacklist.bed', 'cut -f 1-3', '(bedtools intersect -a reference/danRer11/gene_list_danRer11_TSS500bp.bed -b /home/analysis/ABC-results-noQnorm/tmp/reference/danRer11/danRer11.chrom.sizes.tsv.bed -wa | cut -f 1-3 && cat)', 'bedtools sort -i stdin -faidx reference/danRer11/danRer11.chrom.sizes.tsv', 'bedtools merge -i stdin > /home/Genevieve/analysis/ABC-results-noQnorm/P3F/Peaks/macs2_peaks.narrowPeak.sorted.candidateRegions.bed']
[Mon Mar  4 09:52:57 2024]
Finished job 3.
4 of 9 steps (44%) done
Select jobs to execute...

and another one many lines below the first one:

Feature ATAC completed in 11.736462116241455
Assigning classes to enhancers
Total enhancers: 12877
         Promoters: 12877
         Genic: 0
         Intergenic: 0
Traceback (most recent call last):
  File "/gpfs0/home2/opt/ABC-Enhancer-Gene-Prediction/workflow/scripts/run.neighborhoods.py", line 209, in <module>
    main(args)
  File "/gpfs0/home2/opt/ABC-Enhancer-Gene-Prediction/workflow/scripts/run.neighborhoods.py", line 204, in main
    processCellType(args)
  File "/gpfs0/home2/opt/ABC-Enhancer-Gene-Prediction/workflow/scripts/run.neighborhoods.py", line 183, in processCellType
    load_enhancers(
  File "/gpfs0/home2/opt/ABC-Enhancer-Gene-Prediction/workflow/scripts/neighborhoods.py", line 303, in load_enhancers
    enhancers = run_qnorm(enhancers, qnorm)
  File "/gpfs0/home2/opt/ABC-Enhancer-Gene-Prediction/workflow/scripts/neighborhoods.py", line 798, in run_qnorm
    qnorm = pd.read_csv(qnorm, sep="\t")
  File "/home/opt/miniconda3/envs/mamba_env/envs/abc-env/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/opt/miniconda3/envs/mamba_env/envs/abc-env/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 620, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/opt/miniconda3/envs/mamba_env/envs/abc-env/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1620, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/home/opt/miniconda3/envs/mamba_env/envs/abc-env/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1880, in _make_engine
    self.handles = get_handle(
  File "/home/opt/miniconda3/envs/mamba_env/envs/abc-env/lib/python3.10/site-packages/pandas/io/common.py", line 873, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: 'None'

How do I fix this? Also, why would the Enhancer list only contain promoters?

Thank you!

atancoder commented 6 months ago

The last error looks like the value of qnorm is None. Are you sure the qnorm argument is being passed in correctly?

I'm not sure why the enhancer list only contains promoters. I don't think it's like that for the chr22 example we provide? So probably something to do with your inputs or the way you changed the config. When replacing the reference files, make sure your files are following a similar format
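One way to act on the advice about file formats is to sanity-check a custom BED file against the chrom sizes file before running the pipeline. The sketch below is only a generic check and not part of the ABC code: the helper names `load_chrom_sizes` and `find_bad_bed_records` are hypothetical, the sample data is made up, and it is an assumption (not confirmed in this thread) that invalid coordinates are related to the bedtools "illegal bin number" message above.

```python
import io


def load_chrom_sizes(handle):
    """Read a tab-separated 'chrom<TAB>length' file into a dict."""
    sizes = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            sizes[fields[0]] = int(fields[1])
    return sizes


def find_bad_bed_records(handle, chrom_sizes):
    """Return (line_number, reason) pairs for suspicious BED records."""
    problems = []
    for lineno, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            problems.append((lineno, "fewer than 3 tab-separated columns"))
            continue
        chrom = fields[0]
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            problems.append((lineno, "non-integer coordinates"))
            continue
        if chrom not in chrom_sizes:
            problems.append((lineno, "chromosome missing from chrom sizes"))
        elif end > chrom_sizes[chrom]:
            problems.append((lineno, "end exceeds chromosome length"))
        if start < 0:
            problems.append((lineno, "negative start"))
        if start >= end:
            problems.append((lineno, "start is not less than end"))
    return problems


# Made-up in-memory data; with real files, pass open(path) handles instead.
sizes = load_chrom_sizes(io.StringIO("chr1\t1000\nchr2\t500\n"))
bed = io.StringIO(
    "chr1\t100\t200\tgeneA\n"
    "chr1\t300\t250\tgeneB\n"
    "chr2\t400\t600\tgeneC\n"
    "chrUn\t10\t20\tgeneD\n"
    "chr1\t-5\t20\tgeneE\n"
)
problems = find_bad_bed_records(bed, sizes)
for lineno, reason in problems:
    print(f"line {lineno}: {reason}")
```

Running the same check on the gene list, TSS file, blocklist, and peak file would at least rule out coordinate and chromosome-name mismatches between the custom reference files and the chrom sizes file.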

olechnwin commented 6 months ago

I really appreciate your quick replies to help me run this tool on my datasets.

How do I make sure the qnorm argument is being passed in correctly? I've set use_qnorm to False in config.yaml, but you're right: it became --qnorm None in the run.neighborhoods.py command.

I'm attaching my log files, config.yaml and config_biosamples below:

run_abc_enhancer.txt config.yaml.txt config_p3f_rerun1.tsv.txt

olechnwin commented 6 months ago

Isn't qnorm changed to None when use_qnorm is not True in the config, according to neighborhoods.smk line 13?

atancoder commented 6 months ago

Hmm, qnorm getting passed as None is expected then. But the code is supposed to handle None qnorm values: https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction/blob/v1.0.0/workflow/scripts/neighborhoods.py#L790.

I wonder if None is getting interpreted as a string instead. I'll have to look into this further
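The string-versus-None hypothesis is easy to reproduce outside the pipeline. The following is a minimal sketch, not the project's actual code or its eventual fix: the parser only mimics a `--qnorm` option, and `normalize_optional_path` is a hypothetical helper. It shows that a Python `None` interpolated into a command line reaches argparse as the four-character string `"None"`, which is consistent with the `No such file or directory: 'None'` error in the traceback above.

```python
import argparse
import shlex


def build_parser():
    """Stand-in parser with a single optional --qnorm argument."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--qnorm", default=None)
    return parser


def normalize_optional_path(value):
    """Map None, an empty string, or the literal text 'None' to a real None."""
    if value is None or value.strip() in ("", "None"):
        return None
    return value


# A workflow layer that formats a Python None into a shell command
# produces the text "None" on the command line.
qnorm = None
command_line = f"--qnorm {qnorm}"

args = build_parser().parse_args(shlex.split(command_line))
print(repr(args.qnorm))                             # 'None' (a string)
print(args.qnorm is None)                           # False
print(normalize_optional_path(args.qnorm) is None)  # True
```

An `is None` check on the raw argument therefore never fires, and the string is later treated as a file path. Normalizing the value right after parsing, or omitting the flag entirely when no file is wanted, are two possible ways around it.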

olechnwin commented 6 months ago

Yes! I saw that there is a condition for qnorm None. Thank you so much for looking into this. Now, I also learnt how to refer to a specific line :-). Thank you!!

olechnwin commented 6 months ago

Thanks for fixing the no qnorm, but I think something else is broken now? I have this error using my files or your example file:

+ snakemake -n -p
KeyError in file /opt/ABC-Enhancer-Gene-Prediction/workflow/Snakefile, line 17:
'results_dir'
  File "/opt/ABC-Enhancer-Gene-Prediction/workflow/Snakefile", line 17, in <module>
+ snakemake -j1
KeyError in file /opt/ABC-Enhancer-Gene-Prediction/workflow/Snakefile, line 17:
'results_dir'
  File "/opt/ABC-Enhancer-Gene-Prediction/workflow/Snakefile", line 17, in <module>

olechnwin commented 5 months ago

I'm just going to put this up for people like me. This particular error occurred because I was using an old version of the example files, which was not compatible with the latest version of the code.
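A quick way to catch this kind of version mismatch is to check the config for the keys the workflow expects before launching Snakemake. This is a rough sketch under assumptions: `results_dir` comes from the KeyError above, `ref` and `params_neighborhoods` appear in the config excerpt earlier in this thread, and the full set of keys the current Snakefile requires is not known here. The helper `missing_config_keys` is hypothetical, and the config is written as a plain dict rather than loaded from config.yaml.

```python
def missing_config_keys(config, required_keys):
    """Return the required top-level keys that are absent from the config."""
    return [key for key in required_keys if key not in config]


# Assumed list of required keys; adjust it to match the Snakefile in use.
REQUIRED_KEYS = ["results_dir", "ref", "params_neighborhoods"]

# Stand-in for an older-style config that predates the results_dir key.
old_style_config = {
    "ref": {"chrom_sizes": "reference/danRer11/danRer11.chrom.sizes.tsv"},
    "params_neighborhoods": {"use_qnorm": False},
}

missing = missing_config_keys(old_style_config, REQUIRED_KEYS)
if missing:
    print(f"config is missing keys: {missing}")
```

If anything is reported missing, comparing the local config.yaml with the one shipped in the checked-out version of the repository is probably the fastest fix.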