Closed NicoleYY77 closed 7 months ago
can you share your entire config file? One thing I can't tell is if you had changed the genome size parameter mentioned here: https://abc-enhancer-gene-prediction.readthedocs.io/en/latest/usage/getting_started.html#genome-builds
Thank you! Here's my entire config file:
```yaml
# working directory is ABC-Enhancer-Gene-Prediction
biosamplesTable: "config/config_K562.tsv"  # replace with your own config/config_biosamples.tsv
results_dir: "results/"

ref:
  chrom_sizes: "reference/hg19/chr_sizes"
  regions_blocklist: "reference/hg19/wgEncodeHg19ConsensusSignalArtifactRegions.bed"
  ubiquitous_genes: "reference/UbiquitouslyExpressedGenes.txt"
  genes: "reference/hg19/hg19_ensembl.bed"
  genome_tss: "reference/hg19/hg19_ensembl_TSS.bed"
  qnorm: "reference/EnhancersQNormRef.K562.txt"
  abc_thresholds: "reference/abc_thresholds.tsv"

params_macs:
  pval: 0.1
  genome_size: hs  # shortcuts: hs: human ; mm: mice
  threads: 10

params_candidate:
  peakExtendFromSummit: 250
  nStrongestPeaks: 150000

params_neighborhoods:
  use_qnorm: True

params_predict:
  flags: "--scale_hic_using_powerlaw"
  hic_gamma: 1.024238616787792  # avg hic gamma
  hic_scale: 5.9594510043736655  # avg hic scale
  hic_pseudocount_distance: 5000  # powerlaw at this distance is added to the contact for all predictions

params_filter_predictions:
  score_column: 'ABC.Score'
  threshold: null  # null => Automatic determination based on input
  include_self_promoter: True
  only_expressed_genes: False

ABC_DIR_PATH: ""
```
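One quick sanity check before launching Snakemake is to confirm that every file under `ref:` in the config above actually exists on disk; a missing hg19 reference file is an easy thing to overlook. A minimal stand-alone sketch (the paths are copied from the config; this check is not part of the ABC pipeline itself):

```python
import os

# Reference paths from the config above (relative to the ABC working directory).
ref_paths = {
    "chrom_sizes": "reference/hg19/chr_sizes",
    "regions_blocklist": "reference/hg19/wgEncodeHg19ConsensusSignalArtifactRegions.bed",
    "ubiquitous_genes": "reference/UbiquitouslyExpressedGenes.txt",
    "genes": "reference/hg19/hg19_ensembl.bed",
    "genome_tss": "reference/hg19/hg19_ensembl_TSS.bed",
    "qnorm": "reference/EnhancersQNormRef.K562.txt",
    "abc_thresholds": "reference/abc_thresholds.tsv",
}

def missing_refs(paths):
    """Return the config keys whose files do not exist on disk, sorted."""
    return sorted(key for key, path in paths.items() if not os.path.exists(path))

if __name__ == "__main__":
    for key in missing_refs(ref_paths):
        print(f"missing reference file: {key} -> {ref_paths[key]}")
```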
Is the genome size parameter different for hg19?
Genome size should be fine. The repo wasn't as well supported for hg19, so some of the reference files were missing (I noticed you generated the Ensembl versions of the gene TSS). I've updated the reference files for hg19 in the latest release. The 0 counts issue does look relevant. I'm not sure why you're getting that, but on our end I was able to run ABC on hg19 without any issues using the following DNase-seq data: https://www.encodeproject.org/files/ENCFF001DOX/.
Thanks for the update! I didn't figure out why the 0 counts happened, but I merged the two replicates of the DNase-seq files (wgEncodeUwDnaseK562AlnRep1&2.bam, as shown in supplementary table 4) with samtools, and that fixed the problem. However, after running the full pipeline, I get 151,894 candidate enhancers, not the 162,181 reported in the paper... So I'm not sure whether merging the replicate BAM files is correct...
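For reference, the merge step above can be done with `samtools merge` followed by re-indexing. A small sketch that just assembles the commands (the filenames follow the replicates mentioned above; the merged output name is my own choice - run the printed commands in the ABC working directory):

```python
# Sketch: build the samtools commands for merging DNase-seq replicate BAMs.
# Replicate filenames follow the comment above; adjust paths as needed.
replicates = [
    "wgEncodeUwDnaseK562AlnRep1.bam",
    "wgEncodeUwDnaseK562AlnRep2.bam",
]
merged = "wgEncodeUwDnaseK562.merged.bam"  # hypothetical output name

def samtools_cmds(reps, out):
    """Return the shell commands to merge replicate BAMs and index the result."""
    merge = ["samtools", "merge", "-f", out] + list(reps)
    index = ["samtools", "index", out]
    return [merge, index]

if __name__ == "__main__":
    for cmd in samtools_cmds(replicates, merged):
        print(" ".join(cmd))
```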
I also tried running the benchmark analysis following this: https://github.com/EngreitzLab/CRISPR_comparison/blob/main/README.md, using the data from supplementary table 6a as the experimental dataset to compare against the predictions, but I couldn't replicate the results in the paper... Did I miss any steps or misunderstand anything here?
If your goal is to replicate the results exactly, you'd have to use the older version of the repo: https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction/tree/master. Just beware that you'll have to debug any issues on your own as we no longer support that version.
Thank you!
Hi, I'm trying to use the public data mentioned in the supplementary file: H3K27ac ChIP-seq: ENCFF384ZZM and DHS: wgEncodeUwDnaseK562AlnRep1.bam, wgEncodeUwDnaseK562AlnRep2.bam. I also changed the reference files to hg19 before running snakemake -j1.
The H3K27ac feature completed, but I got an error with DHS:
```
Generating /lustre03/project/6007512/nicoley/EPI_prediction/ABCmodel/ABC-Enhancer-Gene-Prediction/results/K562_hg19/Neighborhoods/Genes.DHS.wgEncodeUwDnaseK562AlnRep1.bam.CountReads.bedgraph
Counting coverage for Genes.DHS.wgEncodeUwDnaseK562AlnRep1.bam
Running command: awk 'BEGIN {FS=OFS="\t"} (substr($1, length($1)) == "X" || substr($1, length($1)) == "Y") { $4 *= 2 } 1' /lustre03/project/6007512/nicoley/EPI_prediction/ABCmodel/ABC-Enhancer-Gene-Prediction/results/K562_hg19/Neighborhoods/Genes.DHS.wgEncodeUwDnaseK562AlnRep1.bam.CountReads.bedgraph > /lustre03/project/6007512/nicoley/EPI_prediction/ABCmodel/ABC-Enhancer-Gene-Prediction/results/K562_hg19/Neighborhoods/Genes.DHS.wgEncodeUwDnaseK562AlnRep1.bam.CountReads.bedgraph.tmp && mv /lustre03/project/6007512/nicoley/EPI_prediction/ABCmodel/ABC-Enhancer-Gene-Prediction/results/K562_hg19/Neighborhoods/Genes.DHS.wgEncodeUwDnaseK562AlnRep1.bam.CountReads.bedgraph.tmp /lustre03/project/6007512/nicoley/EPI_prediction/ABCmodel/ABC-Enhancer-Gene-Prediction/results/K562_hg19/Neighborhoods/Genes.DHS.wgEncodeUwDnaseK562AlnRep1.bam.CountReads.bedgraph
Traceback (most recent call last):
  File "/lustre03/project/6007512/nicoley/EPI_prediction/ABCmodel/ABC-Enhancer-Gene-Prediction/workflow/scripts/run.neighborhoods.py", line 209, in <module>
    main(args)
  File "/lustre03/project/6007512/nicoley/EPI_prediction/ABCmodel/ABC-Enhancer-Gene-Prediction/workflow/scripts/run.neighborhoods.py", line 204, in main
    processCellType(args)
  File "/lustre03/project/6007512/nicoley/EPI_prediction/ABCmodel/ABC-Enhancer-Gene-Prediction/workflow/scripts/run.neighborhoods.py", line 171, in processCellType
    annotate_genes_with_features(
  File "/lustre03/project/6007512/nicoley/EPI_prediction/ABCmodel/ABC-Enhancer-Gene-Prediction/workflow/scripts/neighborhoods.py", line 126, in annotate_genes_with_features
    genes = count_features_for_bed(
            ^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre03/project/6007512/nicoley/EPI_prediction/ABCmodel/ABC-Enhancer-Gene-Prediction/workflow/scripts/neighborhoods.py", line 512, in count_features_for_bed
    df = count_single_feature_for_bed(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre03/project/6007512/nicoley/EPI_prediction/ABCmodel/ABC-Enhancer-Gene-Prediction/workflow/scripts/neighborhoods.py", line 567, in count_single_feature_for_bed
    total_counts = count_total(feature_bam)
                   ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre03/project/6007512/nicoley/EPI_prediction/ABCmodel/ABC-Enhancer-Gene-Prediction/workflow/scripts/neighborhoods.py", line 703, in count_total
    total_counts = count_bam_mapped(infile)
                   ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre03/project/6007512/nicoley/EPI_prediction/ABCmodel/ABC-Enhancer-Gene-Prediction/workflow/scripts/neighborhoods.py", line 670, in count_bam_mapped
    raise ValueError("Error counting BAM file: count <= 0")
ValueError: Error counting BAM file: count <= 0
```
I checked the macs2_peaks.narrowPeak.sorted.averageAccessibility.Counts.bed file and it does have lots of 0 counts. I'm confused about why this happens; the examples ran successfully. Did I miss any steps, and how should I fix this problem? I'd really appreciate any help!!
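One way to narrow this down is to tally how many intervals in that Counts.bed file have a zero in the counts column: if essentially every interval is zero, a common culprit is a mismatch between the BAM's chromosome names and the reference (e.g. `chr1` vs `1`) rather than the pipeline itself. A small stand-alone sketch (the 0-based column index for the counts is an assumption; check your file's layout):

```python
def zero_count_fraction(path, count_col=3):
    """Fraction of intervals whose counts column (0-based index) is zero."""
    total = zeros = 0
    with open(path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) <= count_col:
                continue  # skip malformed/short lines
            total += 1
            if float(fields[count_col]) == 0.0:
                zeros += 1
    return zeros / total if total else 0.0

if __name__ == "__main__":
    # Hypothetical path following the file mentioned above; adjust to your run.
    frac = zero_count_fraction(
        "results/K562_hg19/Neighborhoods/"
        "macs2_peaks.narrowPeak.sorted.averageAccessibility.Counts.bed"
    )
    print(f"{frac:.1%} of intervals have zero counts")
```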