broadinstitute / ABC-Enhancer-Gene-Prediction

Cell type specific enhancer-gene predictions using ABC model (Fulco, Nasser et al, Nature Genetics 2019)
MIT License

Using ABC with .bedpe files #216

Closed MattQIMR closed 6 months ago

MattQIMR commented 6 months ago

Hi,

I'm trying to run ABC using Hi-C data in .bedpe format. As far as I can tell, the data is formatted correctly: the run works for chromosome 1 but then throws an error on chromosome 10. If I convert the data into the .avg format that you use for the supplied average Hi-C data, it works, and I can also get ABC to run on some other Hi-C data I have in .hic format.

Error file extract:

    Making predictions for chromosome: chr1
    Making putative predictions table...
    Done. There are 1588854 putative enhancers for chromosome chr1
    Elapsed time: 1.6780807971954346
    Begin HiC
    Loading HiC bedpe
    HiC added to predictions table. Elapsed time: 0.6478121280670166
    HiC Complete
    Completed chromosome: chr1. Elapsed time: 2.808246374130249

    Making predictions for chromosome: chr10
    Making putative predictions table...
    Done. There are 460450 putative enhancers for chromosome chr10
    Elapsed time: 0.4352283477783203
    Begin HiC
    Loading HiC bedpe
    Traceback (most recent call last):
      File "/mnt/lustre/working/l//Packages/ABC-Enhancer-Gene-Prediction/workflow/scripts/predict.py", line 303, in <module>
        main()
      File "/mnt/lustre/working///Packages/ABC-Enhancer-Gene-Prediction/workflow/scripts/predict.py", line 242, in main
        this_chr = make_predictions(
      File "/mnt/lustre/working///Packages/ABC-Enhancer-Gene-Prediction/workflow/scripts/predictor.py", line 32, in make_predictions
        pred = add_hic_from_directory(
      File "/mnt/lustre/working///Packages/ABC-Enhancer-Gene-Prediction/workflow/scripts/predictor.py", line 217, in add_hic_from_directory
        genes_hic2[["gene_idx", "hic_idx"]], on="hic_idx"
      File "/mnt/backedup/home//.conda/envs/abc-env/lib/python3.10/site-packages/pandas/core/frame.py", line 3899, in __getitem__
        indexer = self.columns._get_indexer_strict(key, "columns")[1]
      File "/mnt/backedup/home//.conda/envs/abc-env/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6115, in _get_indexer_strict
        self._raise_if_missing(keyarr, indexer, axis_name)
      File "/mnt/backedup/home//.conda/envs/abc-env/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6176, in _raise_if_missing
        raise KeyError(f"None of [{key}] are in the [{axis_name}]")
    KeyError: "None of [Index(['gene_idx', 'hic_idx'], dtype='object')] are in the [columns]"
    [Thu Apr 18 14:58:25 2024]
    Error in rule create_predictions:
        jobid: 7
        input: /mnt/lustre/working///ABC/Predictions/ARK1_2/Neighborhoods/EnhancerList.txt, /mnt/lustre/working///ABC/Predictions/ARK1_2/Neighborhoods/GeneList.txt
        output: /mnt/lustre/working///ABC/Predictions/ARK1_2/Predictions/EnhancerPredictionsAllPutative.tsv.gz, /mnt/lustre/working///ABC/Predictions/ARK1_2/Predictions/EnhancerPredictionsAllPutativeNonExpressedGenes.tsv.gz
        conda-env: /mnt/lustre/working//***/Packages/ABC-Enhancer-Gene-Prediction/.snakemake/conda/21eeddbb5908ce12ca6bacc0f81637bf_
        shell:

    python workflow/scripts/predict.py          --enhancers /mnt/lustre/working/***/***/ABC/Predictions/ARK1_2/Neighborhoods/EnhancerList.txt           --outdir /mnt/lustre/working/***/***/ABC/Predictions/ARK1_2/Predictions             --score_column ABC.Score            --chrom_sizes reference/hg38/GRCh38_EBV.no_alt.chrom.sizes.tsv          --accessibility_feature ATAC            --cellType ARK1_2           --genes /mnt/lustre/working/***/***/ABC/Predictions/ARK1_2/Neighborhoods/GeneList.txt           --hic_gamma 1.024238616787792           --hic_scale 5.9594510043736655          --hic_file /mnt/lustre/working/***/***/ABC/Input/ARK1_2 --hic_type bedpe --hic_resolution 5000          --scale_hic_using_powerlaw

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

    Shutting down, this might take some time.
    Exiting because a job execution failed. Look above for error message
    Complete log: .snakemake/log/2024-04-18T145818.245833.snakemake.log

Head of the chromosome 1 and chromosome 10 .bedpe files (to be clear, the files are tab-delimited despite the way they are pasted here):

    chr1 847499 847501 chr1 902499 902501 - 7.12844
    chr1 912499 912501 chr1 917499 917501 - 2.99772
    chr1 912499 912501 chr1 922499 922501 - 5.71457
    chr1 912499 912501 chr1 927499 927501 - 3.96892
    chr1 912499 912501 chr1 957499 957501 - 2.34114
    chr1 912499 912501 chr1 1302499 1302501 - 4.06593
    chr1 917499 917501 chr1 922499 922501 - 2.41381
    chr1 917499 917501 chr1 942499 942501 - 2.71949
    chr1 922499 922501 chr1 927499 927501 - 4.72495
    chr1 922499 922501 chr1 942499 942501 - 10.9412

    chr10 72499 72501 chr10 87499 87501 - 2.07629
    chr10 82499 82501 chr10 87499 87501 - 11.7189
    chr10 82499 82501 chr10 92499 92501 - 5.43199
    chr10 82499 82501 chr10 132499 132501 - 2.31607
    chr10 92499 92501 chr10 252499 252501 - 5.40662
    chr10 102499 102501 chr10 107499 107501 - 2.02914
    chr10 162499 162501 chr10 387499 387501 - 3.11684
    chr10 222499 222501 chr10 327499 327501 - 2.10858
    chr10 232499 232501 chr10 237499 237501 - 2.29642
    chr10 262499 262501 chr10 282499 282501 - 2.18784
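
For anyone wanting to rule out formatting problems in files like these, here is a minimal sketch of a sanity check, using only the Python standard library. It is not part of ABC: `check_bedpe` is a hypothetical helper, the eight-column layout is taken from the rows pasted above, and the bin-centre test only mirrors the pattern visible in those rows (interval midpoints at `resolution // 2` past a multiple of the 5000 bp resolution). Whether ABC itself requires any of this is an assumption, and the check does not explain the `KeyError`.

```python
import csv
import io


def check_bedpe(handle, resolution=5000, expected_columns=8):
    """Return (line_number, problem) tuples for a tab-delimited BEDPE-like stream.

    Hypothetical helper: the column layout and the bin-centre rule are
    assumptions drawn from the rows pasted in this issue, not from ABC.
    """
    problems = []
    reader = csv.reader(handle, delimiter="\t")
    for line_number, row in enumerate(reader, start=1):
        if len(row) != expected_columns:
            problems.append(
                (line_number,
                 f"expected {expected_columns} tab-separated fields, found {len(row)}")
            )
            continue
        chrom1, start1, end1, chrom2, start2, end2, _name, score = row
        try:
            anchors = [(int(start1), int(end1)), (int(start2), int(end2))]
            float(score)
        except ValueError:
            problems.append((line_number, "non-numeric coordinate or score"))
            continue
        if chrom1 != chrom2:
            problems.append((line_number, "anchors are on different chromosomes"))
        for start, end in anchors:
            midpoint = (start + end) // 2
            # Assumed convention: each anchor is centred on the middle of a bin.
            if midpoint % resolution != resolution // 2:
                problems.append(
                    (line_number,
                     f"midpoint {midpoint} is not a bin centre at resolution {resolution}")
                )
    return problems


# First row is copied from the chr10 head above (tab-delimited);
# the second row is deliberately space-delimited to show what gets flagged.
sample = (
    "chr10\t72499\t72501\tchr10\t87499\t87501\t-\t2.07629\n"
    "chr10 82499 82501 chr10 87499 87501 - 11.7189\n"
)
for line_number, problem in check_bedpe(io.StringIO(sample)):
    print(f"line {line_number}: {problem}")
```

To run it on a real file, pass an open handle instead of the `io.StringIO` sample, for example `check_bedpe(open(path, newline=""))`, where `path` points at one of the per-chromosome .bedpe files.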