RGT-HINT genome files are incompatible with nf-core ATAC-seq pipeline.

Hello RGT-HINT team, I get an error message when running RGT-HINT footprinting:

rgt-hint footprinting --organism mm10 --paired-end --output-location /mnt/e/HYJ/2022_7_23_ATAC-seq/results/RGT_HINT/ --output-prefix footprint_WT --atac-seq /mnt/e/HYJ/2022_7_23_ATAC-seq/results/bwa/mergedReplicate/control.mRp.clN.sorted.bam /mnt/e/HYJ/2022_7_23_ATAC-seq/results/bwa/mergedReplicate/macs/broadPeak/control.mRp.clN_peaks.broadPeak
Report: The scikit HMM encountered errors when applied. in region (10,52417320,52418086). This iteration will be skipped.

I'm using BAM files generated by the nf-core pipeline, which uses a reference genome that was downloaded on July 17, 2015. I believe the error above is caused by a coordinate inconsistency between that reference genome and the one RGT-HINT is configured with, which comes from GENCODE vM25.
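To test this suspicion before replacing anything, the contig names and lengths in the BAM header can be compared with the chrom.sizes file RGT uses. Below is a minimal sketch, assuming the header text was obtained with `samtools view -H control.mRp.clN.sorted.bam`. The function names are my own for illustration and are not part of RGT or nf-core.

```python
def parse_sq_lines(header_text):
    """Return {contig_name: length} from the @SQ lines of a SAM/BAM header."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:VALUE, e.g. SN:chr10
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def parse_chrom_sizes(text):
    """Return {contig_name: length} from a two-column chrom.sizes file."""
    sizes = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            sizes[parts[0]] = int(parts[1])
    return sizes


def compare_contigs(bam_contigs, ref_sizes):
    """Report contigs missing on either side and contigs whose lengths differ."""
    only_in_bam = sorted(set(bam_contigs) - set(ref_sizes))
    only_in_ref = sorted(set(ref_sizes) - set(bam_contigs))
    length_mismatch = {
        name: (bam_contigs[name], ref_sizes[name])
        for name in bam_contigs
        if name in ref_sizes and bam_contigs[name] != ref_sizes[name]
    }
    return only_in_bam, only_in_ref, length_mismatch


# Example use (paths are placeholders, adjust to your setup):
#   header = open("control.header.sam").read()          # saved samtools view -H output
#   sizes = open("/path/to/rgtdata/mm10/chrom.sizes.mm10").read()
#   print(compare_contigs(parse_sq_lines(header), parse_chrom_sizes(sizes)))
```

If all three results are empty, the two genomes agree on contig names and lengths, and a coordinate mismatch of that kind would be less likely to explain the HMM error.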

I think the way to solve this error is to replace the files under the ~/rgtdata/mm10/ folder with the genome files the nf-core pipeline used. The nf-core pipeline supplies genome.fa, genome.fai, and chrom.sizes, so I can replace genome_mm10.fa, genome_mm10.fa.fai, and chrom.sizes.mm10 under ~/rgtdata/mm10/. I know I can download a gencode.annotation.gtf file matching the nf-core version from GENCODE, but where can I download genes_Gencode_mm10.bed and genes_RefSeq_mm10.bed? Is it also necessary to replace these two BED files with versions that match the nf-core genome?
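For the three files I can already replace, this is roughly what I have in mind: copy each nf-core file over its RGT counterpart and keep a backup of the original so the change can be undone. It is only a sketch. The helper name `replace_with_backup`, the `.bak` suffix, and the exact file-name mapping are my assumptions, and it does not cover the two gene BED files.

```python
import shutil
from pathlib import Path


def replace_with_backup(src_dir, rgt_dir, file_map, backup_suffix=".bak"):
    """Copy files from src_dir into rgt_dir under new names, backing up originals.

    file_map maps a source file name to the file name RGT expects.
    An existing backup is never overwritten, so the first original is preserved.
    """
    src_dir, rgt_dir = Path(src_dir), Path(rgt_dir)

    # Fail before touching anything if a source file is missing.
    missing = [name for name in file_map if not (src_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {src_dir}: {missing}")

    replaced = []
    for src_name, dst_name in file_map.items():
        dst = rgt_dir / dst_name
        if dst.exists():
            backup = dst.with_name(dst.name + backup_suffix)
            if not backup.exists():
                shutil.copy2(dst, backup)
        shutil.copy2(src_dir / src_name, dst)
        replaced.append(dst)
    return replaced


# Example use (source path is a placeholder; the mapping is assumed from the
# file names mentioned above):
#   replace_with_backup(
#       "/path/to/nfcore/genome",
#       Path.home() / "rgtdata" / "mm10",
#       {
#           "genome.fa": "genome_mm10.fa",
#           "genome.fai": "genome_mm10.fa.fai",
#           "chrom.sizes": "chrom.sizes.mm10",
#       },
#   )
```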

Thanks! Best, Yuanjian
