epigen / atacseq_pipeline

Ultimate ATAC-seq Data Processing, Quantification and Annotation Snakemake Workflow and MrBiomics Module.
https://epigen.github.io/atacseq_pipeline/
MIT License

Duplicated empty rows in promoter_counts.csv #36

Closed sreichl closed 7 months ago

sreichl commented 9 months ago

In a human data set with 60 samples, promoters_count.csv contained 44 duplicated row names (Ensembl IDs) whose rows were all zeros.

In a large mouse data set, no duplicated features were found.
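A quick way to confirm the symptom is to list the IDs that occur more than once and whose rows contain only zeros. The sketch below is illustrative, not part of the pipeline. It assumes the counts file has a header row, the feature ID in the first column, and one numeric column per sample. The function name and the toy IDs are hypothetical.

```python
import csv
import io
from collections import Counter


def duplicated_zero_rows(csv_text):
    """Return sorted IDs that occur more than once and have an all-zero row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    # Assumed layout: header row first, then ID in column 0 and counts after it.
    body = [row for row in rows[1:] if row]  # drop blank lines
    id_counts = Counter(row[0] for row in body)
    flagged = set()
    for row in body:
        feature_id, values = row[0], row[1:]
        if id_counts[feature_id] > 1 and all(float(v) == 0 for v in values):
            flagged.add(feature_id)
    return sorted(flagged)


# Toy data (hypothetical IDs): ENSG_B is duplicated and all zeros,
# ENSG_C is all zeros but unique, so only ENSG_B is reported.
example = """id,sample_a,sample_b
ENSG_A,5,3
ENSG_B,0,0
ENSG_B,0,0
ENSG_C,0,0
"""
print(duplicated_zero_rows(example))
```

For a real file, read its contents with `open(path).read()` and pass the text to the function.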

dariarom94 commented 8 months ago

I just checked: the duplicated ENSG IDs map to the same genomic regions.

sreichl commented 7 months ago

@dariarom94 I cannot reproduce the bug. Could you check whether the promoter_regions.bed file already contains the duplicates? This would help me narrow down where the bug is introduced.
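The check requested above could be sketched as follows. This assumes promoter_regions.bed is a tab-separated BED file with the gene ID in the fourth (name) column, which the thread does not confirm. The function name and the toy records are hypothetical.

```python
from collections import defaultdict


def duplicated_bed_names(bed_lines):
    """Map each name seen more than once to its list of (chrom, start, end)."""
    seen = defaultdict(list)
    for line in bed_lines:
        line = line.rstrip("\n")
        # Skip blank lines and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        # Assumed layout: chrom, start, end, name (gene ID) in columns 1-4.
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
        seen[name].append((chrom, start, end))
    return {name: locs for name, locs in seen.items() if len(locs) > 1}


# Toy records (hypothetical): GENE_B appears on both chrX and chrY.
example = [
    "chr1\t100\t200\tGENE_A",
    "chrX\t300\t400\tGENE_B",
    "chrY\t300\t400\tGENE_B",
]
print(duplicated_bed_names(example))
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, for example `with open("promoter_regions.bed") as fh: duplicated_bed_names(fh)`. Listing the chromosomes next to each duplicated ID shows at a glance whether the duplicates come from different chromosomes.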

sreichl commented 7 months ago

@dariarom94 never mind, I reproduced it using hg38. =)

sreichl commented 7 months ago

The 44 genes are located in the pseudoautosomal regions (PARs) of the human genome, which are shared between the X and Y chromosomes.