databio / pypiper

Python toolkit for building restartable pipelines
http://pypiper.databio.org
BSD 2-Clause "Simplified" License

Hard coded total chromosome sizes #19

Open nfortelny opened 7 years ago

nfortelny commented 7 years ago

Total chromosome sizes are hard-coded in the functions "macs2CallPeaksATACSeq" and "macs2CallPeaks" of ngstk.py, so I ran into problems when I ran the analysis with mm9. Maybe this could be added to atacseq.yaml.
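A minimal sketch of what moving the sizes into configuration could look like. The dict stands in for the parsed contents of a file such as atacseq.yaml; the key name `macs_genome_sizes` and the helper `get_genome_size` are hypothetical and not part of pypiper. The two values are the human and mouse shortcut sizes listed in the MACS documentation, used here only for illustration.

```python
# Stand-in for the parsed contents of a pipeline config such as atacseq.yaml.
# The key name "macs_genome_sizes" is a hypothetical choice for this sketch.
config = {
    "macs_genome_sizes": {
        "hg19": 2.7e9,   # MACS "hs" shortcut value
        "mm9": 1.87e9,   # MACS "mm" shortcut value
    }
}


def get_genome_size(config, genome):
    """Return the configured genome size, failing loudly for unknown genomes."""
    sizes = config.get("macs_genome_sizes", {})
    if genome not in sizes:
        raise KeyError(
            f"No genome size configured for '{genome}'; add it to the config file."
        )
    return sizes[genome]
```

Raising an error for an unlisted genome, rather than falling back to a default, would surface the problem immediately when a new assembly is used.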

Also, I wonder whether those genome sizes are correct. For mm9, I summed the chromosome sizes from the chromosome sizes file, /data/prod/ngs_resources/genomes/mm9/mm9_chromlength.txt, and the total I get corresponds exactly to the one listed here: http://genomewiki.ucsc.edu/index.php/Genome_size_statistics

However, if I do the same for the other genomes (e.g. hg19), I get about 3.1e9 bases, which is close to the figure on the page linked above but different from what is defined in ngstk.py.
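The check described above can be sketched as follows, assuming the chromosome sizes file has one chromosome per line with the name in the first whitespace-separated column and the length in the second. The function names are hypothetical, and the file layout is an assumption about mm9_chromlength.txt rather than something confirmed in this thread.

```python
def total_genome_size(lines):
    """Sum the second column (chromosome length) over all data lines."""
    total = 0
    for line in lines:
        fields = line.split()
        # Skip blank lines, malformed lines, and comment lines.
        if len(fields) < 2 or line.startswith("#"):
            continue
        total += int(fields[1])
    return total


def relative_difference(computed, hardcoded):
    """Fraction by which the computed total deviates from a hard-coded value."""
    return abs(computed - hardcoded) / hardcoded


# Usage with a real file (path supplied by the caller):
#     with open(path) as handle:
#         total = total_genome_size(handle)
```

Comparing the result against the constant in ngstk.py with `relative_difference` gives a quick sense of how far apart the two numbers are.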

afrendeiro commented 7 years ago

Those numbers are taken straight from MACS (https://github.com/taoliu/MACS). I guess one could be more accurate, but I don't think it is that critical.

nsheff commented 5 years ago

@afrendeiro what do you think about changing these to use refgenieconf? All we would need is a chrom_sizes asset, and then you would just use `refgenieconf.get_asset(genome, "chrom_sizes")` to get the chrom sizes file.

That way it works with any genome.
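A rough sketch of how a peak-calling step could derive the genome size from a chrom sizes file instead of a hard-coded constant. The function `macs2_callpeak_cmd` is hypothetical and is not the ngstk implementation; the file path is passed in by the caller, so how it is obtained (for example, from a refgenie asset lookup) is left open. The `-t`, `-g`, `-n` and `--outdir` options are standard `macs2 callpeak` options.

```python
def macs2_callpeak_cmd(treatment_bam, chrom_sizes_path, name, outdir):
    """Build a macs2 callpeak command with -g computed from a chrom sizes file."""
    with open(chrom_sizes_path) as handle:
        # Assumes two whitespace-separated columns: chromosome name, length.
        genome_size = sum(int(line.split()[1]) for line in handle if line.strip())
    return [
        "macs2", "callpeak",
        "-t", treatment_bam,
        "-g", str(genome_size),
        "-n", name,
        "--outdir", outdir,
    ]
```

Note that MACS documents `-g` as the effective (mappable) genome size, so the plain sum of chromosome lengths is an upper bound on that value. This likely accounts for the gap between the summed hg19 total and the MACS-derived constant mentioned earlier in the thread.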