deeptools / deepTools

Tools to process and analyze deep sequencing data.

different sizes of .fasta files inputs changes output #1126

Open rotemavr opened 2 years ago

rotemavr commented 2 years ago

Hey, my name is Rotem, from the Hebrew University, Israel. I'm using bamCoverage and noticed that in the [none.bw] output files the read counts are decimal numbers instead of integers. I would love to know the reason for that.

Second, does the size of a fasta file affect the count results? In more detail, I used two different input fasta files to run Hisat2:

1. A fasta file containing only my genes of interest (paralogs). I ran Hisat2 to obtain .bam files, converted the .bam files to bigWig with bamCoverage, and computed the read coverages with multiBamSummary.
2. A fasta file containing the complete genome (including all the genes). I ran Hisat2 to obtain .bam files, converted the .bam files to bigWig with bamCoverage, and computed the read coverages with multiBamSummary, using the BED-file option with the coordinates of my genes of interest.

MY PROBLEM: I got different results from the two approaches, when the only difference is the size of the input fasta file (paralogs fasta vs. genome fasta).

dpryan79 commented 2 years ago

The bigWig format stores floating point values, not integers, so even without normalization, values like 1 or 2 will still get stored as 1.0 and 2.0.
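To illustrate the point, here is a minimal Python sketch. It does not read or write a real bigWig file; it only round-trips integer counts through a 32-bit float, which is the value type bigWig uses. The helper name `as_bigwig_value` is made up for this example.

```python
import struct

def as_bigwig_value(count):
    """Round-trip a count through a 32-bit float (hypothetical helper)."""
    packed = struct.pack("<f", float(count))  # 4 bytes, single precision
    return struct.unpack("<f", packed)[0]

# Integer read counts come back as floats with the same numeric value.
values = [as_bigwig_value(c) for c in (1, 2, 7)]
print(values)  # [1.0, 2.0, 7.0]
```

So a decimal point in the output does not by itself mean that the counts were normalized or altered.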

The size of the fasta file will affect the values if there's a normalization step involved. For example, RPGC (1x) normalization will result in different values for different fasta sizes, assuming the number of aligned reads is identical.
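A rough sketch of why this happens, assuming the RPGC scaling described in the deepTools documentation: coverage is divided by (mapped reads × fragment length) / effective genome size. The function name and all numbers below are invented for illustration and are not taken from the data in this issue.

```python
def rpgc_scale_factor(mapped_reads, fragment_length, effective_genome_size):
    """Factor that rescales raw coverage to 1x average depth (illustrative)."""
    # Average sequencing depth over the reference being normalized against.
    depth = mapped_reads * fragment_length / effective_genome_size
    return 1.0 / depth

# Same number of aligned reads, two very different reference sizes (made up).
paralogs = rpgc_scale_factor(1_000_000, 100, 50_000)       # small paralog fasta
genome = rpgc_scale_factor(1_000_000, 100, 100_000_000)    # whole genome

raw_coverage = 10  # hypothetical raw coverage in one bin
print(raw_coverage * paralogs)  # 0.005
print(raw_coverage * genome)    # 10.0
```

With identical read counts, the same raw coverage ends up orders of magnitude apart after scaling, purely because of the reference size used for normalization.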