andersen-lab / Freyja

Depth-weighted De-Mixing

Freyja plot #148

Closed mikal16 closed 1 year ago

mikal16 commented 1 year ago

Hi there,

I am trying to run freyja plot and I am getting this error.

[Screenshot of the error: Screen Shot 2023-06-07 at 10 46 42 AM]

After going through my aggregated file, I realized the error occurs because all my samples have a very low coverage of 3.5. However, it seems very odd to me that every sample would have such low coverage. I followed the steps for generating the bam files one by one, and was wondering if you might have any insight into why I would be seeing such poor coverage.

Thanks very much in advance!

Mikal

mikal16 commented 1 year ago

Sorry, I should also mention that I calculated coverage with samtools, using a command like this one:

samtools depth -a bamfile | awk '{sum+=$3} END { print "Average = ",sum/NR}'

With that method, coverage looked good for almost all of my samples (attached).

[Attachment: coverage.txt]

Thanks again,

Mikal

joshuailevy commented 1 year ago

Hi Mikal,

I think you're using a slightly different definition of coverage (unfortunately, there are a handful of different definitions out there). Freyja returns the fraction of sites with greater than 10 reads, whereas your method calculates the average number of reads per site. Since sequencing depth is often highly non-uniform across sites, the average can be a bit misleading: a few very deep regions can pull it up even when most sites have few or no reads.
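To make the difference concrete, here is a minimal sketch that computes both numbers from `samtools depth -a` output (one line per site: reference, position, depth). It is not Freyja's actual implementation; the function name `coverage_metrics`, the `min_depth` parameter, and the choice to report a percentage are assumptions for illustration, and the depths are made up.

```python
def coverage_metrics(depth_lines, min_depth=10):
    """Return (mean depth, percent of sites with depth > min_depth).

    depth_lines: iterable of `samtools depth -a` lines (ref, position, depth).
    Hypothetical helper for illustration only.
    """
    depths = []
    for line in depth_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        depths.append(int(fields[2]))
    if not depths:
        return 0.0, 0.0
    mean_depth = sum(depths) / len(depths)
    covered = sum(1 for d in depths if d > min_depth)
    return mean_depth, 100.0 * covered / len(depths)


# Made-up, deliberately non-uniform depths: 5 sites at 2000x, 95 sites at 2x.
example = [f"ref\t{pos}\t{2000 if pos <= 5 else 2}" for pos in range(1, 101)]
mean_depth, pct_covered = coverage_metrics(example)
print(f"mean depth = {mean_depth:.1f}, sites with >10 reads = {pct_covered:.1f}%")
```

In this toy case the average depth is above 100 even though only 5 of the 100 sites pass the 10-read threshold, which is how a sample can look fine by one definition and poor by the other.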

Josh

mikal16 commented 1 year ago

Oh okay, I see. I was also wondering: does Freyja calculate coverage for all samples together? All my samples have exactly the same coverage. I ask because maybe I could drop the bad samples and still run the analysis on the rest.

Thanks again,

Mikal

joshuailevy commented 1 year ago

That's somewhat strange. Freyja should produce a separate coverage estimate for each sample. Are you doing whole genome sequencing? If you are doing spike-only sequencing, this could potentially happen.

mikal16 commented 1 year ago

I'm doing whole genome sequencing. But I'm finding that for each folder with multiple samples that I'm processing, I'm getting the same coverage for every single sample, which is very odd.

Let me know if anything I could send could help troubleshoot this.

Thanks again,

Mikal

joshuailevy commented 1 year ago

I'd be happy to take a look at a few example bam files if you want to send them!

Josh

mikal16 commented 1 year ago

Awesome, yes. The weird thing is that I ran this twice: the first time it worked without detecting any quality issues, but the second time it reported problems with the quality. It doesn't seem to let me attach bam files here, so let me know if you'd like me to email a few.

Thanks again!

Mikal

joshuailevy commented 1 year ago

Ah, sorry. You'd have to save them with a txt extension to drop them here. You can also just send them to me at jolevy@scripps.edu

joshuailevy commented 1 year ago

Closing this. @mikal16 feel free to reach out if you have further questions!