Bam file has more than one references given

BDI-pathogens / phyloscanner

Phylogenetics between and within hosts at once, all along the genome.

GNU General Public License v3.0


Bam file has more than one references given #65

Closed: simeonhebrew closed this issue 1 year ago

simeonhebrew commented 3 years ago

Hello. I ran the EstimateReadcountperwindow.py Python script, but it failed with an error saying that the first bam file had more than one reference, whereas only one was expected. Any assistance would be highly appreciated. -Simeon Hebrew.

ChrisHIV commented 3 years ago

Hi Simeon, all of the Python/bioinformatic part of phyloscanner is designed and structured around the idea that each file of mapped reads has only one reference, because we consider it in sliding windows along that reference (the diagram on the homepage hopefully conveys the idea). Are you analysing sequence data from an organism with a fragmented genome or multiple chromosomes? If so, you will need to create a separate bam file for each disconnected part of the genome.
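Before running the scripts, it may help to check how many references each file declares. The sketch below is a minimal illustration, not phyloscanner's own check: it assumes you have the header as plain text (for example from `samtools view -H file.bam`), and the function names `reference_names` and `check_single_reference` are hypothetical.

```python
def reference_names(header_lines):
    """Return the reference names (SN: values of @SQ lines) in a SAM header.

    Assumes plain-text header lines, e.g. the output of `samtools view -H`.
    """
    names = []
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        # SAM header fields are tab-separated TAG:VALUE pairs after the record type.
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def check_single_reference(header_lines):
    """Return the single reference name, or raise ValueError if there is not exactly one.

    Hypothetical helper mirroring the one-reference-per-file assumption.
    """
    names = reference_names(header_lines)
    if len(names) != 1:
        raise ValueError(f"expected exactly one reference, found {len(names)}: {names}")
    return names[0]
```

A header listing several chromosomes (several `@SQ` lines) would make `check_single_reference` raise, which is the situation described in the original error.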

simeonhebrew commented 3 years ago

Yeah, I think so. The datasets are trypanosome samples (Trypanosoma congolense) that were mapped against an assembled reference genome. What would creating separate bam files look like? Would I map against each chromosome separately? Thanks.

ChrisHIV commented 3 years ago

I've never worked with multi-chromosome or fragmented data, so I'm not sure exactly how the mapping process worked, but I'm assuming it results in a bam file in which each read from your original fastq files appears either exactly once (mapped to a single chromosome) or not at all (if it didn't look like any of your chromosomes, e.g. contamination). Is that right? If so, it looks like the top two answers here would split such a file by chromosome: https://www.biostars.org/p/46327/
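A common way to do this split is with samtools: after `samtools index in.bam`, running `samtools view -b in.bam chr1 > chr1.bam` keeps only the reads mapped to `chr1`. That output still carries the original header with every `@SQ` line, so if a tool counts header references, the header may need trimming as well. The sketch below does both on SAM text (e.g. from `samtools view -h in.bam`). It is an illustration under those assumptions, and the name `split_sam_by_reference` is hypothetical.

```python
from collections import defaultdict


def sq_name(line):
    """Return the SN: value of an @SQ header line."""
    for field in line.rstrip("\n").split("\t")[1:]:
        if field.startswith("SN:"):
            return field[3:]
    raise ValueError(f"@SQ line without SN: tag: {line!r}")


def split_sam_by_reference(sam_lines):
    """Split SAM text into one list of lines per reference.

    Each output keeps all non-@SQ header lines in their original order,
    only its own @SQ line, and only the records whose RNAME is that reference.
    Records with RNAME '*' (no reference) are dropped.
    """
    header = []
    records = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            header.append(line)
            continue
        rname = line.split("\t")[2]  # RNAME is the third mandatory SAM column
        if rname != "*":
            records[rname].append(line)

    out = {}
    for h in header:
        if not h.startswith("@SQ"):
            continue
        name = sq_name(h)
        own_header = [x for x in header if not x.startswith("@SQ") or sq_name(x) == name]
        out[name] = own_header + records[name]
    return out
```

Each list could then be written to its own SAM file and converted back with `samtools view -b`. One caveat: a read whose mate mapped to a different chromosome keeps that chromosome in its RNEXT column, which the trimmed header no longer declares, so such records may need extra handling.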