NCBI-Hackathons / NovoGraph

NovoGraph: building whole genome graphs from long-read-based de novo assemblies
MIT License
44 stars 8 forks source link

sam2alignment aborted #36

Closed Boer223 closed 4 years ago

Boer223 commented 4 years ago

Hi,

When I run the Step 1 to get the AlignmentInput.txt file, it occurs with the following error:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Missing primary non-supplementary alignment for read 000351F_pilon_pilon_pilon
sh: line 1: 22461 Aborted                 (core dumped) /home/cuixb/tools/biosoft/NovoGraph-1.0.0/src/sam2alignment all.genomes.filtered.bam.extract.filtered.SAM westar.fa > all.genomes.filtered.bam.extract.alignments
Cannot execute command: /home/cuixb/tools/biosoft/NovoGraph-1.0.0/src/sam2alignment all.genomes.filtered.bam.extract.filtered.SAM westar.fa > all.genomes.filtered.bam.extract.alignments at ./BAM2ALIGNMENT.pl line 67.

Before I run this step, I also filter the bam file with

samtools view -F 0x4 
samtools view -q 30 -F 256

So, how to solve this error?

Thank you in advance!

evanbiederstedt commented 4 years ago

Hi @Chipcui

So, you're first filtering unmapped reads in the BAM with

samtools view -F 0x4 -bo allContigs_unsorted.filtered.bam allContigs_unsorted.bam

Have you run the following command? I would expect that you would run into issues here as well---the errors are meant to give the user some sense if there are QC problems with the inputs:

perl checkBAM_SVs_and_INDELs.pl --BAM allContigs_unsorted.bam
                                --referenceFasta GRCh38_full_plus_hs38d1_analysis_set_minus_alts.fa 
                                --readsFasta allContigs.fa
                   --sam2alignment_executable ../src/sam2alignment

In any sense, the error Missing primary non-supplementary alignment for read 000351F_pilon_pilon_pilon is from here: https://github.com/NCBI-Hackathons/NovoGraph/blob/master/src/sam2alignment.cpp#L569-L580

Note that 000351F_pilon_pilon_pilon is the read ID which was extracted from your FASTA westar.fa here: https://github.com/NCBI-Hackathons/NovoGraph/blob/master/src/sam2alignment.cpp#L181

That FASTA westar.fa should have been the FASTA you used to create the BAM.

So, as the error says, it looks like a primary alignment should be there, but it's not. (There are many such catches within sam2alignment.cpp to inform the user that there are problems with the inputs.)

I expect the error could be due to the filter samtools view -q 30 -F 256, which means you're filtering out all secondary alignments and skipping alignments with MAPQ < 30. I'm not sure why that filter would be necessary.

Boer223 commented 4 years ago

Thank you @evanbiederstedt

As you suggested, I have run

perl /home/cuixb/tools/biosoft/NovoGraph-1.0.0/scripts/checkBAM_SVs_and_INDELs.pl --BAM all.genomes.filtered.bam --referenceFasta westar.fa --readsFasta all.genomes.fa --sam2alignment_executable ~/tools/biosoft/NovoGraph-1.0.0/src/sam2alignment

to check the bam file, but it occurs the error again.

Read westar.fa
        done.
Read all.genomes.fa
        done.
terminate called after throwing an instance of 'std::runtime_error'
  what():  Missing primary non-supplementary alignment for read 001834R_pilon_pilon_pilon
sh: line 1:  2913 Aborted                 (core dumped) /home/cuixb/tools/biosoft/NovoGraph-1.0.0/src/sam2alignment all.genomes.filtered.bam.extract.SAM westar.fa > all.genomes.filtered.bam.extract.alignments
Cannot execute command: /home/cuixb/tools/biosoft/NovoGraph-1.0.0/src/sam2alignment all.genomes.filtered.bam.extract.SAM westar.fa > all.genomes.filtered.bam.extract.alignments at /home/cuixb/tools/biosoft/NovoGraph-1.0.0/scripts/checkBAM_SVs_and_INDELs.pl line 84.
evanbiederstedt commented 4 years ago

@Chipcui How did you create the BAM?

 Missing primary non-supplementary alignment for read 001834R_pilon_pilon_pilon

Note that this error is a different read ID, 001834R_pilon_pilon_pilon. These are identifiers from your FASTA file (which should be the FASTA you used to create the BAM). These are primary alignments which should exist in your BAM. You could try to find them with

samtools view input.bam | grep "001834R_pilon_pilon_pilon"

or similar commands. Trying the suggestCommands.pl command may be useful here as well.

Boer223 commented 4 years ago

Hi,

I created the bam of each genome fasta file and then merged all the bam files together, the command was like this:

minimap2 --paf-no-hit -a -x asm5 --cs -r2k westar.fa genome.fa | samtools view -bhS - | samtools sort -o genome.sorted.bam - 
evanbiederstedt commented 4 years ago

Hi @Chipcui

It would be helpful if you output:

samtools view input.bam | grep "001834R_pilon_pilon_pilon"

There's something very strange here, as the error is suggesting for multiple reads there is no primary alignment, i.e. the read "001834R_pilon_pilon_pilon" and the read"000351F_pilon_pilon_pilon"

RE: minimap command

Could you detail the flags you're using with minimap2 somewhat? I'm still a bit confused by:

--paf-no-hit
    In PAF, output unmapped queries; the strand and the reference name fields are set to ‘*’. 
    Warning: some paftools.js commands may not work with such output for the moment.
Boer223 commented 4 years ago

Hi,

As you suggested, I use samtools view all.genomes.filtered.bam | grep "001834R_pilon_pilon_pilon" >error.sam to output the "001834R_pilon_pilon_pilon" read alignment, but I think there is non error with the read alignment. Here is the output sam file:

error.txt

evanbiederstedt commented 4 years ago

Hi @Chipcui

Sorry for the delay.

I'm still a bit confused by what you've sent----it's weird to have reads with supplementary alignments, but no primary alignments.

Could you please try two things: (1) The minimap2 command you used is the following:

minimap2 --paf-no-hit -a -x asm5 --cs -r2k westar.fa genome.fa | samtools view -bhS - | samtools sort -o genome.sorted.bam - 

Why did you use this? Could you detail this a bit more?

(2) Could you try the exact command we have in the README, and check if the error persists?

## Align the contigs FASTA against the reference, outputting a single BAM
minimap2 -a -x asm20 GRCh38_full_plus_hs38d1_analysis_set_minus_alts allContigs.fa  | samtools view -Sb - > allContigs_unsorted.bam

Thanks, Evan

evanbiederstedt commented 4 years ago

Hi @Chipcui

Were you able to figure out the problem?

Best, Evan

Boer223 commented 4 years ago

I have tried the exact command in the novograph, but it also did not work. The error is same as above. Maybe my alignment process had some problem.

evanbiederstedt commented 4 years ago

Could you share the FASTA header here? This is very strange.

Boer223 commented 4 years ago

You mean the each fasta sequence name? Now our server has some problem to connect, I'll share them later.

TorHou commented 4 years ago

It sounds to me like a problem I faced at some point too. When you map the fastas seperately and combine the *bams after the fact you have to make sure, that you combine the fastas too and provide the combined when you call BAM2ALIGNMENT.pl I suspect in the FASTA-ids 000351F_pilon_pilon_pilon will be missing.

evanbiederstedt commented 4 years ago

I guess we should clarify this in the README so that users don't become confused.

TorHou commented 4 years ago

I was mistaken. The error that I receive when I run sam2alignment on a concatenated bam and a singular fasta file is:

[W::fai_get_val] Reference read1 not found in FASTA file, returning empty sequence   

So @Chipcui error seems to be different.

evanbiederstedt commented 4 years ago

Any update on this, @Chipcui ?