NCBI-Hackathons / NovoGraph

NovoGraph: building whole genome graphs from long-read-based de novo assemblies
MIT License
44 stars 8 forks source link

Died at ../../../NovoGraph-1.0.0/scripts/BAM2ALIGNMENT.pl line 246. #38

Closed SAMtoBAM closed 4 years ago

SAMtoBAM commented 4 years ago

Hi there,

I have run the following minimap2 -a -x asm20 ../ref/S288C_reference_sequence_R64-2-1_20150113_renamed.fsa ../homozygous_genomes/RENAMED/AAB.FINAL.canu_RENAMED.fa | samtools view -Sb - > AAB.bam Mapping one genome to a reference I checked and all contigs map When I run the next step: perl ../../../NovoGraph-1.0.0/scripts/BAM2ALIGNMENT.pl --BAM AAB.bam --referenceFasta ../ref/S288C_reference_sequence_R64-2-1_20150113_renamed.fsa --readsFasta ../homozygous_genomes/RENAMED/AAB.FINAL.canu_RENAMED.fa --outputFile intermediate_files/AlignmentInput.txt I get Died at ../../../NovoGraph-1.0.0/scripts/BAM2ALIGNMENT.pl line 246.

Any idea what the problem is?

Thanks Samuel

evanbiederstedt commented 4 years ago

Hi Samuel

It's a somewhat weird error, as there should be some sort of informative messages for most die if checks.

https://github.com/NCBI-Hackathons/NovoGraph/blob/master/scripts/BAM2ALIGNMENT.pl#L236-L252

unless($alignment_reference_noGaps eq $supposed_reference_sequence)
{
    print "Reference mismatch for read $readID\n";
    #print "\t", "CIGAR: ", $inputAlignment->cigar_str, "\n";
    # print "\t", "Softclip remove: ", join("\t", $remove_softclipping_front, $remove_softclipping_back), "\n";         
    #print "\t", $alignment_reference, "\n";
    #print "\t", $alignment_read, "\n";
    print "\t", "firstPos_reference_0based: ", $firstPos_reference_0based, "\n";
    print "\t", "REF: ", substr($alignment_reference_noGaps, 0, 20), " .. ", substr($alignment_reference_noGaps, length($alignment_reference_noGaps)-10, 10), " ", length($alignment_reference_noGaps),  "\n";
    print "\t", "REF2: ", substr($reference_href->{$chromosome}, $firstPos_reference_0based, 20),  " ", length($supposed_reference_sequence), "\n";
    #print "\t", "ALG: ", $supposed_reference_sequence, "\n";
    print "\t", "strand: ", $strand, "\n";          
    print "\n";
    $warnings++;
    die if($warnings > 0);
    return undef;
}

From what I see, line 246 is commented out. Do you have any more information (or data) to help debug? You also might want to print the variable $warnings to see if that returns something. If so, that could help you fix the issue.

SAMtoBAM commented 4 years ago

Looks like the issue is with downloading the v1.0 tar.gz! Also perhaps why it was previously asking for Bio::DB::HTS..

Running now with the latest scripts, appears to be working

Thanks

evanbiederstedt commented 4 years ago

Glad this was solved.

In principle, the errors in the scripts are meant to give users some clue as to how to debug the issue, if there are issues. I didn't know how to address this error though, so I'm glad it was traced back to an incomplete download.

Thanks, Evan