DiltheyLab / MetaMaps

Long-read metagenomic analysis
Other
96 stars 23 forks source link

Assertion failed: (abs(1 - f_sum) <= 1e-3 during classification step #65

Open jflot opened 2 years ago

jflot commented 2 years ago

Hello, I am trying to run MetaMaps to classify Nanopore reads that I know originate from a mix of four genomes. I built a custom database with these four (unpublished) genomes:

perl /Users/jff/miniconda3/bin/combineAndAnnotateReferences.pl --inputFileList listspecies.txt --outputFile combinedrefs.fa --taxonomyInDirectory /Users/jff/References/NCBItaxonomy --taxonomyOutDirectory combinedtaxonomy

Input: 4 genomes and 4 taxon IDs.
Reading taxonomy from /Users/jff/References/NCBItaxonomy ..
    done.

Introduced 0 new taxonomic IDs
Annotated 3320 contigs

Output new taxonomy into combinedtaxonomy

(base) adoxa:metamaps jff$ perl buildDB.pl --DB database --FASTAs combinedrefs.fa --taxonomy combinedtaxonomy 
Can't open perl script "buildDB.pl": No such file or directory
(base) adoxa:metamaps jff$ perl /Users/jff/miniconda3/bin/buildDB.pl --DB database --FASTAs combinedrefs.fa --taxonomy combinedtaxonomy 

Number of found FASTA input files: 1

Reading taxonomy from combinedtaxonomy ..
    done.

Running count keys in gene-2-protein: 0
$VAR1 = [
          'Annotation files',
          0
        ];
$VAR2 = [
          'Protein files',
          0
        ];
Reduced taxonomy in database/taxonomy from 2416811 nodes to 31 

Produced randomized-order database sequence file database/DB.fa and
    taxon info file database/taxonInfo.txt (4 taxa).
scalar((keys %gene_2_protein)): 0
    Total protein annotations: 0 -- with sequence: 0

then

perl /Users/jff/miniconda3/bin/buildDB.pl --DB database --FASTAs combinedrefs.fa --taxonomy combinedtaxonomy 

Number of found FASTA input files: 1

Reading taxonomy from combinedtaxonomy ..
    done.

Running count keys in gene-2-protein: 0
$VAR1 = [
          'Annotation files',
          0
        ];
$VAR2 = [
          'Protein files',
          0
        ];
Reduced taxonomy in database/taxonomy from 2416811 nodes to 31 

Produced randomized-order database sequence file database/DB.fa and
    taxon info file database/taxonInfo.txt (4 taxa).
scalar((keys %gene_2_protein)): 0
    Total protein annotations: 0 -- with sequence: 0

I did the mapping step:

metamaps mapDirectly --all -r database -q ../reads20000.fa -o classification_reads20000
>>>>>>>>>>>>>>>>>>
Reference = [database]
Query = [../reads20000.fa]
Kmer size = 16
Window size = 1000
Min. read length >= 1000
Alphabet = DNA
P-value = 0.001
Percentage identity threshold = 80
Report all = 1
Target max. memory = 0
Threads = 1
Mapping output file = classification_reads20000
>>>>>>>>>>>>>>>>>>
Parameters used:
    - alphabetSize: 4
    - kmerSize: 16
    - minReadLength: 1000
    - p_value: 0.001
    - percentageIdentity: 80
    - windowSize: 1000
    - maximumMemory: ~0 GB

Call storeCurrentState with 0

INFO, skch::Map::mapQuery, [count of mapped reads, reads qualified for mapping, total input reads] = [0, 54092, 54092]
INFO, skch::Map::Map, Time spent mapping the query : 57.7196 sec
INFO, skch::Sketch::build, minimizers picked from reference = 0

Index construction DONE, wrote 1 files.

but got an the following error message during the classification step:

metamaps classify --mappings classification_reads20000 --DB database
>>>>>>>>>>>>>>>>>>
DB = database
Mappings = classification_reads20000
Min. reads for 'U' = 10000
Threads = 1
>>>>>>>>>>>>>>>>>>
Read taxonomy from database/taxonomy -- have 31 nodes.
Starting EM...
Assertion failed: (abs(1 - f_sum) <= 1e-3), function doEM, file src/meta/fEM.h, line 500.
Abort trap: 6

It looks like none of my reads mapped on my reference genomes, but when I try mapping them using minimap2 for instance they almost all map on at least one of my four reference genomes... Thanks a lot in advance for your help.

AlexanderDilthey commented 12 months ago

Hi @jflot,

What happened here is that no reads were mapped, as suspected by you. I have added an informative error message.

What I find suspicious is the line INFO, skch::Sketch::build, minimizers picked from reference = 0 - are your reference contigs very short or very repetitive?

You could also look at the file classification_reads20000, which may be informative.

Best

Alex