DiltheyLab / MetaMaps

Long-read metagenomic analysis

Assertion 'combinedOutput.is_open()' failed #11

Closed aroelo closed 5 years ago

aroelo commented 5 years ago

Hi Alexander,

I just ran into an issue when using mapDirectly with my custom database.

The last lines of output are:

Added C3600|kraken:taxid|185218|NC_012519.1 with length 7781; est. memory ~24.2109 GB
Added C5076|kraken:taxid|389469|NC_008717.1 with length 36415; est. memory ~24.2112 GB
Added C4950|kraken:taxid|673515|NC_021069.1 with length 10865; est. memory ~24.2113 GB
Added C959|kraken:taxid|170617|NC_005038.1 with length 99657; est. memory ~24.2123 GB
Added C4768|kraken:taxid|1746063|NC_028391.1 with length 1940; est. memory ~24.2123 GB
Added C5424|kraken:taxid|1688637|NC_027706.1 with length 9829; est. memory ~24.2124 GB
Added C2200|kraken:taxid|10815|NC_001928.2 with length 2632; est. memory ~24.2125 GB
Added kraken:taxid|4232|C1004|NW_019010654.1 with length 9845; est. memory ~24.2125 GB
Added C2630|kraken:taxid|223307|NC_002049.1 with length 2542; est. memory ~24.2126 GB
Added C5772|kraken:taxid|71186|NC_009605.1 with length 2562; est. memory ~24.2126 GB
Added C640|kraken:taxid|201862|NC_007216.1 with length 9695; est. memory ~24.2127 GB

Call storeCurrentState with 7338

INFO, skch::Sketch::computeFreqHist, Frequency histogram of minimizers = (1, 70758415) ... (242454, 1)
INFO, skch::Sketch::computeFreqHist, With threshold 0.001%, ignore minimizers occurring >= 7400 times during lookup.
INFO, skch::Map::mapQuery, [count of mapped reads, reads qualified for mapping, total input reads] = [160689, 401370, 1705578]
INFO, skch::Map::Map, Time spent mapping the query : 492.806 sec
INFO, skch::Sketch::build, minimizers picked from reference = 685249698

Index construction DONE, wrote 1 files.

metamaps: src/map/mapWrap.h:40: void mapWrap::unifyFiles(std::__cxx11::string, const skch::Parameters&, std::vector<std::__cxx11::basic_string<char> >, std::vector<std::__cxx11::basic_string<char> >): Assertion `combinedOutput.is_open()' failed.

After running the following command:

/usr/bin/time -v /opt/MetaMaps-master/metamaps mapDirectly -t 20 --all -r ./myDB/DB.fa -q /datadrivehdd1/nanopore/03_projects/campanula/reads/campanula_PCS108_a2.3.0.fastq -o ./classification_results --maxmemory 100 &> output.txt

I have attached the corresponding output.txt file if you would like to see the full output: output.txt (https://github.com/DiltheyLab/MetaMaps/files/2808267/output.txt)

Only a 'classification_results.1' file is created; from what I read earlier, this means that classify didn't finish completely.
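To tell an interrupted mapping run from a finished one, the state of the output prefix can be checked from the shell. The sketch below assumes, based only on this thread, that a successful mapDirectly run writes a file named exactly like the -o prefix (here classification_results), whereas a run that stops early leaves only numbered files such as classification_results.1. The function name mapping_status and its three labels are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report what a mapDirectly output prefix left on disk.
# Assumption (from this thread): a finished run writes "<prefix>" itself,
# while an interrupted run leaves only numbered files such as "<prefix>.1".
mapping_status() {
    local prefix=$1
    if [ -f "$prefix" ]; then
        echo "combined"          # the combined mapping file exists
    elif ls -- "$prefix".[0-9]* >/dev/null 2>&1; then
        echo "chunks-only"       # only numbered intermediate files exist
    else
        echo "missing"           # nothing written for this prefix
    fi
}

# Example: mapping_status ./classification_results
```

In the situation described above, mapping_status ./classification_results would be expected to print chunks-only.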

AlexanderDilthey commented 5 years ago

Hi,

Hmm, this is weird: it seems the program can’t open the main output file. My initial guess was a file system or permission issue, but given that the .1 file was created, this seems unlikely. Have you tried writing into another directory? Are all basic sanity checks, like free disk space, OK?
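The disk-space part of these sanity checks can be scripted. This is a minimal sketch, assuming a df that supports the POSIX -P output format; because -o takes a path prefix rather than a directory, it checks the filesystem holding the prefix's directory. The helper name free_kib_for_prefix is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: free space (in KiB) on the filesystem that will hold
# the output passed to -o. The -o value is a prefix, so check its directory.
free_kib_for_prefix() {
    local dir
    dir=$(dirname "$1")
    # -P: POSIX format (one header line, then one unwrapped data line)
    # -k: report sizes in 1024-byte units; column 4 is the available space
    df -Pk "$dir" | awk 'NR == 2 { print $4 }'
}

# Example: free_kib_for_prefix ./classification_results
```

Running it with the same -o value that was given to mapDirectly prints the available space, in KiB, where the combined output would be written.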



From: aroelo notifications@github.com Sent: Wednesday, January 30, 2019 08:47 To: DiltheyLab/MetaMaps Cc: Subscribed Subject: [DiltheyLab/MetaMaps] Assertion 'combinedOutput.is_open()' failed (#11)


aroelo commented 5 years ago

Hi Alexander,

I haven't tried writing into another directory yet. So far I have been working with a smaller subset of the database, and with that I have no problems writing to the same directory (basic checks like free disk space are also okay).

However, I noticed that once, when I tried a different subset (also quite small), I got the same error. Unfortunately I didn't save the exact input and output, but I then saw that some sequences had a different header format than 'Cxxxx|kraken:taxid|xxxxxx|seq_name'.

As you can see there is also one like that above:

Added C3600|kraken:taxid|185218|NC_012519.1 with length 7781; est. memory ~24.2109 GB
Added C5076|kraken:taxid|389469|NC_008717.1 with length 36415; est. memory ~24.2112 GB
Added C4950|kraken:taxid|673515|NC_021069.1 with length 10865; est. memory ~24.2113 GB
Added C959|kraken:taxid|170617|NC_005038.1 with length 99657; est. memory ~24.2123 GB
Added C4768|kraken:taxid|1746063|NC_028391.1 with length 1940; est. memory ~24.2123 GB
Added C5424|kraken:taxid|1688637|NC_027706.1 with length 9829; est. memory ~24.2124 GB
Added C2200|kraken:taxid|10815|NC_001928.2 with length 2632; est. memory ~24.2125 GB
Added kraken:taxid|4232|C1004|NW_019010654.1 with length 9845; est. memory ~24.2125 GB
Added C2630|kraken:taxid|223307|NC_002049.1 with length 2542; est. memory ~24.2126 GB
Added C5772|kraken:taxid|71186|NC_009605.1 with length 2562; est. memory ~24.2126 GB
Added C640|kraken:taxid|201862|NC_007216.1 with length 9695; est. memory ~24.2127 GB

Could this be the problem, or is it an unrelated coincidence? Checking the output.txt file, it seems that all reference sequences with an 'NW_xxxxxxx' ID have a different header style.

I didn't get any errors while constructing the database, so I'm not sure where this went wrong. This was while constructing a database of the 'viral' branch of RefSeq.
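To list exactly which headers deviate from the usual pattern, something like the following could be used. It is a sketch that assumes an uncompressed FASTA (decompress DB.fa.gz first if needed) and treats every header that does not start with C<digits>|kraken:taxid| as non-standard. The name nonstandard_headers is hypothetical, and the sketch only inspects the database; it says nothing about whether these headers cause the error.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print FASTA headers that do NOT start with the
# "C<number>|kraken:taxid|" prefix seen in most lines of the log above.
nonstandard_headers() {
    # [|] matches a literal pipe character inside an extended regex.
    grep '^>' "$1" | grep -Ev '^>C[0-9]+[|]kraken:taxid[|]'
}

# Example (count affected sequences): nonstandard_headers ./myDB/DB.fa | wc -l
```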

AlexanderDilthey commented 5 years ago

Hmm, no, the contig IDs should not lead to any problems at this point. The reference database is based on RefSeq, i.e. not proprietary? Could you make the FASTA available to me, e.g. via Dropbox? I'd then try to reproduce the issue.

aroelo commented 5 years ago

The reference database is only based on RefSeq. A bit of a delay (sorry about that), but here is a link to the FASTA: https://www.dropbox.com/s/zlusvvpm0eb4mlu/DB.fa.gz?dl=0

AlexanderDilthey commented 5 years ago

Thank you! I'll look into this!


From: aroelo notifications@github.com Sent: Wednesday, 27 February 2019 14:40 To: DiltheyLab/MetaMaps Cc: Alexander Dilthey; Comment Subject: Re: [DiltheyLab/MetaMaps] Assertion 'combinedOutput.is_open()' failed (#11)


AlexanderDilthey commented 5 years ago

Hi @aroelo, sorry this took me some time, but I can map against the database you provided without any errors. To test this, I extracted a few sequences from the database and used them as test reads. I've attached the file I used for this test for reference.

DB1_pseudoReads.zip

The assertion combinedOutput.is_open() is checked immediately after opening the combinedOutput output stream, so a file system issue still seems to be the most likely explanation to me.
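A rough way to build similar pseudo-reads from your own database is sketched below. This is not a description of how DB1_pseudoReads.zip was made: the function name fasta_to_pseudo_reads, the FASTQ output with a constant 'I' quality string, and the default values for the number of records and the maximum read length are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the first N sequences of a FASTA file into FASTQ
# "pseudo-reads", keeping at most LEN bases from the start of each sequence.
# Assumptions: constant 'I' qualities; defaults N=10 and LEN=5000 are arbitrary.
fasta_to_pseudo_reads() {
    local fasta=$1 n=${2:-10} len=${3:-5000}
    awk -v n="$n" -v len="$len" '
        # Print the sequence collected so far as one FASTQ record.
        function emit(   s, q) {
            if (name == "" || count > n) return
            s = substr(seq, 1, len)
            q = s
            gsub(/./, "I", q)          # one quality character per base
            printf "@%s\n%s\n+\n%s\n", name, s, q
        }
        /^>/ {
            emit()
            if (++count > n) exit      # stop reading after N records
            name = substr($0, 2)
            seq = ""
            next
        }
        { seq = seq $0 }               # join multi-line sequences
        END { emit() }
    ' "$fasta"
}
```

For example, fasta_to_pseudo_reads ./myDB/DB.fa 20 > pseudoReads.fastq produces a small query file that can be passed to mapDirectly via -q. Reads taken straight from the reference should map, which separates database problems from problems with the original reads or the output location.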

xlinxlin commented 5 years ago

Hi @AlexanderDilthey, I just ran into the same issue.

Added C1065|kraken:taxid|43989|NC_010542.1 with length 14685; est. memory ~183.84 GB
Added C12315|kraken:taxid|1379909|NZ_CP010776.1 with length 4143738; est. memory ~183.869 GB
Added C8659|kraken:taxid|333849|NC_017961.1 with length 36262; est. memory ~183.869 GB
Added C2046|kraken:taxid|1296662|NC_022980.1 with length 45798; est. memory ~183.869 GB
Added C9480|kraken:taxid|983437|NC_015962.1 with length 2625; est. memory ~183.869 GB
Added C2516|kraken:taxid|121719|NZ_CP013069.1 with length 351005; est. memory ~183.872 GB
Added C3984|kraken:taxid|x43|NZ_CP009966.1 with length 53501; est. memory ~183.872 GB
Added C2187|kraken:taxid|419435|NC_008885.1 with length 2641; est. memory ~183.872 GB
Added C7016|kraken:taxid|1608144|NC_031320.1 with length 6421; est. memory ~183.872 GB
Added C5839|kraken:taxid|518987|NC_002210.1 with length 1191; est. memory ~183.872 GB
Added C8182|kraken:taxid|941058|NC_019487.1 with length 143986; est. memory ~183.873 GB
Added C11814|kraken:taxid|938142|NZ_CP012344.1 with length 4824322; est. memory ~183.903 GB
Added C2621|kraken:taxid|391009|NC_009616.1 with length 1915238; est. memory ~183.916 GB
Added C1934|kraken:taxid|x1308|NZ_CP008943.1 with length 4806594; est. memory ~183.946 GB
Added C17174|kraken:taxid|930170|NC_017199.1 with length 139013; est. memory ~183.947 GB
Added C2408|kraken:taxid|12524|NC_004284.1 with length 5908; est. memory ~183.947 GB
Added C5327|kraken:taxid|1746071|NC_028393.1 with length 2236; est. memory ~183.947 GB
Added C9992|kraken:taxid|701521|NC_017017.1 with length 18800; est. memory ~183.948 GB
Added C17930|kraken:taxid|1815972|NC_031277.1 with length 52561; est. memory ~183.948 GB
Added C8250|kraken:taxid|858307|NZ_CP010284.1 with length 4827959; est. memory ~183.978 GB
Added C16207|kraken:taxid|1333534|NZ_CP011114.1 with length 5575484; est. memory ~184.016 GB
Added C18642|kraken:taxid|72750|NC_005209.2 with length 8006; est. memory ~184.016 GB
Added C12871|kraken:taxid|1536596|NC_027384.1 with length 37359; est. memory ~184.016 GB
Added C10964|kraken:taxid|x541|NZ_CP010838.1 with length 4107437; est. memory ~184.042 GB
Added C6279|kraken:taxid|297352|NZ_LN774769.1 with length 2394138; est. memory ~184.058 GB
Added C296|kraken:taxid|265669|NC_002973.6 with length 2905187; est. memory ~184.077 GB
Added C9877|kraken:taxid|10455|NC_009011.2 with length 131331; est. memory ~184.078 GB
Added C781|kraken:taxid|x1835|NZ_CP014252.1 with length 3039887; est. memory ~184.097 GB
Added C13293|kraken:taxid|440266|NC_009539.1 with length 5229; est. memory ~184.097 GB

Call storeCurrentState with 19133

INFO, skch::Sketch::computeFreqHist, Frequency histogram of minimizers = (1, 123915428) ... (599623, 1)
INFO, skch::Sketch::computeFreqHist, With threshold 0.001%, ignore minimizers occurring >= 4273 times during lookup.
INFO, skch::Map::mapQuery, [count of mapped reads, reads qualified for mapping, total input reads] = [1306053, 1306363, 3625881]
INFO, skch::Map::Map, Time spent mapping the query : 4393.34 sec
INFO, skch::Sketch::build, minimizers picked from reference = 5837567235

Index construction DONE, wrote 1 files.

metamaps: src/map/mapWrap.h:40: void mapWrap::unifyFiles(std::__cxx11::string, const skch::Parameters&, std::vector<std::__cxx11::basic_string<char> >, std::vector<std::__cxx11::basic_string<char> >): Assertion `combinedOutput.is_open()' failed.

The command is: ./metamaps mapDirectly -t 100 --all -r databases/miniSeq+H/DB.fa -q barcode.fastq -o classification_results

I used your default database.

AlexanderDilthey commented 5 years ago

@xlinxlin Is it possible you don't have write permissions in the directory you are executing MetaMaps in? Can you confirm by doing echo 1 > classification_results and then an ls -l classification_results in the directory?
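The echo test above overwrites any existing classification_results file. A non-destructive variant is sketched below: it creates and then deletes a uniquely named temporary file in the directory of the output prefix. can_create_output is a hypothetical helper name, and the check is only meaningful when run as the same user that runs MetaMaps, since root bypasses permission bits.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a new file can be created in the directory
# of the output prefix, without touching an existing classification_results.
can_create_output() {
    local dir probe
    dir=$(dirname "$1")
    # mktemp fails if the directory is missing or not writable by this user.
    probe=$(mktemp "$dir/.metamaps_write_test.XXXXXX" 2>/dev/null) || return 1
    rm -f "$probe"    # clean up the probe file
}

# Example:
# can_create_output ./classification_results && echo "writable" || echo "cannot create files here"
```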

xlinxlin commented 5 years ago

Hi @AlexanderDilthey, I ran it again and the error message didn't appear this time.

[screenshot 1]

The command is: /opt/MetaMaps/metamaps mapDirectly -t 100 --all -r ~/Downloads/databases/miniSeq+H/DB.fa -q ~/Yan_test/20190725_SP_Metagenomics/fastq/testMetaMaps/barcode03.fastq -o classification_results

But what seems strange to me is that there are only 4 output files in the /home directory, and the file names look different from your example output files.

[screenshot 2]

Here is the output of echo 1 > classification_results and ls -l classification_results. Did I do something wrong, or is it a permission issue? Thank you!

[screenshot 3]

xlinxlin commented 5 years ago

I think I forgot to run ./metamaps classify --mappings classification_results --DB databases/miniSeq+H; I will try again.
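The workflow discussed in this thread has two steps: mapDirectly writes the mappings, and classify then reads them. A sketch wrapping both commands follows. METAMAPS and DB_DIR are hypothetical environment variables whose defaults mirror the commands in this thread, and the file check between the two steps assumes that the combined mapping file is named exactly like the -o prefix.

```shell
#!/usr/bin/env bash
# Sketch of the two-step workflow from this thread: mapDirectly, then classify.
# METAMAPS and DB_DIR are hypothetical environment variables; the defaults
# mirror the commands used above and should be adjusted for your setup.
run_metamaps() {
    local metamaps=${METAMAPS:-./metamaps}
    local db_dir=${DB_DIR:-databases/miniSeq+H}
    local reads=$1 out=${2:-classification_results} threads=${3:-20}

    # Step 1: map the reads against the database FASTA.
    "$metamaps" mapDirectly -t "$threads" --all -r "$db_dir/DB.fa" -q "$reads" -o "$out" || return 1

    # Step 2 reads the combined mapping file, so stop if it was not written.
    if [ ! -f "$out" ]; then
        echo "mapping output '$out' not found; skipping classify" >&2
        return 1
    fi
    "$metamaps" classify --mappings "$out" --DB "$db_dir"
}
```

For the run above, this would be DB_DIR=~/Downloads/databases/miniSeq+H METAMAPS=/opt/MetaMaps/metamaps run_metamaps barcode03.fastq classification_results 100. Stopping before classify when the combined file is missing makes a failure like the original assertion visible instead of leaving a partial set of result files.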

xlinxlin commented 5 years ago

It works for me now.

AlexanderDilthey commented 5 years ago

@xlinxlin Great! I'll close this issue now.