Open GastonViarengo opened 3 years ago
Sorry for the delayed reply, which version of Centrifuge did you use? Thank you.
Hello Li Song, no problem, thanks for your response. I'm using version 1.0.4-beta. Could you help me find out the problem? Thank you.
I just checked the log and realized that I fixed this bug after the release of 1.0.4-beta. Can you try git clone to get the most recent version of Centrifuge? Thank you.
Thanks Li Song, I'll try that and let you know how it goes. What was the bug? Best, Gastón.
I also ran into this (or a similar issue) while I was using the provided Makefile to make an nt
database. Compiling 65c42fc from source did not change anything.
Hi, I have a similar issue with nt. I'm using version 1.0.4. I modified the map file to start with:

```
accession.version	taxid
A00001.1	10641
A00002.1	9913
A00003.1	9913
A00004.1	32630
A00005.1	32630
```

and launched:

```
centrifuge-build -p 16 --bmax 1342177280 --conversion-table gi_taxid_nucl.2map \
    --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp \
    nt.fa nt
```
After one hour, the process does not write anything else. nt.1.cf and nt.3.cf are not empty, but nt.2.cf is empty. I see only warnings in the output logs, and the process uses only one CPU. Moreover, the nt indexes available on the Centrifuge website are not up to date (they are from 2018). Could you help me, please? Thanks a lot in advance.
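Before running centrifuge-build on a large database, it can help to confirm that every sequence in the FASTA actually has an entry in the conversion table, since missing entries are what produce the "taxonomy id doesn't exists" warnings seen in this thread. A minimal diagnostic sketch (this script is my own, not part of Centrifuge; the filenames are the ones from the command above):

```python
# Diagnostic sketch (not a Centrifuge tool): report FASTA sequence IDs
# that have no entry in the seqid-to-taxid conversion table.
def missing_ids(fasta_path, map_path):
    # The map is whitespace-separated: "<seqid> <taxid>" per line.
    with open(map_path) as fh:
        mapped = {line.split()[0] for line in fh if line.strip()}
    missing = []
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                # Centrifuge keys on the first token of the header.
                seqid = line[1:].split()[0]
                if seqid not in mapped:
                    missing.append(seqid)
    return missing
```

For example, `missing_ids("nt.fa", "gi_taxid_nucl.2map")` would list every accession that centrifuge-build is going to warn about.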
Hi all, I have the same error with nt, anyone fix it?
Hi all, I have similar problem with a custom database. Did anyone figure it out?
I gave up finally!
In my case, running centrifuge-build produced the error `Warning: taxomony id doesn't exists for NC_0####.1!` (repeated several times for different IDs). The cause was that when I concatenated several seqid2taxid.map files, a newline was sporadically missing at the junction between two files, which made Centrifuge miss all the NCBI taxid entries after that point.
Is there any solution? Did anyone find one? I have been stuck on this for the last 20 days.
Thanks, Ram
Hello, any suggestions?
Hi, it seems I need to change my strategy for analyzing my data. Any suggestions other than Centrifuge? I am using long-read data from ONT; will Kraken2 work for taxonomy analysis?
Please suggest. Thanks, RNS
hi
Hi, have there been any updates on this issue? I am encountering the same thing.
How much memory do you have on your server and which database are you building? Thank you.
I am trying to build a custom database from bacterial, viral, fungal, and protozoan genomes downloaded from RefSeq. I'm running Centrifuge v1.0.4, and have tried both the conda installation and installing from source. The total size of my FASTA file is 148 GB. On my last build attempt, I used 80 GB of memory and 8 cores. I didn't get any error messages about running out of memory; I only got warnings, e.g. `Warning: taxonomy id doesn't exists for NCxxx` as above, and the output file refseq.4.cf was empty. I have access to more memory, though, so I could try with that. The command I used to build was:
```
centrifuge-build --conversion-table ${db}/seqid2taxid.map \
    --taxonomy-tree ${software}/taxdump/new_taxdump_2023-08-01/nodes.dmp \
    --name-table ${software}/taxdump/new_taxdump_2023-08-01/names.dmp \
    ${db}/refseq_all_genomic.fasta refseq -p 8
```
With 148 GB of sequence, I think you may need about 600 GB of memory to build the index. You can increase the --dcv and --bmax values to reduce memory usage, but the build may take longer.
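As a back-of-envelope check of the numbers in that reply: 148 GB of FASTA mapping to roughly 600 GB of RAM implies around 4 bytes of working memory per input base. The factor of 4 is an assumption I'm inferring from the comment, not a documented Centrifuge formula:

```python
# Rough sizing sketch: bytes_per_base = 4 is an assumption inferred
# from the "148 GB -> ~600 GB" rule of thumb in the reply above.
def estimated_build_memory_gb(fasta_gb, bytes_per_base=4):
    return fasta_gb * bytes_per_base

print(estimated_build_memory_gb(148))  # -> 592, in line with "about 600 GB"
```

This is only useful for deciding how much memory to request up front; the actual peak depends on the --bmax and --dcv settings.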
OK thank you, I will try that!
Hello everyone. I've recently started using Centrifuge, and I've been able to create a viral index and use it with my metagenomic data. However, when I try to build a bacterial index (bac), the process hangs (at least, that's the only explanation I've found so far). I'm using the following command:
```
centrifuge-build -p 8 --conversion-table seqid2taxid.map \
    --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp \
    inputs/seq_bac.fna indices/bac
```
The files bac.1.cf, bac.2.cf, and bac.3.cf are created within a few minutes after the job begins, but bac.2.cf is 0 kB in size. The output shows:
```
Settings:
  Output files: "indices/bac.*.cf"
  Line rate: 7 (line is 128 bytes)
  Lines per side: 1 (side is 128 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Local offset rate: 3 (one in 8)
  Local fTable chars: 6
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  inputs/seq_bac.fna
Reading reference sizes
Warning: Encountered reference sequence with only gaps
  Time reading reference sizes: 00:07:04
Calculating joined length
Writing header
Reserving space for joined string
Could not allocate space for a joined string of 67127059294 elements.
Switching to a packed string representation.
Reading reference sizes
Warning: Encountered reference sequence with only gaps
  Time reading reference sizes: 00:07:04
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:07:05
Warning: taxomony id doesn't exists for NC_017270.1!
  (repeated several times for different ids)
Warning: Taxonomy ID 90270 is not in the provided taxonomy tree (taxonomy/nodes.dmp)!
  (repeated several times for different ids)
```
Even after leaving it running for a few days, the bac.*.cf files show no modifications and the output is frozen (I believe it has hung).
I've tried removing the erroneous IDs, but the process still hangs.
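One way to remove the erroneous entries systematically is to drop every conversion-table line whose taxid does not appear in the provided nodes.dmp, since those are the entries behind the "Taxonomy ID ... is not in the provided taxonomy tree" warnings. A sketch of that filter (this helper is my own assumption, not a Centrifuge tool; it assumes the standard NCBI taxdump layout where nodes.dmp fields are separated by `\t|\t` and the first field is the taxid):

```python
# Sketch (assumed helper, not a Centrifuge tool): keep only map entries
# whose taxid exists in nodes.dmp, writing the cleaned map to out_path.
def filter_map(map_path, nodes_path, out_path):
    with open(nodes_path) as fh:
        # nodes.dmp: fields separated by "\t|\t"; the first is the taxid.
        known = {line.split("\t|\t")[0].strip() for line in fh if line.strip()}
    kept = dropped = 0
    with open(map_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.split()
            if len(fields) == 2 and fields[1] in known:
                dst.write(line)
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

This cleans the conversion table side; the sequences themselves can stay in the FASTA, since centrifuge-build only warns about unmapped IDs, but whether that resolves the hang is a separate question.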
Could you help me understand what's going on in order to solve this?
Thank you so much!
Best regards
Prof. Dr. Gastón Viarengo Institute of Molecular and Cellular Biology of Rosario (IBR-CONICET) Human Virology Lab