Open shlomobl opened 6 years ago
It's normal for a few sequences not to be found in the taxonomy tree. If you want them in the index, you need to add entries for them to the seqid_to_taxid file. For example, you need to add a taxonomy id for NZ_FWVH01000246.AA. You also need to make sure that taxonomy id is present in the nodes.dmp and names.dmp files.
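To check the second condition, a rough sketch like the following lists taxonomy ids that appear in the conversion table but not in nodes.dmp. It assumes the conversion table is two tab-separated columns (sequence id, taxonomy id) and that each nodes.dmp line begins with the tax_id followed by a tab, as in the NCBI taxdump. The function name `missing_taxids` is made up for illustration.

```shell
# Hypothetical helper: print each taxonomy id that is used in the
# conversion table ($1) but has no entry in nodes.dmp ($2).
# Assumes tab-separated input in both files.
missing_taxids() {
    awk -F'\t' '
        FILENAME == ARGV[1] { known[$1] = 1; next }     # first file: nodes.dmp
        !($2 in known) && !seen[$2]++ { print $2 }      # second file: the map
    ' "$2" "$1"
}
```

With the file names used elsewhere in this thread, this could be called as `missing_taxids seqid2taxid.map taxonomy/nodes.dmp`. No output means every taxonomy id in the map is present in nodes.dmp.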
Thanks. What about the second error:
Warning: taxomony id doesn't exists for NZ_FWVH01000246.AA!
Warning: taxomony id doesn't exists for NGGTT!
Warning: taxomony id doesn't exists for NZ_FWQG010ACCGATCAGCAGCACCAGCAGCAGGCAGGCCATTACCGCCCCCAGCGA!
Warning: taxomony id doesn't exists for NZ_FWDZ010001TTATTATTATGCCAACCATTGGTTTTA!
It looks like something went wrong with the reference file, but I can't find it...
That is about the second warning.
The "only gaps" warning might be about the sequences with low complexity.
Isn't it strange that the beginning of the sequence was joined to the accession number (NZ_FWQG010ACCGAT...)? Perhaps this is what causes the error message.
Yes. But I'm not sure what causes the concatenation.
At least I found that it happens during the "cat" step when generating the input reference files. The original *.fna file downloaded seems to be OK. It's strange because I can't find a pattern for this error, say, every X entries.
Oh, I see. Could you please run "ls | grep ".fna$" | xargs cat >> ..." to concat the files?
Hmm, that's what I did. Do you mean without the -n and -P options?
Yes, without them.
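For context: with -P, xargs may run several cat processes at the same time, and their output can interleave when they all write to the same file. To confirm whether a concatenation step altered any headers, one option is to compare the sequence ids in the source files with those in the combined file. This is only a sketch; the function names `fasta_ids` and `compare_fasta_ids` are made up, and the "id" is taken to be the first whitespace-delimited token of each header line.

```shell
# Hypothetical helper: print the id (first token, without ">") of every
# FASTA header in the given files.
fasta_ids() {
    awk '/^>/ { print substr($1, 2) }' "$@"
}

# Hypothetical helper: compare ids in the combined file ($1) with the ids
# in the source files (remaining arguments). Prints a diff and returns
# non-zero if the two sets differ.
compare_fasta_ids() {
    combined=$1; shift
    workdir=$(mktemp -d)
    fasta_ids "$@" | sort > "$workdir/source.ids"
    fasta_ids "$combined" | sort > "$workdir/combined.ids"
    # "<" lines exist only in the sources, ">" lines only in the combined file.
    if diff "$workdir/source.ids" "$workdir/combined.ids"; then
        status=0
    else
        status=1
    fi
    rm -rf "$workdir"
    return $status
}
```

For example, `compare_fasta_ids input-sequences.fna library/*/*.fna` prints nothing when the ids match. An id such as NZ_FWQG010ACCGAT... would show up as a ">" line because it exists only in the combined file.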
Hi, I have the same issue, but for all of the IDs. This is what I have done:
centrifuge-download -o taxonomy taxonomy
centrifuge-download -o library -m -d "bacteria" refseq > seqid2taxid.map
after downloading:
cat library/*/*.fna > input-sequences.fna
--> when the fna file was created:
centrifuge-build -p 4 --conversion-table seqid2taxid.map \
--taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp \
input-sequences.fna centrifuge_bacteria
Three files are generated:
However, for all bacteria in the database I get the message:
Warning: taxomony id doesn't exists for NZ_LT969520.1!
Warning: taxomony id doesn't exists for NZ_LT985474.1!
Warning: taxomony id doesn't exists for NZ_LT985188.1!
Warning: taxomony id doesn't exists for NZ_LT990039.1!
Warning: taxomony id doesn't exists for NZ_LT991954.1!
Warning: taxomony id doesn't exists for NZ_LT991955.1!
Warning: taxomony id doesn't exists for NZ_LT991956.1!
Warning: taxomony id doesn't exists for NZ_LT991957.1!
Warning: taxomony id doesn't exists for NZ_LT991958.1!
Warning: taxomony id doesn't exists for NZ_LT991959.1!
Warning: taxomony id doesn't exists for NZ_LT991960.1!
Warning: taxomony id doesn't exists for NZ_LT992488.1!
Warning: taxomony id doesn't exists for NZ_LT992489.1!
Warning: taxomony id doesn't exists for NZ_LT992486.1!
Warning: taxomony id doesn't exists for NZ_LT992487.1!
Warning: taxomony id doesn't exists for NZ_LT992492.1!
Warning: taxomony id doesn't exists for NZ_LT992493.1!
Warning: taxomony id doesn't exists for NZ_LT992502.1!
Warning: taxomony id doesn't exists for NZ_LS398547.1!
and so on. The files names.dmp and nodes.dmp do exist.
When using ls | grep ".fna$" | xargs cat >> sequences.fna, I only get an empty file.
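One possible reason for the empty file is that ls only lists the current directory, while the download above places the .fna files in subdirectories of library/ (hence the library/*/*.fna pattern). A sketch that searches recursively and concatenates the files one after another, without any parallel options, could look like this. The function name `concat_fasta` is made up, and `-print0`, `sort -z` and `xargs -0` are assumed to be available (they are in GNU findutils and coreutils).

```shell
# Hypothetical helper: concatenate every *.fna file found under
# directory $1 into output file $2, in a stable (sorted) order.
# Write the output outside the searched directory so it is not picked up
# by the search itself.
concat_fasta() {
    find "$1" -type f -name '*.fna' -print0 \
        | sort -z \
        | xargs -0 cat > "$2"
}
```

Usage would be something like `concat_fasta library input-sequences.fna`. Running `grep -c '^>' input-sequences.fna` afterwards gives the number of sequences that ended up in the file.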
It's normal to have some taxonomy ids missing. You can grep, for example, "NZ_LT969520" in the *.map file to make sure. If nothing is found, that means the corresponding genome is somehow not registered in the taxonomy tree.
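Rather than grepping one accession at a time, the same check can be run for every header at once. The sketch below prints each sequence id in the FASTA input that has no line in the conversion table. It assumes the id is the first whitespace-delimited token of the header and the first column of the map; the function name `unmapped_seqids` is made up.

```shell
# Hypothetical helper: print every FASTA sequence id that does not
# appear in the first column of the conversion table.
# $1 = conversion table, remaining arguments = FASTA files.
unmapped_seqids() {
    map_file=$1; shift
    awk '
        FILENAME == ARGV[1] { mapped[$1] = 1; next }    # load the map
        /^>/ {
            id = substr($1, 2)                          # drop the leading ">"
            if (!(id in mapped)) print id
        }
    ' "$map_file" "$@"
}
```

For example, `unmapped_seqids seqid2taxid.map input-sequences.fna | wc -l` counts the unmapped sequences. If that count is close to the total number of sequences, the map and the FASTA file probably do not belong together, which is a different situation from a handful of missing ids.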
Hi,
I am having the same issue as mentioned by @SK-N-BE. I understand that the warnings themselves are not a problem, but after printing them the build seems to do nothing. I let it run for one day and it did not stop. When trying the same again, it stopped at the same part. Should I stop it myself, and does that mean it has finished? I don't know how to tell whether everything went as planned.
Thank you, and I hope you understand my question.
Alžběta
Hi,
While I'm building my index with centrifuge-build, I got these errors:
1) Warning: Encountered reference sequence with only gaps
2) Warning: taxomony id doesn't exists for NZ_FWVH01000246.AA!
Warning: taxomony id doesn't exists for NGGTT!
Warning: taxomony id doesn't exists for NZ_FWQG010ACCGATCAGCAGCACCAGCAGCAGGCAGGCCATTACCGCCCCCAGCGA!
Warning: taxomony id doesn't exists for NZ_FWDZ010001TTATTATTATGCCAACCATTGGTTTTA!
Does it mean something is wrong with the input files? Where should I look for errors?