Open wittler-github opened 1 year ago
I think it is fine to ignore those sequences. Many such cases come from dustmasker, which masks out simple sequences and the like. So even if their original sequences were kept, they would be hard to classify with.
I think so too. In this case only a very small fraction of the reference sequences triggered this warning, and the very large input data (about 40-60 Gb) was also dustmasked.
As you can see in the attached files, I get this warning many times, although centrifuge completes without error. It uses a vast amount of data to build the index, about 40-70 GB I reckon.
Is this a significant issue, such that one should clean up the NCBI .fna files that contain only NNNNN... and no real sequence? The input .fna files were dustmasked with the centrifuge-download -d option. Is this just a statistical issue that is negligible given the large amount of data used, or is it something one should rectify?
centrifuge_build.zip
Warning: Encountered reference sequence with only gaps
Warning: Encountered reference sequence with only gaps
.....
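
For anyone who does want to remove such records before building, here is a minimal sketch in plain Python. It is not part of Centrifuge. The function names are hypothetical, and it assumes that "only gaps" means a record with no A, C, G, or T at all (upper or lower case), which may not match exactly what the build step checks.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumption: anything other than A/C/G/T (either case) counts as a gap.
REAL_BASES = frozenset("ACGTacgt")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (header, sequence_lines) for each FASTA record."""
    header = None
    seq_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, seq_lines
            header, seq_lines = line, []
        elif header is not None:
            seq_lines.append(line)
    if header is not None:
        yield header, seq_lines


def has_real_base(seq_lines: Iterable[str]) -> bool:
    """True if at least one character in the record is A, C, G, or T."""
    return any(ch in REAL_BASES for line in seq_lines for ch in line)


def filter_all_gap_records(lines: Iterable[str], dropped: List[str]) -> Iterator[str]:
    """Yield the lines of records that contain real sequence.

    Headers of records made up only of gaps are appended to `dropped`
    so they can be reviewed afterwards.
    """
    for header, seq_lines in read_fasta(lines):
        if has_real_base(seq_lines):
            yield header
            yield from seq_lines
        else:
            dropped.append(header)


if __name__ == "__main__":
    demo = [">seq1\n", "ACGTNNNN\n", ">seq2\n", "NNNNNNNN\n", "NNNN\n"]
    dropped_headers: List[str] = []
    for out_line in filter_all_gap_records(demo, dropped_headers):
        print(out_line)
    print("dropped:", dropped_headers)
```

On a real file one would pass an open file handle instead of the `demo` list and write the kept lines to a new .fna file. Records are processed one at a time, so the whole input never has to be in memory at once.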