DaehwanKimLab / centrifuge

Classifier for metagenomic sequences
GNU General Public License v3.0

NCBI nt index #273

Open igordot opened 3 months ago

igordot commented 3 months ago

The Centrifuge website has a link to the NCBI nucleotide non-redundant sequences index from 2018. It's possible to generate one, but that is a very long process. Do you plan to offer a more recent version of this index?

nicolo-tellini commented 3 weeks ago

Hi,

I emailed the person who maintains the index page. I don't think he is directly involved in Centrifuge.

mourisl commented 3 weeks ago

The nt database is huge now, so we don't have the computing resources to build the index (it would probably need a machine with more than 2 TB of memory). I think other labs have built an nt index, but I'm not sure whether those are publicly accessible now.

khyox commented 3 weeks ago

Thanks @mourisl! We are in the process of publicly releasing a recent nt index. It should be available within 24-48 hours. We'll let you all know as soon as it is.

igordot commented 3 weeks ago

Just for future reference, it's also possible to try Centrifuger if computing resources are more limited. Check this related thread: https://github.com/DaehwanKimLab/centrifuge/issues/275

khyox commented 2 weeks ago

Our preprint accompanying the release of a new Centrifuge nt database is now online: "Addressing the dynamic nature of reference data: a new nt database for robust metagenomic classification". Any feedback is welcome!

nicolo-tellini commented 2 weeks ago

Thanks a lot, @khyox. Sure, I will get back to you with some feedback!