aindj / k-SLAM

k-SLAM ultra fast alignment and taxonomic classification of metagenomic datasets
GNU General Public License v3.0

NCBI taxonomy and bacterial/viral genomes #13

Open linsalrob opened 7 years ago

linsalrob commented 7 years ago

Hi folks

To install k-SLAM I need to download the NCBI taxonomy and the bacterial and viral genomes (i.e. using install_slam.sh). If I already have these downloaded somewhere, can I just point to the appropriate locations rather than duplicating all the data?

Thanks

Rob

aindj commented 7 years ago

Rob,

Yes, you can run the database build on pre-downloaded files. Have a look at the k-SLAM install script to see what it does with the downloaded files.

You should be able to create a database directory with the correct structure yourself. Extract all the gz files, then put names.dmp and nodes.dmp in a folder called "taxonomy", the bacterial gbff files in a folder called "bacteria", and the viral gbff files in a folder called "viruses".
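The layout described above could be staged roughly as follows. This is only a sketch: the function names and every path passed to them are hypothetical, and it assumes the genome files are named `*.gbff.gz` and that names.dmp and nodes.dmp have already been extracted. The genomes are decompressed into the new folders (the compressed originals are left where they are), while the taxonomy files are symlinked rather than copied.

```shell
# Hypothetical helpers; every path passed to them is a placeholder.

# stage_gbff SRC_DIR DEST_DIR
# Decompress each SRC_DIR/*.gbff.gz into DEST_DIR, leaving the originals in place.
stage_gbff() {
    mkdir -p "$2" || return 1
    for f in "$1"/*.gbff.gz; do
        [ -e "$f" ] || continue    # the glob matched nothing
        gunzip -c "$f" > "$2/$(basename "$f" .gz)" || return 1
    done
}

# stage_taxonomy SRC_DIR DEST_DIR
# Symlink already-extracted names.dmp and nodes.dmp into DEST_DIR instead of copying.
stage_taxonomy() {
    mkdir -p "$2" || return 1
    src=$(cd "$1" && pwd) || return 1    # absolute path so the links resolve from anywhere
    ln -sf "$src/names.dmp" "$src/nodes.dmp" "$2"/
}
```

For example, `stage_gbff /data/refseq/bacteria db/bacteria` followed by `stage_taxonomy /data/taxdump db/taxonomy`, where `/data/...` and `db` stand in for your own locations.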

Then, from inside the database directory, make the taxonomy database with:

(path to SLAM executable) --parse-taxonomy taxonomy/names.dmp taxonomy/nodes.dmp --output-file taxDB

To make the genome database:

(path to SLAM executable) --output-file database --parse-genbank bacteria/*.gbff viruses/*.gbff
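The two build steps can be wrapped in one small function, sketched below. `SLAM` and `DB_DIR` are placeholder variables, not anything k-SLAM defines, and the globs `bacteria/*.gbff` and `viruses/*.gbff` assume that is how your genome files are named. The only options used are the ones shown in the commands above.

```shell
# Placeholders: point these at your own SLAM binary and database directory.
SLAM=${SLAM:-/path/to/SLAM}
DB_DIR=${DB_DIR:-./slam_db}

# build_db
# Runs in a subshell, so the caller's working directory is left unchanged.
build_db() (
    cd "$DB_DIR" || exit 1
    # Step 1: taxonomy database from names.dmp and nodes.dmp
    "$SLAM" --parse-taxonomy taxonomy/names.dmp taxonomy/nodes.dmp --output-file taxDB || exit 1
    # Step 2: genome database from every bacterial and viral GenBank file
    "$SLAM" --output-file database --parse-genbank bacteria/*.gbff viruses/*.gbff
)
```

Calling `build_db` after setting the two variables runs both steps in order and stops if the taxonomy step fails.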

On 21/06/17 18:48, Rob Edwards wrote: [original question, quoted in full above]

waywardsyintist commented 6 years ago

Can this be done from the .fna files, or does the script need the .gbff files?

waywardsyintist commented 6 years ago

Also, any idea what may be causing this error when I try to build the database from pre-downloaded files as described above?

/home/src/k-SLAM/SLAM: error while loading shared libraries: libboost_program_options.so.1.53.0: cannot open shared object file: No such file or directory

I can't find much about this dependency library. Any help appreciated.

aindj commented 6 years ago

The gbff files are needed because they contain taxonomy information.
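To see the kind of taxonomy information a GenBank flat file carries and a plain FASTA (.fna) file lacks, you can list the NCBI taxon IDs recorded in the `/db_xref="taxon:..."` qualifiers. This is just an illustration; `list_taxon_ids` is a hypothetical helper, and whether k-SLAM reads exactly this field is an assumption, not something confirmed here.

```shell
# list_taxon_ids FILE...
# Print the distinct NCBI taxon IDs found in db_xref qualifiers of the given gbff files.
list_taxon_ids() {
    grep -oh 'db_xref="taxon:[0-9]*"' "$@" |  # keep only the matching qualifiers
        sed 's/[^0-9]//g' |                    # strip everything except the numeric ID
        sort -u                                # one line per distinct ID
}
```

For example, `list_taxon_ids bacteria/*.gbff` prints one ID per line.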

That dependency is Boost (the error names its program_options library), which should be fairly easy to install on any Linux machine.
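A quick way to confirm which shared libraries the binary cannot find is to filter the output of `ldd`. The sketch below uses two hypothetical helper names; the only real tools involved are `ldd` and `awk`. Note that the error above asks for the exact version 1.53.0, so if your distribution ships a different Boost version you would probably need to rebuild k-SLAM against it rather than just install a package.

```shell
# parse_missing
# Read ldd output on stdin and print the name of each library reported as "not found".
parse_missing() {
    awk '/not found/ { print $1 }'
}

# missing_libs BINARY
# List the shared libraries the dynamic linker cannot resolve for BINARY.
missing_libs() {
    ldd "$1" | parse_missing
}
```

For example, `missing_libs /home/src/k-SLAM/SLAM` should print `libboost_program_options.so.1.53.0` on a machine showing the error above, and nothing once the library is installed.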