GGFHF / TOA

TOA (Taxonomy-oriented Annotation) establishes workflows geared towards plant species that automate the extraction of information from genomic databases and the annotation of sequences.
GNU General Public License v3.0

How to build NR blast database from downloaded files without re-downloading #7

Open silasmellor opened 2 years ago

silasmellor commented 2 years ago

Hi, I've run into a frustrating problem. When I try to build the NR database, I keep getting gzip errors, probably because the files get corrupted during download. This also happened to me at the NCBI RefSeq step, but repeating the operation a few times fixed it (I guess that's semi-random). The NR database is much larger, though, so retrying is time-consuming. Is there a way I could download the files myself, verify their integrity, and then run a script in TOA to build the database from the pre-downloaded files? Thanks, Silas
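The integrity check asked about here could be sketched roughly as follows. This is not part of TOA. It assumes each downloaded archive is a gzip file and that an optional checksum file with the same name plus ".md5" sits next to it, with the hash as its first whitespace-separated token. The function names are hypothetical.

```python
import gzip
import hashlib
import zlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def gzip_is_intact(path, chunk_size=1 << 20):
    """Decompress the whole stream and discard it; False on any gzip error."""
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers bad headers and CRC mismatches, EOFError truncation.
        return False


def check_archive(archive):
    """Check one archive; md5_ok is None when no '.md5' file is present (assumed layout)."""
    archive = Path(archive)
    md5_file = archive.with_name(archive.name + ".md5")
    expected = None
    if md5_file.exists():
        expected = md5_file.read_text().split()[0].lower()
    return {
        "md5_ok": None if expected is None else md5_of(archive) == expected,
        "gzip_ok": gzip_is_intact(archive),
    }
```

Calling `check_archive` on every downloaded file before building would show which ones need to be fetched again, so only those are re-downloaded.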

fernandomoramarquez commented 2 years ago

Hi Silas:

Thanks for using TOA and your input.

The NR database needs a large amount of free disk space because building it currently requires downloading 59 files of several GB each. Building the NR database takes time and a good internet connection.

The directory ".../TOA-results/database" contains subdirectories named "toabbnrbp-YYMMDD-HHMMSS", which hold the information about each run of the process that builds the NR database (with BLAST+). Edit the script "toabbnrbp-process.sh" from the last run. The function "build_database_nr" contains the statements that build the NR database.
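Finding the script from the last run can be sketched as below. This is only an illustration: it assumes "toabbnrbp-process.sh" sits directly inside each run subdirectory and that the run names sort chronologically as text, which holds for the YYMMDD-HHMMSS pattern within one century. The helper name is hypothetical.

```python
import re
from pathlib import Path

# Run directories are named toabbnrbp-YYMMDD-HHMMSS.
RUN_DIR_PATTERN = re.compile(r"^toabbnrbp-\d{6}-\d{6}$")


def latest_nr_run_script(database_results_dir):
    """Return the assumed path of toabbnrbp-process.sh in the newest run, or None."""
    root = Path(database_results_dir)
    runs = sorted(
        (p for p in root.iterdir() if p.is_dir() and RUN_DIR_PATTERN.match(p.name)),
        key=lambda p: p.name,  # timestamped names sort oldest to newest
    )
    if not runs:
        return None
    return runs[-1] / "toabbnrbp-process.sh"
```

The argument is the ".../TOA-results/database" directory; the returned path is the file to open and edit.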

All the best,

silasmellor commented 2 years ago

Hey Fernando, thanks for the clarification. As far as I can tell from the script, it basically just unpacks the downloaded files into the folder you specified. Is there any reason I cannot do the same (without the script) outside the TOA environment? Would that database be available in TOA? Best, Silas

fernandomoramarquez commented 2 years ago

Hi Silas:

The NR database files for use with BLAST+ have to be decompressed into the subdirectory "NCBI/nr-blastplus-db" of the directory set in "Main menu > Configuration > Recreate TOA config file > Database directory". You could also point the environment variables "NR_BLASTPLUS_DB_DIR" and "NR_BLASTPLUS_DB_FILE" at another directory path in the TOA config file, but I do not recommend it because these variables are reset whenever the TOA config file is recreated.
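Unpacking pre-downloaded archives into that subdirectory might look like the sketch below. It is not TOA code. It assumes the archives are named like "nr.*.tar.gz" and have already been verified; the function name and the two directory arguments are placeholders.

```python
import tarfile
from pathlib import Path


def unpack_nr_archives(download_dir, database_dir):
    """Extract every nr.*.tar.gz in download_dir into <database_dir>/NCBI/nr-blastplus-db."""
    target = Path(database_dir) / "NCBI" / "nr-blastplus-db"
    target.mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter where this Python version supports it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    unpacked = []
    for archive in sorted(Path(download_dir).glob("nr.*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target, **extra)
        unpacked.append(archive.name)
    return target, unpacked
```

Here `database_dir` should be the same path that is set as the Database directory in the TOA configuration, so the config file can stay untouched.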

All the best,