Closed daniellembecker closed 1 year ago
My guess is that this was a disk space issue that prevented the file from being expanded or created.
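One way to test the disk space theory before re-running anything is to compare the free space on the target filesystem against a threshold. The sketch below is only an illustration: `has_free_space` is a hypothetical helper name, and the threshold in the usage comment is an arbitrary assumption, not a measured requirement for nr.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the filesystem holding DIR
# has at least MIN_KB kilobytes available.
has_free_space() {
    local dir="$1" min_kb="$2" avail_kb
    # df -P prints one line per filesystem; with -k, column 4 is available KB.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$min_kb" ]
}

# Example usage (the 500 GB threshold is an assumed placeholder):
# has_free_space /data/putnamlab/shared/databases 500000000 \
#     || echo "Not enough free space to build the database" >&2
```

Note that on a shared cluster a per-user or per-group quota can also block writes even when `df` reports free space, so this check is a first pass rather than a guarantee.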
I ran the following on 20230103
interactive
cd /data/putnamlab/shared/databases
wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz
exit
I would not recommend this because of how long it takes, but it works for troubleshooting.
nano /data/putnamlab/hputnam/Ahya_Fun_Annot/scripts/make_diamond_db.sh
#!/bin/bash
#SBATCH --job-name="make_diamond_db" #CHANGE_NAME
#SBATCH -t 24:00:00
#SBATCH --export=NONE
#SBATCH --mem=100GB
#SBATCH --account=putnamlab
#SBATCH -D /data/putnamlab/shared/databases
#SBATCH -p putnamlab
module load DIAMOND/2.0.0-GCC-8.3.0 #Load DIAMOND
diamond makedb --in /data/putnamlab/shared/databases/nr.gz -d nr
diamond dbinfo -d /data/putnamlab/shared/databases/nr.dmnd
sbatch /data/putnamlab/hputnam/Ahya_Fun_Annot/scripts/make_diamond_db.sh
This completed successfully, as seen in /data/putnamlab/shared/databases/slurm-206358.out:
2023-01-03 21:30:57 (3.42 MB/s) - ‘uniprot_trembl.fasta.gz’ saved [57043853550]
Building a new DB, current time: 01/04/2023 02:15:45
New DB name: /glfs/brick01/gv0/putnamlab/shared/databases/trembl_20230103
New DB title: uniprot_trembl.fasta
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 229580745 sequences in 6419.68 seconds.
STOP Wed Jan 4 04:14:31 EST 2023
mv nr.dmnd 20230104_nr.dmnd
The updated DB for Diamond blast can be found at /data/putnamlab/shared/databases/20230104_nr.dmnd
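The date-stamped rename above could be wrapped in a small function so the stamp does not have to be typed by hand. This is a minimal sketch, assuming the stamp should be the day the rename is run; `stamp_db` is a hypothetical name and is not part of the lab's scripts.

```shell
#!/bin/bash
# Hypothetical helper: rename FILE to YYYYMMDD_FILE in the same directory,
# using today's date as the stamp.
stamp_db() {
    local db="$1" stamp
    stamp=$(date +%Y%m%d)
    mv -- "$db" "$(dirname "$db")/${stamp}_$(basename "$db")"
}

# Example usage (path taken from the thread):
# stamp_db /data/putnamlab/shared/databases/nr.dmnd
```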
To fix the space issue, did you delete files prior to running the script? What changed between previous runs and this run to allow for the space?
The only thing I did was delete nr.dmnd. It is also possible that other people cleared files from their personal directories, which freed up space for everyone.
@AHuffmyer was working on her Mcap functional annotation recently and ran into an issue when making an updated nr.dmnd file. She wanted to download the most recent nr database in FASTA format from NCBI and use it to make a Diamond-formatted nr database, following this protocol, Step 2: Identify homologous sequences.
When following the protocol step:
Go to the sbatch_executables subdirectory in the Putnam Lab shared folder and run the scripts, make_diamond_nr_db.sh and make_diamond_nr_db.sh in this order:
She was running into this error in the script output:
After discussing with @daniellembecker, we thought that it might be an issue with @AHuffmyer's permissions, but @daniellembecker re-ran the scripts on December 19th 2022 and got the same 'Inflate Error'.
@daniellembecker then suspected that the nr and nr.gz files needed to be updated, since they were last downloaded last year. She deleted the previous files and re-downloaded them.
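If the 'Inflate Error' came from a truncated or corrupted nr.gz (an assumption; the thread does not confirm the cause), the archive can be checked after download and before the database build. The sketch below uses `gzip -t`, which tests the integrity of a compressed file without writing the decompressed data to disk. `gz_is_intact` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if FILE is a complete, valid gzip archive.
gz_is_intact() {
    # gzip -t reads the whole archive and exits non-zero on corruption
    # or an unexpected end of file.
    gzip -t -- "$1" 2>/dev/null
}

# Example usage (path taken from the thread):
# gz_is_intact /data/putnamlab/shared/databases/nr.gz \
#     || echo "nr.gz looks incomplete or corrupted; re-download it" >&2
```

For a file the size of nr.gz this check reads the full archive, so it will take a while, but it needs no extra disk space.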
@hputnam revisited this issue on January 4th 2023 and is currently running the scripts to see if they work.