Open Artifice120 opened 2 months ago
Greetings,
Thanks for this suggestion!
We discussed two potential ways to address this issue: adding an option to run DIAMOND instead of BLAST for the Similarity Analysis step, and adding a checkpoint system that checks the output folders and files and skips a step if its results are already there.
These should be implemented in a future release.
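For illustration only, the checkpoint idea could look roughly like the sketch below. This is not AnnotaPipeline's implementation: `run_step`, its arguments, and the `.done` marker file are hypothetical names chosen to show how a step can be skipped when its results already exist.

```shell
#!/bin/bash
# Hypothetical checkpoint helper: run a step only if its marker file is missing.
# Usage: run_step NAME OUTDIR COMMAND [ARGS...]
run_step() {
    local name="$1" outdir="$2"
    shift 2
    local marker="$outdir/.done"   # hypothetical "step finished" marker
    if [ -f "$marker" ]; then
        echo "$(date +%Y-%m-%d_%H:%M:%S) | skipped $name"
        return 0
    fi
    mkdir -p "$outdir"
    echo "$(date +%Y-%m-%d_%H:%M:%S) | running $name"
    # Write the marker only when the command succeeds, so a step that was
    # interrupted (for example by a wall-time limit) is rerun next time.
    "$@" && touch "$marker"
}
```

Writing the marker after the command finishes, rather than checking only whether the output folder exists, avoids treating a half-written result as complete.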
I have a script that runs the BLAST search one contig at a time, looping over an array of contig names. Not sure if it is helpful...
#!/bin/bash
# Read the contig names from final-tigs into an array
tigs=( $(cat final-tigs) )
# Loop once per contig name in the array. $tig holds the current contig name on each iteration; everything else stays constant except the timestamp, which is whatever the server clock currently says.
for tig in "${tigs[@]}" ; do
echo "$tig"
## remove empty placeholder files in active directory
find . -type f -empty -delete
## If statement checks if output file exists, if it does then it skips to the next contig
output="/lustre/isaac/scratch/jtorre28/foxgloves/purged/purged2/tmp/$tig.out"
query="/lustre/isaac/scratch/jtorre28/foxgloves/purged/purged2/tmp/$tig.fa"
if test -f "$output" ; then
echo "$(date +%Y-%m-%d_%H:%M:%S) | skipped $tig"
fi
## If statement checks if the output file for the contig exists. If it does NOT, it extracts the single contig sequence, BLAST searches that contig, and echoes the time of the search
if ! test -f "$output" ; then
echo "$(date +%Y-%m-%d_%H:%M:%S) | extracting $tig"
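# Extract only the record whose header ID equals $tig (assumes the ID is the first word after ">")
awk -v id="$tig" '/^>/ { keep = (substr($1, 2) == id) } keep' pilon-bubble-filter.fasta > "$query"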
blastn -db nt \
-query "$query" \
-outfmt '6 qseqid qgi qacc sseqid sallseqid sgi sallgi sacc sallacc qstart qend sstart send qseq sseq evalue bitscore score length pident nident mismatch positive gapopen gaps ppos frames qframe sframe btop staxids sscinames scomnames sblastnames sskingdoms stitle salltitles sstrand qcovs qcovhsp' \
-max_target_seqs 10 \
-max_hsps 1 \
-evalue 1e-25 \
-num_threads 48 \
-out "$output"
fi
done
Afternoon,
I have attempted to use AnnotaPipeline on a SLURM cluster. It is running fine so far, except that I can only run a job on a node continuously for 6 days at a time. In this time frame not all of the BLAST searches are able to finish, and the run ends up starting again from the very beginning at AUGUSTUS.
Is there a way to have AnnotaPipeline skip to the last step that it left off on based on the raw output folders or just have it skip to a specified point in the process?
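To make the second option concrete, here is a rough sketch of what "skip to a specified point" could mean. It is not an existing AnnotaPipeline feature; the `steps_from` function and the step names in `STEPS` are hypothetical placeholders, and the real pipeline's stages may differ.

```shell
#!/bin/bash
# Hypothetical step order; the real pipeline's stages may differ.
STEPS=(augustus similarity functional_annotation)

# Print the steps to run when starting from step $1 (default: the first step).
# Returns non-zero if the requested step name is unknown.
steps_from() {
    local start="${1:-${STEPS[0]}}" found=0 step
    for step in "${STEPS[@]}"; do
        [ "$step" = "$start" ] && found=1
        [ "$found" -eq 1 ] && echo "$step"
    done
    [ "$found" -eq 1 ]
}
```

A driver script could then loop over `$(steps_from similarity)` after a resubmission, so that the gene prediction stage is not repeated.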