Open splaisan opened 2 years ago
Thank you so much for this comment!
You can also avoid the third-party tool faSplit (from the UCSC tools) with:
awk '/^>/ {OUT="splitseqs/" substr($0,2) ".fa"}; OUT {print >OUT}' multi-proteins.fa
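The one-liner above assumes that the splitseqs directory already exists and that each header line is usable as a file name. A slightly more defensive sketch is shown here; the demo input, the temporary directory, and the choice to name each file after the first word of its header are assumptions for illustration, not part of the original command.

```shell
# Build a tiny demo input (hypothetical sequences) in a scratch directory.
workdir=$(mktemp -d)
printf '>P1 first protein\nMKV\nLLA\n>P2 second protein\nMAA\n' > "$workdir/multi-proteins.fa"

mkdir -p "$workdir/splitseqs"   # awk does not create the output directory

awk -v outdir="$workdir/splitseqs" '
  /^>/ {
    if (out) close(out)                 # avoid running out of open file handles
    id = substr($1, 2)                  # first word of the header, without ">"
    gsub(/[^A-Za-z0-9._-]/, "_", id)    # keep the file name shell-safe
    out = outdir "/" id ".fa"
  }
  out { print > out }                   # header and sequence lines go to the current file
' "$workdir/multi-proteins.fa"

ls "$workdir/splitseqs"
```

Closing each file before opening the next matters mostly for inputs with thousands of records, where some awk implementations hit the per-process limit on open files. Note that two records sharing the same first header word would overwrite each other in this sketch.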
Additionally, if parallel is not installed, one could use xargs -P ${pthr} instead.
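Thanks for your help. Here is the complete xargs command for reference. The java call is wrapped in sh -c so that $(basename ...) is evaluated once per file; written directly on the xargs line, the shell expands it a single time before xargs substitutes {}.
find splitseqs -type f -name '*.fasta' | \
  xargs -P ${pthr} -I {} sh -c \
    'java -jar "$1/ECPred.jar" weighted "$2" "$1" "$PWD" "results/$(basename "$2")_out"' \
    sh "${ECPRED_PATH}" {}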
echo -e "Protein ID\tEC Number\tConfidence Score(max 1.0)" > ECPred_results.tsv
cat results/*_out 2>/dev/null | grep -v '^Protein' | sort -k 1V,1 >> ECPred_results.tsv
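To check the fan-out and merge logic without ECPred installed, the pipeline can be dry-run with a stand-in command. The sketch below is only that: the demo files, the .fa extension (which has to match whatever the split step produced), and the placeholder result line written in place of the java call are all assumptions for illustration.

```shell
# Scratch layout with two hypothetical single-sequence files.
workdir=$(mktemp -d)
mkdir -p "$workdir/splitseqs" "$workdir/results"
printf '>P1\nMKV\n' > "$workdir/splitseqs/P1.fa"
printf '>P2\nMAA\n' > "$workdir/splitseqs/P2.fa"

pthr=2   # number of parallel jobs

# One job per file; basename runs inside sh -c, after xargs has filled in {}.
find "$workdir/splitseqs" -type f -name '*.fa' -print0 |
  xargs -0 -P "$pthr" -I {} sh -c '
    in=$1; outdir=$2
    name=$(basename "$in")
    # Stand-in for the real call: java -jar "$ECPRED_PATH/ECPred.jar" weighted "$in" ...
    printf "%s\tplaceholder\t0.0\n" "${name%.fa}" > "$outdir/${name}_out"
  ' sh {} "$workdir/results"

# Merge the per-file outputs under a single header line.
{
  printf 'Protein ID\tEC Number\tConfidence Score(max 1.0)\n'
  cat "$workdir"/results/*_out | grep -v '^Protein' | sort -k1,1
} > "$workdir/ECPred_results.tsv"

cat "$workdir/ECPred_results.tsv"
```

Using -print0 with xargs -0 keeps file names containing spaces intact. The plain sort -k1,1 is used here for portability; the version sort (V) in the command above is a GNU sort feature.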
I used the following scheme to process thousands of input proteins in a more reasonable time. Maybe this can help others!
Please check that you have enough RAM before using multiple cores here, since each parallel job starts its own Java process!