Open jennalang opened 6 years ago
I actually needed to add .\+ to the sed pattern, since my COGs were not found directly next to the locus_tag:
egrep "COG[0-9]{4}" ./Output/Standard.gff | cut -f9 | sed 's/.\+COG\([0-9]\+\).\+;locus_tag=\(GANJLKBE_[0-9]\+\);.\+/\2\tCOG\1/g' > Standard.cog
I have noticed that some of my COGs do not have a corresponding ec_number. With the code provided in the workshop tutorial, we are extracting all COGs. Why is that?
For example:
ID=AELJIOAN_00031;eC_number=2.6.1.83;Name=dapL_1;dbxref=COG:COG0436;gene=dapL_1;inference=ab initio prediction:Prodigal:2.6,similar to AA sequence:UniProtKB:A0LEA5;locus_tag=AELJIOAN_00031;product=LL-diaminopimelate aminotransferase
egrep "COG[0-9]{4}" PROKKA_${date}.gff | cut -f9 | cut -f1,5 -d ';'| sed 's/ID=//g'| sed 's/;dbxref=COG:/\t/g' | grep COG
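One thing worth checking: cut -f1,5 -d ';' selects attributes by position, and in the example record above the COG is in the fourth semicolon-separated field, so a position-based cut depends on which optional attributes (such as eC_number) a given feature happens to carry. Below is a minimal sketch of a position-independent alternative that looks attributes up by name instead. The function name extract_cogs is hypothetical, and the sketch assumes a tab-separated GFF whose ninth column holds key=value pairs separated by semicolons, with the COG stored under dbxref= or db_xref=.

```shell
# Hypothetical helper: print "locus_tag<TAB>COG" for every feature with a COG,
# wherever those attributes sit in column 9.
# Assumes key=value attributes separated by ';' and a dbxref=/db_xref= key.
extract_cogs() {
  awk -F'\t' '
    $9 ~ /COG[0-9]+/ {
      tag = ""; cog = ""
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        # "locus_tag=" is 10 characters, so the value starts at position 11
        if (attrs[i] ~ /^locus_tag=/)
          tag = substr(attrs[i], 11)
        # pull the COG identifier out of the dbxref attribute
        if (attrs[i] ~ /^[Dd]b_?xref=/ && match(attrs[i], /COG[0-9]+/))
          cog = substr(attrs[i], RSTART, RLENGTH)
      }
      if (tag != "" && cog != "") print tag "\t" cog
    }
  ' "$1"
}

# Usage (file names taken from the commands in this thread):
#   extract_cogs ./Output/Standard.gff > Standard.cog
```

Since the lookup is by attribute name, features with and without an eC_number should be handled the same way, and no locus_tag prefix needs to be hard-coded.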