modification of command to extract COG identifiers for prokka v1.13

jennalang commented 6 years ago

egrep "COG[0-9]{4}" PROKKA_${date}.gff | cut -f9 | cut -f1,5 -d ';'| sed 's/ID=//g'| sed 's/;dbxref=COG:/\t/g' | grep COG

jvhagey commented 5 years ago

I actually needed to add .\+ since my COGs were not found directly next to the locus_tag.

egrep "COG[0-9]{4}" ./Output/Standard.gff | cut -f9 | sed 's/.\+COG\([0-9]\+\).\+;locus_tag=\(GANJLKBE_[0-9]\+\);.\+/\2\tCOG\1/g' > Standard.cog
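Both commands above depend on details of one particular annotation run: the first on where the dbxref attribute falls among the `;`-separated fields, the second on the locus-tag prefix (GANJLKBE). A minimal sketch of an alternative, assuming the attributes sit in tab-separated column 9 as `key=value` pairs, is to look attributes up by key instead. The function name `extract_cogs` is hypothetical, and accepting both `dbxref=` and `db_xref=` spellings is an assumption rather than something confirmed in this thread.

```shell
# Hypothetical helper: print "locus_tag<TAB>COG" for every feature line
# whose attribute column (column 9) carries a COG cross-reference.
extract_cogs() {
  awk -F'\t' '
    # Skip comment/directive lines and anything without 9 tab-separated columns
    $0 !~ /^#/ && NF >= 9 {
      tag = ""; cog = ""
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        key = tolower(attrs[i])
        if (key ~ /^locus_tag=/) {
          tag = substr(attrs[i], 11)               # text after "locus_tag="
        } else if (key ~ /^db_?xref=/ && match(attrs[i], /COG[0-9]+/)) {
          cog = substr(attrs[i], RSTART, RLENGTH)  # e.g. COG0436
        }
      }
      if (tag != "" && cog != "") print tag "\t" cog
    }
  ' "$1"
}
```

Usage would look like `extract_cogs ./Output/Standard.gff > Standard.cog`. Because the lookup is by attribute name, it should not matter whether other attributes appear between the COG and the locus_tag.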

Jigyasa3 commented 5 years ago

I have noticed that some of my COGs do not have a corresponding ec_number. With the code provided in the workshop tutorial, we are extracting all COGs. Why is that?

For example:

1. ID=AELJIOAN_00031;eC_number=2.6.1.83;Name=dapL_1;dbxref=COG:COG0436;gene=dapL_1;inference=ab initio prediction:Prodigal:2.6,similar to AA sequence:UniProtKB:A0LEA5;locus_tag=AELJIOAN_00031;product=LL-diaminopimelate aminotransferase

2. ID=AELJIOAN_00034;Name=fliS_1;dbxref=COG:COG1516;gene=fliS_1;inference=ab initio prediction:Prodigal:2.6,similar to AA sequence:UniProtKB:P39739;locus_tag=AELJIOAN_00034;product=Flagellar secretion chaperone FliS
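One likely reading of the two records above: the egrep filter selects lines on the COG pattern alone and never looks at eC_number, so every feature with a COG is kept whether or not it has an EC number. EC numbers are assigned only to enzymes, which would explain why the FliS chaperone has a COG but no eC_number. The sketch below lists each COG alongside its EC number so the two cases can be told apart. The function name `cog_ec_table` and the `-` placeholder for a missing EC number are assumptions made for illustration.

```shell
# Hypothetical helper: print "locus_tag<TAB>COG<TAB>EC" for every feature
# with a COG, using "-" when the feature has no eC_number attribute.
cog_ec_table() {
  awk -F'\t' '
    $0 !~ /^#/ && NF >= 9 {
      tag = ""; cog = ""; ec = "-"
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        key = tolower(attrs[i])
        if (key ~ /^locus_tag=/) {
          tag = substr(attrs[i], 11)               # text after "locus_tag="
        } else if (key ~ /^ec_number=/) {
          ec = substr(attrs[i], 11)                # text after "eC_number="
        } else if (key ~ /^db_?xref=/ && match(attrs[i], /COG[0-9]+/)) {
          cog = substr(attrs[i], RSTART, RLENGTH)
        }
      }
      if (cog != "") print tag "\t" cog "\t" ec
    }
  ' "$1"
}
```

To keep only COGs that also have an EC number, the output can be filtered on the third column, for example `cog_ec_table PROKKA_${date}.gff | awk -F'\t' '$3 != "-"'`.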

EnvGen / metagenomics-workshop

modification of command to extract COG identifiers for prokka v1.13 #8