EnvGen / metagenomics-workshop

Metagenomics Workshop

modification of command to extract COG identifiers for prokka v1.13 #8

Open jennalang opened 6 years ago

jennalang commented 6 years ago

egrep "COG[0-9]{4}" PROKKA_${date}.gff | cut -f9 | cut -f1,5 -d ';'| sed 's/ID=//g'| sed 's/;dbxref=COG:/\t/g' | grep COG

jvhagey commented 5 years ago

I actually needed to add .\+ since my COGs were not found directly next to the locus_tag:

egrep "COG[0-9]{4}" ./Output/Standard.gff | cut -f9 | sed 's/.\+COG\([0-9]\+\).\+;locus_tag=\(GANJLKBE_[0-9]\+\);.\+/\2\tCOG\1/g' > Standard.cog
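Both commands above depend on where the COG sits among the attributes in column 9, and the sed version also hard-codes a locus_tag prefix. A minimal sketch of an order-independent alternative follows. It assumes a tab-separated GFF whose ninth column holds ';'-separated key=value attributes. The function name extract_cogs is hypothetical, and accepting db_xref or Dbxref in addition to dbxref is an assumption about what other Prokka versions may write.

```shell
# Hypothetical helper: print "locus_tag<TAB>COG" for every GFF feature
# that carries a COG cross-reference. Attributes are looked up by name,
# so their position within column 9 does not matter.
extract_cogs() {
  awk -F'\t' '
    $9 ~ /COG[0-9]+/ {
      tag = ""; cog = ""
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        if (attrs[i] ~ /^locus_tag=/) {
          # "locus_tag=" is 10 characters long, so the value starts at 11
          tag = substr(attrs[i], 11)
        } else if (attrs[i] ~ /^[Dd]b_?xref=/ && match(attrs[i], /COG[0-9]+/)) {
          # assumption: the key may be spelled dbxref, db_xref or Dbxref
          cog = substr(attrs[i], RSTART, RLENGTH)
        }
      }
      if (tag != "" && cog != "") print tag "\t" cog
    }
  ' "$@"
}
```

It could be called as extract_cogs ./Output/Standard.gff > Standard.cog, or fed from a pipe, since awk reads standard input when no file is given.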

Jigyasa3 commented 5 years ago

I have noticed that some of my COGs do not have a corresponding eC_number. With the code provided in the workshop tutorial, we are extracting all COGs. Why is that?

For example:

1. ID=AELJIOAN_00031;eC_number=2.6.1.83;Name=dapL_1;dbxref=COG:COG0436;gene=dapL_1;inference=ab initio prediction:Prodigal:2.6,similar to AA sequence:UniProtKB:A0LEA5;locus_tag=AELJIOAN_00031;product=LL-diaminopimelate aminotransferase
2. ID=AELJIOAN_00034;Name=fliS_1;dbxref=COG:COG1516;gene=fliS_1;inference=ab initio prediction:Prodigal:2.6,similar to AA sequence:UniProtKB:P39739;locus_tag=AELJIOAN_00034;product=Flagellar secretion chaperone FliS
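To see which COG-annotated features carry an eC_number and which do not, the two attributes can be printed side by side. This is a minimal sketch, not the tutorial's method. It assumes the same tab-separated GFF layout with ';'-separated attributes in column 9. The function name cogs_with_ec and the NA placeholder for a missing EC number are hypothetical choices.

```shell
# Hypothetical helper: print "locus_tag<TAB>COG<TAB>EC" for every feature
# with a COG cross-reference, writing NA when no eC_number is present.
cogs_with_ec() {
  awk -F'\t' '
    $9 ~ /COG[0-9]+/ {
      tag = ""; cog = ""; ec = "NA"
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        if (attrs[i] ~ /^locus_tag=/) {
          tag = substr(attrs[i], 11)      # skip the 10-character key
        } else if (attrs[i] ~ /^[eE][cC]_number=/) {
          ec = substr(attrs[i], 11)       # "eC_number=" is also 10 characters
        } else if (attrs[i] ~ /^[Dd]b_?xref=/ && match(attrs[i], /COG[0-9]+/)) {
          cog = substr(attrs[i], RSTART, RLENGTH)
        }
      }
      if (tag != "" && cog != "") print tag "\t" cog "\t" ec
    }
  ' "$@"
}
```

Piping the result through grep -c NA would then count the COG-annotated features that have no EC number.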