Open kmin940 opened 6 years ago
Hi,
The way to do this is to download some genomes from your species of interest, probably at least 50, and then determine which genes are both single copy and core. There are many ways to do that, but I reannotated the genomes to ORFs using Prodigal and then assigned the ORFs to COGs with RPS-BLAST, exactly as was done for the contigs. Then just set a threshold: say, if 97% of the genomes of that species have the gene and it is single copy, then you use it for the core. It is quite straightforward really.
Best, Chris
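The threshold step described above can be sketched roughly as follows. This is only an illustration, not the script used for DESMAN: the function name `single_copy_core_cogs`, the input layout (a dict mapping each genome to the list of COG IDs assigned to its ORFs), and the reading of "single copy" as "exactly one copy in at least 97% of genomes" are all assumptions.

```python
from collections import Counter


def single_copy_core_cogs(genome_cogs, min_fraction=0.97):
    """Return COGs found in exactly one copy in >= min_fraction of genomes.

    genome_cogs: dict mapping genome ID -> list of COG IDs, one entry per
    ORF (hypothetical layout; build it from your own RPS-BLAST results).
    """
    n_genomes = len(genome_cogs)
    if n_genomes == 0:
        return []

    # For each COG, count the genomes in which it occurs exactly once.
    single_copy_genomes = Counter()
    for cogs in genome_cogs.values():
        for cog, copies in Counter(cogs).items():
            if copies == 1:
                single_copy_genomes[cog] += 1

    return sorted(
        cog
        for cog, n in single_copy_genomes.items()
        if n / n_genomes >= min_fraction
    )


# Toy example with made-up assignments for four genomes.
example = {
    "genome1": ["COG0001", "COG0002", "COG0002"],  # COG0002 is duplicated
    "genome2": ["COG0001", "COG0002"],
    "genome3": ["COG0001"],
    "genome4": ["COG0001", "COG0003"],
}
print(single_copy_core_cogs(example))
```

With real data you would want at least the 50 or so genomes mentioned above, since the 97% cutoff means little on a handful of genomes.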
I see! Thank you very much for your clear answer! I will give it a try :) Thank you again!
I suggest downloading the species-specific table from here: https://www.ncbi.nlm.nih.gov/research/cog/
Hi, could you answer these questions? It would be a great help to me.
In the $DESMAN/complete_example directory, there is a file named EColi_core_ident95.txt. How can I obtain this kind of data for different microorganisms?
And is there a way to get pre-identified sequences for each of the 982 single-copy core COGs from NCBI or any other site? The COG database does not seem to be maintained. (wget https://www.dropbox.com/s/f6ojp1qt4fz5lzn/Hits.tar.gz)
I want to have these data for different microorganisms. Thank you very much.