Closed ireneortega closed 3 years ago
Hi @ireneortega , can you please share with us: i) what kind of input data you have (gbk files, fna files, etc) ii) which command did you use: get_homologues.pl or get_homologues-est.pl
By default only core genes are produced, you need to add -t 0 to get accessory clusters, of any size
Bruno
i) I am using gbk files annotated with Prokka ii) I used get_homologues.pl
I added the -t 0 option and now the pangenome is constructed as I want. Many thanks!
Just last doubt: I got an empty 1234_proteinannotation.faa file and its corresponding line in the pangenome_matrix_genes_t0.tab is empty/-/-
, so I can not extract any info from what cluster. What could happened? Perhaps I should delete that cluster.
Thanks for your help!
That's a probably a gene without sequence in the gbk file, please confirm
I couldn't prove it. I count the number of times the text /translation=
is repeated in the gbk file and it is equal to the counts printed by the tool when running ./get_homologues.pl in line 10. And in the gbk.faa file generated by the tool, there is no empty line.
I don't know if there is another way to check it :(
The culprit is something called ~ proteinannotation , which might well be in the prokka output and perhaps get_homologues should not be parsing it? Let us know if you find out or please share the gbk file so I can test it here, Bruno
I want to add coverage and identity filters to infer homology between genes in strains. I am working with same species genomes and I need high coverage and identity to consider a gen as present in each genome. However, when using high values for both parameters (97 % identity and 99 % coverage) I only get core genes, how about shell genes?
I want to get a presence/absence pangenome matrix using 97 % identity and 99 % coverage to identify unique genes in strains and gene clusters.