Closed by huizhen2014 5 years ago
Your assessment is correct. A higher percent identity cutoff usually results in a smaller core genome, because sequences must match more closely before they are aligned to each other; a lower cutoff has the opposite effect. As to what value you should select, it depends a lot on your organism and what you expect from your data. The default of 85% is usually a good balance, but I'd suggest trying a few values and seeing how the core genome results change.
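If it helps, trying a few values could be scripted along these lines. This is only a minimal sketch: it builds one command line per cutoff and prints it rather than running anything. Only `-p` is taken from this thread; the `-f` (input list) and `-o` (output prefix) flags, the file name `genomes.txt`, and the helper `build_spine_commands` are assumptions for illustration, so please check them against the help text of your spine.pl version before use.

```python
import shlex


def build_spine_commands(genome_list, cutoffs, script="spine.pl"):
    """Return one spine.pl argument list per percent identity cutoff.

    Hypothetical helper: only the -p flag comes from the thread.
    """
    commands = []
    for p in cutoffs:
        if not 0 < p <= 100:
            raise ValueError(f"percent identity must be in (0, 100], got {p}")
        commands.append([
            "perl", script,
            "-f", genome_list,    # ASSUMED input-list flag; verify in the help text
            "-p", str(p),         # percent identity cutoff (default 85 per the thread)
            "-o", f"core_p{p}",   # ASSUMED output-prefix flag; verify in the help text
        ])
    return commands


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them.
    for cmd in build_spine_commands("genomes.txt", [80, 85, 90, 95]):
        print(shlex.join(cmd))
```

Giving each run its own output prefix keeps the results separate, so the core genome sizes can be compared side by side afterwards.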
Thank you!
Hi, I am a technician specializing in NGS and the associated bioinformatic analysis, based in Shenzhen, China. Recently, my boss asked me to identify the genes that differ between two groups of samples using spine.pl. I read the article (Characterization of the core and accessory genomes of Pseudomonas aeruginosa using bioinformatic tools Spine and AGEnt) to understand the basic principles of the analysis and how to run it. Now I am having trouble deciding which value of the percent option (-p), the identity at which regions are considered homologous, I should use to call the core genome. Because I am new to microbial analysis, my guess is that homologous regions recognized by spine.pl with an identity equal to or greater than the -p value (default 85) are collected as the core genome, and that the larger the value, the fewer genomic regions are taken as core genome. Likewise, the larger the value, the more genomic regions are assigned to the accessory genomes. Could you tell me whether my understanding is right, and give me some advice on how to choose the value for homologous regions? Thank you!
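The reasoning in the question can be illustrated with a toy model. The sketch below is a deliberate simplification and not how spine.pl works internally: it assumes each region already has a single percent identity value, and the region names, lengths, and identities are made up for illustration.

```python
def split_core_accessory(regions, cutoff=85.0):
    """Partition (name, length_bp, percent_identity) tuples at a cutoff.

    Toy rule: identity >= cutoff counts as core, anything else as accessory.
    """
    core = [r for r in regions if r[2] >= cutoff]
    accessory = [r for r in regions if r[2] < cutoff]
    return core, accessory


def total_bp(regions):
    """Sum the lengths (in bp) of a list of regions."""
    return sum(length for _, length, _ in regions)


# Made-up example regions: (name, length in bp, percent identity).
regions = [
    ("r1", 1200, 99.1),
    ("r2", 800, 91.5),
    ("r3", 1500, 86.0),
    ("r4", 600, 78.2),
]

for cutoff in (80, 85, 90, 95):
    core, accessory = split_core_accessory(regions, cutoff)
    print(f"-p {cutoff}: core {total_bp(core)} bp, accessory {total_bp(accessory)} bp")
```

In this toy model, raising the cutoff can only move regions from the core set to the accessory set, which matches the expectation described in the question.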