egonozer / Spine

Identification of the conserved nucleotide core genome of bacteria and other organisms with small genomes
http://vfsmspineagent.fsm.northwestern.edu
GNU General Public License v3.0

About the -p parameter for spine.pl #1

Closed: huizhen2014 closed this issue 5 years ago

huizhen2014 commented 5 years ago

Hi, I am a technician specializing in NGS and associated bioinformatic analysis, based in Shenzhen, China. Recently, my boss asked me to use spine.pl to identify the genes that differ between two groups of samples. I read the article (Characterization of the core and accessory genomes of Pseudomonas aeruginosa using bioinformatic tools Spine and AGEnt) to understand the basic analysis principle and the usage instructions. Now I need to decide which percent identity (-p) a region must reach to be considered homologous when calling the core genome. Because I am new to microbial analysis, here is my understanding: homologous regions recognized by spine.pl with a percent identity equal to or greater than -p (default 85) are collected as the core genome, so the higher the percent, the fewer genomic regions are taken as core genome. Likewise, the higher the percent, the more genomic regions are assigned to the accessory genome. Could you tell me whether this understanding is correct, and give me some advice on how to choose the -p value? Thank you!
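The threshold reasoning above can be illustrated with a toy model. This is a simplified sketch, not Spine's actual algorithm: the `Region` records, their lengths, and their per-genome identities are all hypothetical. A region counts as core only if it aligns in every genome at or above the cutoff; everything else is lumped into the accessory set.

```python
# Toy model only: NOT Spine's implementation. All regions and numbers are hypothetical.
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Region:
    length: int                          # region length in bp
    identities: List[Optional[float]]    # % identity in each genome; None = not aligned


def split_core_accessory(regions: List[Region], min_identity: float) -> Tuple[int, int]:
    """Return (core bp, accessory bp) for a given percent identity cutoff."""
    core = accessory = 0
    for region in regions:
        # Core: aligned in every genome at or above the cutoff.
        if all(i is not None and i >= min_identity for i in region.identities):
            core += region.length
        else:
            accessory += region.length
    return core, accessory


# Hypothetical regions in three genomes.
regions = [
    Region(5000, [99.5, 98.7, 99.1]),
    Region(3000, [92.0, 88.4, 95.3]),
    Region(2000, [86.1, 90.2, 84.0]),
    Region(1000, [97.0, None, 96.5]),  # absent from genome 2
]

for p in (80, 85, 90, 95):
    core, accessory = split_core_accessory(regions, p)
    print(f"-p {p}: core {core} bp, accessory {accessory} bp")
# Core shrinks from 10000 to 8000 to 5000 to 5000 bp as -p goes 80, 85, 90, 95.
```

Because a region that passes a higher cutoff always passes a lower one, the core in this model can only stay the same or shrink as the cutoff rises, while the accessory set grows by the same amount.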

egonozer commented 5 years ago

Your assessment is correct. A higher percent identity cutoff usually results in a smaller core genome, because sequences must meet a higher threshold to be aligned to each other; a lower cutoff has the opposite effect. As to what value you should select, it depends a lot on your organism and what you expect from your data. The default of 85% is usually a good balance, but I'd suggest trying a few values and seeing how the core genome results change.
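One way to try a few values is to script the runs. The sketch below only assembles (and optionally executes) one command per `-p` value. The `perl spine.pl` invocation and the `-f`/`-o` arguments in the example are assumptions to be checked against the Spine documentation; the `{p}` placeholder is a convention of this sketch so that each run writes to its own output prefix.

```python
import subprocess
from typing import List, Sequence


def build_sweep_commands(base_args: Sequence[str], identities: Sequence[int],
                         script: str = "spine.pl") -> List[List[str]]:
    """Build one spine.pl command per percent identity value.

    Any "{p}" inside base_args is replaced with the current value so each
    run can use its own output prefix (a convention of this sketch only).
    """
    commands = []
    for p in identities:
        args = [arg.replace("{p}", str(p)) for arg in base_args]
        commands.append(["perl", script, *args, "-p", str(p)])
    return commands


def run_sweep(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # ASSUMED arguments: verify the input and output options in the Spine docs.
    base = ["-f", "genomes.txt", "-o", "spine_p{p}"]
    run_sweep(build_sweep_commands(base, [80, 85, 90, 95]))
```

Running with the default `dry_run=True` first shows exactly what would be executed; comparing the core genome outputs of the separate runs then shows how sensitive the results are to `-p` for a given set of genomes.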

huizhen2014 commented 5 years ago

Thank you!