Dee-chen / Tree2gd

GNU General Public License v3.0

About the minimum number of taxa #10

Open lfp-a opened 1 year ago

lfp-a commented 1 year ago

I see that the default value of the minimum-number-of-samples parameter is 4. How should I adjust this number? For example, I have 150 samples; what would be an appropriate setting?

Dee-chen commented 1 year ago

The minimum number of genes is generally chosen based on the level of the species tree at which you expect WGDs to have occurred. The default of 4 genes is meant to detect GD events at all levels (a GD requires at least two pairs of homologous genes in two species), and it is also the value we used for the 68-species dataset in our paper. The drawback of such a low threshold is that it greatly increases the number of gene trees that need to be computed, which slows down the computation.
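The arithmetic behind this default can be sketched as follows. This is a minimal illustration under a simplifying assumption (every species at the node of interest retains both duplicate copies), and the function name `min_genes_for_gd` is hypothetical, not part of Tree2gd. With two species it reproduces the default of 4.

```python
def min_genes_for_gd(n_species: int, copies_per_species: int = 2) -> int:
    """Smallest gene-family size that can record a duplication shared by
    n_species species, ASSUMING each species keeps both duplicate copies.
    This is a simplified reading of the rule of thumb, not Tree2gd code."""
    if n_species < 1:
        raise ValueError("n_species must be at least 1")
    return n_species * copies_per_species


# Two species x two copies gives the default threshold of 4.
print(min_genes_for_gd(2))  # 4
# A duplication shared by five species would need at least 10 genes.
print(min_genes_for_gd(5))  # 10
```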

If, based on your research experience, you expect WGDs in your 150-species project to occur only at levels shared by more than 5 species, or you are only interested in WGDs above that level, we recommend setting the threshold to more than 10 to speed things up.
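To see why a higher threshold speeds things up, here is a sketch that counts how many gene families (and therefore gene trees) remain after a size filter. The family IDs and sizes are made up for illustration, `families_to_analyze` is a hypothetical helper rather than a Tree2gd option, and "more than 10" is read here as a family size of at least 11.

```python
from typing import Dict, List


def families_to_analyze(family_sizes: Dict[str, int], min_genes: int) -> List[str]:
    """Return the IDs (sorted) of families with at least min_genes members.
    Each retained family corresponds to one gene tree to build and reconcile."""
    return sorted(fid for fid, size in family_sizes.items() if size >= min_genes)


# Hypothetical family sizes, for illustration only.
sizes = {"OG0001": 3, "OG0002": 4, "OG0003": 8, "OG0004": 12, "OG0005": 25}

print(len(families_to_analyze(sizes, 4)))   # 4 gene trees with the default
print(len(families_to_analyze(sizes, 11)))  # 2 gene trees with a threshold above 10
```

On a real dataset of 150 species the reduction would depend entirely on the family-size distribution, so the numbers above say nothing about actual run times.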

On the other hand, setting a minimum number of genes is only a rough filter. In our lab's previous research, we have sometimes manually selected gene families that are guaranteed to contain genes from species in at least several important branches, which often gives better results. However, this requires a large number of manual steps, and some important new discoveries may be missed.
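The branch-based selection described here could be partly scripted along these lines. This is only a sketch assuming you already know which species belong to each important branch and which species each gene in a family comes from; the clade names, family contents, and helper functions are all hypothetical and not part of Tree2gd.

```python
from typing import Dict, Iterable, List, Set


def covers_key_clades(gene_species: Iterable[str],
                      key_clades: Dict[str, Set[str]],
                      min_clades: int) -> bool:
    """True if the family has at least one gene from at least
    min_clades of the key clades."""
    present = set(gene_species)
    hits = sum(1 for members in key_clades.values() if present & members)
    return hits >= min_clades


def select_families(families: Dict[str, List[str]],
                    key_clades: Dict[str, Set[str]],
                    min_clades: int) -> List[str]:
    """Return the IDs (sorted) of families passing the clade-coverage filter."""
    return sorted(fid for fid, species in families.items()
                  if covers_key_clades(species, key_clades, min_clades))


# Hypothetical clades (species sets) and families (species of each gene).
clades = {
    "cladeA": {"sp1", "sp2"},
    "cladeB": {"sp3", "sp4"},
    "cladeC": {"sp5"},
}
families = {
    "fam1": ["sp1", "sp1", "sp3", "sp5"],  # genes from clades A, B and C
    "fam2": ["sp2", "sp2", "sp2", "sp2"],  # genes from clade A only
    "fam3": ["sp4", "sp5", "sp5", "sp4"],  # genes from clades B and C
}

print(select_families(families, clades, 3))  # ['fam1']
print(select_families(families, clades, 2))  # ['fam1', 'fam3']
```

Raising `min_clades` makes the filter stricter; note that fam2, which would be dropped here, is also the kind of lineage-specific family in which an unexpected duplication could show up.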