esrud / GONE

GONE: Scripts, programs and an example data set

Questions about input data/results interpretation #23

Open quinn-ca opened 1 year ago

quinn-ca commented 1 year ago

Hello,

I'm interested in using GONE for my data and have a few questions about input data and results interpretation, to determine whether the program is appropriate for my data.

My data: I have SNP data (generated using ddRADseq), sampled from four sites (sample sizes range from n = 20 to n = 40). I do not have a genetic map. My final dataset has ~29K SNPs.

First, should any data filtering be conducted prior to running this analysis? It makes sense to me to filter for missing data, minimum mean read depth, and minimum allele count (I've noted your caution about minimum allele frequency filtering). Is it appropriate/desirable to retain all SNPs in a locus (for assessing genetic structure, I retained only a single SNP per locus)? Should I conduct analyses separately for distinct genetic clusters, given that population structure can affect results?

I ran GONE on one of my sampled sites that appeared to be its own genetic cluster. I used the default parameters (except that I set hc = 0.01, per the recommendations in the user guide). The genetic locations column contained '0's. First, I noticed a variable number of SNPs per chromosome (1200-5500) and only chromosomes 1-13 were considered (14-25 were not included). The results suggest an Ne of 24,000 (generation 1), roughly similar to our most recent census (17,000). Additionally, the results show a drastic decrease in Ne 30 generations ago. A population expansion would make sense with what we know of our study system, although given the amount of data, I understand the timing of this expansion may not be reliable. My focal species is long-lived (30+ years) with overlapping generations. I sampled breeding adults, but cannot be sure of the age of adults sampled. From the paper, I understand that overlapping generations can be a challenge with this analysis.

Overall,

  1. Do I have sufficient data to use GONE?
  2. If so, should I be including/omitting filtering steps prior to running the analyses?
  3. Given my data, are my results for this single sampling site encouraging? Are there any specific interpretations/cautions I need to keep in mind given the potentially overlapping generations sampled?

I sincerely appreciate your help! Quinn

armando-caballero commented 1 year ago

Dear Quinn, I will try to reply to your questions.

My data: I have SNP data (generated using ddRADseq), sampled from four sites (sample sizes range from n = 20 to n = 40). I do not have a genetic map. My final dataset has ~29K SNPs.

First, should any data filtering be conducted prior to running this analysis? It makes sense to me to filter for missing data,

minimum mean read depth, and minimum allele count (I've noted your caution about minimum allele frequency filtering).

Is it appropriate/desirable to retain all SNPs in a locus (for assessing genetic structure, I retained only a single SNP per locus)?

Should I conduct analyses separately for distinct genetic clusters, given that population structure can affect results?

I ran GONE on one of my sampled sites that appeared to be its own genetic cluster. I used the default parameters (except that I set hc = 0.01, per the recommendations in the user guide).

The genetic locations column contained '0's.

First, I noticed a variable number of SNPs per chromosome (1200-5500) and only chromosomes 1-13 were considered (14-25 were not included).

The results suggest an Ne of 24,000 (generation 1), roughly similar to our most recent census (17,000). Additionally, the results show a drastic decrease in Ne 30 generations ago. A population expansion would make sense with what we know of our study system, although given the amount of data, I understand the timing of this expansion may not be reliable.

My focal species is long-lived (30+ years) with overlapping generations. I sampled breeding adults, but cannot be sure of the age of adults sampled. From the paper, I understand that overlapping generations can be a challenge with this analysis.

Overall,

  1. Do I have sufficient data to use GONE?
  2. If so, should I be including/omitting filtering steps prior to running the analyses?
  3. Given my data, are my results for this single sampling site encouraging? Are there any specific interpretations/cautions I need to keep in mind given the potentially overlapping generations sampled?

With best wishes, Armando.

quinn-ca commented 1 year ago

Hi Armando,

Thank you for your helpful reply! I'll look more into a genetic map and see what is possible with my data, but it's good to know my data could be sufficient.

Following your recommendations, I'll create a new set of SNPs for this analysis with rigorous genotyping protocols, but forgoing filtering for missing data. I will retain all SNPs per locus and create separate datasets based on my genetic structure results, to avoid issues caused by population structure.

I'm using hc = 0.01 because, based on my genetic structure analysis, my sampling sites do appear to be admixed. Thank you for pointing out the maxNCHROM input parameter; I'm not sure how I missed that. I have renamed my chromosomes to consecutive numbers, starting with 1, but I'll also update the number of chromosomes.
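The chromosome renumbering described above can be sketched as follows. This is only an illustration, assuming the input is a PLINK-style .map file with four whitespace-separated columns (chromosome, SNP id, genetic distance, base-pair position); the function name `renumber_chromosomes` and the scaffold labels are hypothetical. The count it returns is the number of chromosomes in the data, which is the figure to compare against the maxNCHROM setting.

```python
def renumber_chromosomes(map_lines):
    """Relabel chromosome codes as 1..K in order of first appearance.

    Assumes PLINK .map layout: chromosome, SNP id, genetic distance, bp position.
    Returns (relabelled_lines, number_of_chromosomes).
    """
    mapping = {}  # original label -> new consecutive number (as a string)
    out = []
    for line in map_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        chrom = fields[0]
        if chrom not in mapping:
            mapping[chrom] = str(len(mapping) + 1)
        fields[0] = mapping[chrom]
        out.append("\t".join(fields))
    return out, len(mapping)


# Hypothetical example: two scaffolds become chromosomes 1 and 2.
example = [
    "scaffold_7 snp1 0 1500",
    "scaffold_7 snp2 0 9200",
    "scaffold_12 snp3 0 300",
]
new_lines, n_chrom = renumber_chromosomes(example)
print(n_chrom)
for row in new_lines:
    print(row)
```

Relabelling by order of first appearance keeps all SNPs of a chromosome together as long as the .map file is already grouped by chromosome.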

Apologies that my description of my results was confusing. The output suggests a population expansion beginning 30 generations ago, and a population expansion is consistent with what we might expect based on anecdotal historic counts. So, I took this as a positive step toward validating the utility of GONE for my study system and data. To put an approximate date to '30 generations ago', I've been reviewing several definitions/equations for calculating generation time for a species, which can give very different answers. Does this program calculate 'generations' with a specific definition/equation, or is the interpretation of generation time determined based solely on my understanding of my species' biology?

Thanks again! Quinn

armando-caballero commented 1 year ago

There is no specific definition of generation except the usual one: parent-progeny in a Wright-Fisher model.
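Since the output is expressed in generations, putting a calendar date on "30 generations ago" requires a generation time supplied from the species' biology. A rough sketch of the conversion, where the generation times are purely hypothetical placeholders and not values for any particular species:

```python
def generations_to_years(generations, generation_time_years):
    """Convert a number of generations to years for a given generation time."""
    return generations * generation_time_years


# Hypothetical generation times (years); substitute estimates for the study species.
for gen_time in (10.0, 15.0, 20.0):
    years = generations_to_years(30, gen_time)
    print(f"generation time {gen_time} y: 30 generations = {years} years ago")
```

Running the conversion over a range of plausible generation times shows how strongly the dating depends on which definition of generation time is chosen.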

armando-caballero commented 1 year ago

Sex chromosomes cannot be analysed directly by GONE at the moment. Do not include them in your input data.
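One way to exclude sex-linked SNPs before running the analysis is sketched below. It assumes PLINK-style text input: a .map file with one SNP per line (chromosome in the first column) and a .ped file with six leading columns followed by two allele columns per SNP. The function name `drop_chromosomes` and the chromosome label "X" are illustrative assumptions.

```python
def drop_chromosomes(map_lines, ped_lines, exclude):
    """Remove SNPs on the excluded chromosomes from .map and .ped content.

    map_lines: lines of a PLINK .map file (chromosome is the first field).
    ped_lines: lines of a PLINK .ped file (6 leading fields, then 2 alleles per SNP).
    exclude:   set of chromosome labels to drop, e.g. {"X"}.
    """
    # Indices of SNPs to keep, in original order.
    keep = [i for i, line in enumerate(map_lines)
            if line.split()[0] not in exclude]
    new_map = [map_lines[i] for i in keep]

    new_ped = []
    for line in ped_lines:
        fields = line.split()
        head, alleles = fields[:6], fields[6:]
        kept = []
        for i in keep:
            kept.extend(alleles[2 * i:2 * i + 2])  # both alleles of SNP i
        new_ped.append(" ".join(head + kept))
    return new_map, new_ped


# Hypothetical example: three SNPs, the second one on chromosome "X".
map_lines = ["1 snpA 0 100", "X snpB 0 200", "2 snpC 0 300"]
ped_lines = ["pop1 ind1 0 0 0 -9 A A C T G G"]
new_map, new_ped = drop_chromosomes(map_lines, ped_lines, {"X"})
print(new_map)
print(new_ped)
```

The .map and .ped files have to be filtered together, because the allele columns in the .ped file are matched to SNPs only by their order in the .map file.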