Open: quinn-ca opened this issue 1 year ago
Dear Quinn, I will try to reply to your questions.
With best wishes, Armando.
Hi Armando,
Thank you for your helpful reply! I'll look more into a genetic map and see what is possible with my data, but it's good to know my data could be sufficient.
From your recommendations, I'll create a new set of SNPs for this analysis with rigorous genotyping protocols, but forgoing filtering for missing data. I will retain all SNPs per locus and create separate datasets based on my genetic structure results, to avoid issues caused by population structure.
I'm using hc = 0.01 because, based on my genetic structure analysis, my sampling sites do appear to be admixed. Thank you for pointing out the maxNCHROM input parameter; I'm not sure how I missed that. I have renamed my chromosomes to consecutive numbers, starting with 1, but I'll also update the number of chromosomes.
Apologies that my description of my results was confusing. The output suggests a population expansion beginning 30 generations ago, which is consistent with what we might expect based on anecdotal historic counts. So, I took this as a positive step toward validating the utility of GONE for my study system and data. To put an approximate date on '30 generations ago', I've been reviewing several definitions/equations for calculating a species' generation time, which can give very different answers. Does this program calculate 'generations' using a specific definition/equation, or is the interpretation of generation time based solely on my understanding of my species' biology?
Thanks again! Quinn
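The chromosome renumbering and chromosome count mentioned above can be sketched in a few lines of Python. This is only an illustration, assuming the marker positions are stored in a PLINK-style .map file with four whitespace-separated columns (chromosome, SNP ID, genetic position, base-pair position). The function name `renumber_chromosomes` and the scaffold labels are hypothetical, and the example lines are made up rather than taken from the dataset discussed here.

```python
def renumber_chromosomes(map_lines):
    """Relabel chromosomes as 1..K in order of first appearance.

    Returns the rewritten lines and a dict of SNP counts per new label.
    Assumes PLINK-style .map lines: chrom, SNP ID, genetic pos, bp pos.
    """
    label_to_number = {}
    snp_counts = {}
    renumbered = []
    for line in map_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        original_label = fields[0]
        if original_label not in label_to_number:
            label_to_number[original_label] = len(label_to_number) + 1
        number = label_to_number[original_label]
        snp_counts[number] = snp_counts.get(number, 0) + 1
        renumbered.append("\t".join([str(number)] + fields[1:]))
    return renumbered, snp_counts


# Made-up example lines (hypothetical scaffold names and positions).
example_map = [
    "scaffold_3 snp1 0 1500",
    "scaffold_3 snp2 0 98000",
    "scaffold_7 snp3 0 420",
    "scaffold_12 snp4 0 7700",
]

new_lines, counts = renumber_chromosomes(example_map)
n_chromosomes = len(counts)  # the value to compare against maxNCHROM

for chrom, n_snps in counts.items():
    print(f"chromosome {chrom}: {n_snps} SNPs")
print(f"chromosomes present: {n_chromosomes}")
```

Comparing the printed chromosome count with the maxNCHROM setting is a quick way to check whether some chromosomes would be left out of the run.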
GONE uses no specific definition of a generation other than the usual one: a parent-to-progeny step in a Wright-Fisher model.
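Because the output is expressed in parent-to-progeny generations, putting a date on '30 generations ago' comes down to multiplying by whatever generation time is chosen for the species. A minimal sketch of that arithmetic follows. The candidate generation times are hypothetical placeholders, not estimates for any particular species, and with overlapping generations the result should be read as approximate.

```python
def generations_to_years(generations, generation_time_years):
    """Convert a number of generations to years for a given generation time."""
    return generations * generation_time_years


generations_ago = 30  # timing of the change reported in the thread

# Hypothetical generation times in years, standing in for the different
# definitions/equations one might try; replace with species-specific values.
candidate_generation_times = [10.0, 15.0, 20.0]

years_ago = {
    t: generations_to_years(generations_ago, t)
    for t in candidate_generation_times
}

for t, years in years_ago.items():
    print(f"generation time {t:g} years -> about {years:g} years ago")
```

Running the same conversion across several plausible generation times shows how strongly the calendar date depends on that choice.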
Sex chromosomes cannot be analysed directly by GONE at the moment. Do not include them in your input data.
Hello,
I'm interested in using GONE for my data and had a few questions about input data and results interpretation to determine whether the program is appropriate for my data.
My data: I have SNP data (generated using ddRADseq), sampled from four sites (sample sizes range from n = 20 to n = 40). I do not have a genetic map. My final dataset has ~29K SNPs.
First, should any data filtering be conducted prior to running this analysis? It makes sense to me to filter for missing data, minimum mean read depth, and minimum allele count (I've noted your caution about minimum allele frequency filtering). Is it appropriate/desirable to retain all SNPs in a locus (for assessing genetic structure, I retained only a single SNP per locus)? Should I conduct analyses separately for distinct genetic clusters, given that population structure can affect results?
I ran GONE on one of my sampled sites that appeared to be its own genetic cluster. I used the default parameters (except that I changed hc to 0.01, per the recommendations in the user guide). The genetic locations column contained '0's. First, I noticed a variable number of SNPs per chromosome (1200-5500), and only chromosomes 1-13 were considered (14-25 were not included).
The results suggest an Ne of 24,000 (generation 1), roughly similar to our most recent census (17,000). Additionally, the results show a drastic decrease in Ne 30 generations ago. A population expansion would make sense with what we know of our study system, although given the amount of data, I understand the timing of this expansion may not be reliable.
My focal species is long-lived (30+ years) with overlapping generations. I sampled breeding adults, but cannot be sure of the age of the adults sampled. From the paper, I understand that overlapping generations can be a challenge with this analysis.
Overall,
I sincerely appreciate your help! Quinn