eriqande / newhybrids

Bayesian analysis of hybrids
GNU General Public License v2.0

SNP limit #9

Open ciselkemahli opened 5 months ago

ciselkemahli commented 5 months ago

Hello, I wonder how many SNPs the NEWHYBRIDS program can handle. I installed it on the university server and it is running with ~180GB of memory. Do you have any recommendation for an optimal analysis? Thank you.

eriqande commented 5 months ago

Hi Cisel,

I don't know what the maximum number of SNPs is off the top of my head, but I do know that the model itself does not account for physical linkage, so once you have more than one SNP per chromosome, the model starts being violated. If you use thousands of SNPs, the estimates of uncertainty around the hybrid categories will be greatly deflated, because most of the SNPs are co-inherited with others on the same chromosome and the model treats them as independent.
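To make the deflation concrete, here is a toy calculation (a simplified two-category comparison, not the actual NewHybrids model). It assumes one SNP gives a modest likelihood ratio in favour of F1 over pure, and then counts that same evidence several times, which is roughly what happens when perfectly co-inherited SNPs are treated as independent. The function name and the numbers are made up for illustration.

```python
import math

def posterior_f1(log_lr_per_locus, n_copies, prior_f1=0.5):
    """Posterior probability of 'F1' versus 'pure' in a toy two-category model.

    log_lr_per_locus: log likelihood ratio (F1 vs pure) from one SNP.
    n_copies: how many times that same evidence is counted, i.e. the number
              of fully linked SNPs wrongly treated as independent.
    """
    prior_log_odds = math.log(prior_f1 / (1.0 - prior_f1))
    log_odds = prior_log_odds + n_copies * log_lr_per_locus
    return 1.0 / (1.0 + math.exp(-log_odds))

if __name__ == "__main__":
    weak_evidence = math.log(1.5)  # hypothetical: one SNP favours F1 by 1.5 to 1
    for k in (1, 10, 100):
        print(k, round(posterior_f1(weak_evidence, k), 4))
```

With these made-up numbers, a single SNP leaves the call quite uncertain, while counting the same evidence 10 times already pushes the posterior close to 1, even though no new independent information was added.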

If you have a genome's worth of data, I would recommend finding SNPs that are most differentiated between the two parental species, and then from amongst those, choosing no more than 3 or 4 per chromosome, preferably on different arms or ends.
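A minimal sketch of that selection step, assuming you already have per-SNP allele frequency estimates for the two parental groups. The function name, the input layout `(chrom, pos, freq_a, freq_b)`, and the thresholds (`min_delta`, `min_spacing`) are all hypothetical choices for illustration, not anything NewHybrids provides. A fixed base-pair spacing is only a crude stand-in for "different arms or ends".

```python
from collections import defaultdict

def pick_snps(snps, max_per_chrom=4, min_delta=0.8, min_spacing=10_000_000):
    """Pick a few highly differentiated, well-separated SNPs per chromosome.

    snps: iterable of (chrom, pos, freq_a, freq_b), where freq_a and freq_b
          are frequencies of the same allele in the two parental groups.
    Returns a list of (chrom, pos) tuples, sorted by chromosome and position.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, freq_a, freq_b in snps:
        delta = abs(freq_a - freq_b)  # allele frequency difference
        if delta >= min_delta:
            by_chrom[chrom].append((delta, pos))

    chosen = []
    for chrom in sorted(by_chrom):
        kept = []
        # Greedy: take the most differentiated SNPs first, skipping any
        # that sit too close to one already kept on this chromosome.
        for delta, pos in sorted(by_chrom[chrom], key=lambda c: (-c[0], c[1])):
            if all(abs(pos - q) >= min_spacing for q in kept):
                kept.append(pos)
            if len(kept) == max_per_chrom:
                break
        chosen.extend((chrom, pos) for pos in sorted(kept))
    return chosen
```

The greedy pass favours differentiation over even spacing; if you have a linkage map or centromere positions, selecting by arm would match the recommendation more closely.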

How differentiated are the species you are dealing with?

Cheers,

eric

ciselkemahli commented 4 months ago

Dear Eric,

Thank you so much for the quick response. As you explained, I should reduce my SNP set to hundreds of loci, because linked SNPs deflate the uncertainty estimates.

I am trying to identify hybrid wolves using WGS data. I do not have dog genomes from the same region, so I have only wolf genomes with which to determine F1 and F2 generations and backcrosses. As you suggested, I will reduce the SNPs based on linkage. My problem is identifying hybrid individuals without prior information about one of the parental populations. That is why I tried to run this analysis with as many SNPs as possible. Thank you.

Best wishes.

Çisel

eriqande commented 4 months ago

Hi Çisel,

I see. In that case, you might want to do a first run with ADMIXTURE at K=2, treating the wolves as individuals known to be from one source population. Then use the estimated allele frequencies from the two subpopulations to identify a small subset of loci that you could then use to identify F1s, F2s, backcrosses, etc.
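A rough sketch of the second step, assuming the ADMIXTURE run at K=2 has written its allele frequency estimates to a `.P` file with one whitespace-separated row per SNP and one column per inferred population, in the same order as the input marker file. The function name, the file name in the comment, and the `top_n` cutoff are hypothetical.

```python
def rank_loci_by_delta(p_lines, top_n=200):
    """Rank SNPs by allele frequency difference between two populations.

    p_lines: lines of a K=2 allele frequency table, one SNP per row,
             two whitespace-separated frequencies per row.
    Returns the 0-based row indices of the top_n most differentiated SNPs.
    """
    # Drop blank lines (e.g. a trailing newline) before indexing rows.
    rows = [line.split() for line in p_lines if line.strip()]
    scored = []
    for idx, fields in enumerate(rows):
        p1, p2 = float(fields[0]), float(fields[1])
        scored.append((abs(p1 - p2), idx))
    # Largest difference first; ties broken by row order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scored[:top_n]]

# Example use (file name is an assumption):
# with open("wolves.2.P") as fh:
#     best_rows = rank_loci_by_delta(fh, top_n=200)
```

The returned row indices would then need to be mapped back to marker names and positions, and thinned to a few loci per chromosome, before building the NewHybrids input.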

The NewHybrids MCMC does not mix terribly well with many markers. The Structure/ADMIXTURE model tends to mix better.

I hope that is helpful. Cheers,

eric
