emmanuelparadis / pegas

Population and Evolutionary Genetics Analysis System
GNU General Public License v2.0

Subset a haplotype object based on names? #89

Closed (ChrisK1988 closed 2 months ago)

ChrisK1988 commented 4 months ago

Hello,

Is it possible to subset an object of class "haplotype" based on the names of the haplotypes? My region of interest is 3593 bases long and has 700 haplotypes. Filtering with subset(h, minfreq=2) leaves me with 142 haplotypes. I am trying to narrow this down to the ~50 most common haplotypes; however, I have two populations of interest with unique haplotypes (n=3) that I want to preserve, and these would not pass any filter based solely on frequency.

For example, I have an object h with haplotypes I, II, III, IV, V, and VI, with frequencies of 1, 25, 1, 10, 2, and 10. How would I subset it to keep only haplotypes I, II, III, and VI?

I have tried subsetting it as if it were a list, an atomic vector, and a matrix, but each time the haplotype frequencies all end up equal to 1. That is fine for generating the network, I guess, but it makes plotting any sort of frequency information impossible.

Thank you kindly,

Chris

emmanuelparadis commented 4 months ago

Hello,

You can get the frequency of each haplotype with:

Nh <- summary(h)

Then you can define a selection with, for instance:

sel <- Nh == 2 | Nh > 25

Since the "haplotype" object is a matrix, you can subset it with:

h[sel, ]

and the vector of frequencies with (no comma):

Nh[sel]
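
To make the logical selection concrete, here is a minimal base-R sketch using the six haplotypes from the question. Note that `Nh` and `h` below are hand-built stand-ins (a named integer vector and a plain character matrix with invented sequences), not objects returned by pegas, and the thresholds are only illustrative. It shows the indexing itself, not how the "haplotype" class behaves.

```r
# Stand-in for summary(h): frequencies from the question, named by haplotype.
Nh <- c(I = 1L, II = 25L, III = 1L, IV = 10L, V = 2L, VI = 10L)

# Stand-in for the haplotype matrix: one row per haplotype (made-up sequences).
h <- matrix(
  c("a", "c", "g", "t",
    "a", "c", "g", "a",
    "a", "t", "g", "t",
    "g", "c", "g", "t",
    "a", "c", "c", "t",
    "t", "c", "g", "t"),
  nrow = 6, byrow = TRUE,
  dimnames = list(names(Nh), NULL)
)

# Illustrative selection: singletons plus anything seen at least 25 times.
sel <- Nh == 1 | Nh >= 25

# Apply the same logical vector to the rows of the matrix and to the vector.
# drop = FALSE keeps a matrix even if only one row is selected.
h_sub  <- h[sel, , drop = FALSE]
Nh_sub <- Nh[sel]

# The two subsets stay aligned by name.
stopifnot(all(rownames(h_sub) == names(Nh_sub)))
```

Because the same `sel` is applied to both objects, the counts in `Nh_sub` still line up with the rows of `h_sub`.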

Both objects have the same names (i.e., all(rownames(h) == names(Nh)) should return TRUE), so you can also build the selection from names and use it in the same way, e.g.:

sel <- c("I", "II", "VI")
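
For the situation described in the question, a name-based selection could be combined with a frequency ranking so that rare haplotypes of interest survive the filter. The sketch below is only one possible approach: `Nh` is again a hand-built stand-in for `summary(h)`, and `n_top` and `must_keep` are hypothetical names introduced for illustration.

```r
# Stand-in for summary(h): frequencies from the question, named by haplotype.
Nh <- c(I = 1L, II = 25L, III = 1L, IV = 10L, V = 2L, VI = 10L)

n_top     <- 3              # hypothetical; about 50 for the real data set
must_keep <- c("I", "III")  # hypothetical; rare haplotypes to preserve

# Names of the n_top most frequent haplotypes.
top <- names(Nh)[order(Nh, decreasing = TRUE)][seq_len(min(n_top, length(Nh)))]

# Add the haplotypes that must be kept regardless of frequency.
keep <- union(top, must_keep)

# Restore the original ordering of the haplotypes.
sel <- names(Nh)[names(Nh) %in% keep]

# Frequencies of the retained haplotypes, with their original counts.
Nh_sub <- Nh[sel]

# With the real objects, the same names would select the rows: h[sel, ]
```

Here `sel` ends up holding every haplotype except V, and `Nh_sub` carries the original counts for the retained haplotypes.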

Cheers,

Emmanuel

ChrisK1988 commented 4 months ago

Thank you kindly, Emmanuel. I will give this a try.

Chris