EricArcher / strataG

strataG is a toolkit for haploid sequence and multilocus genetic data summaries, and analyses of population structure.

fixedDifferences function returns 0 differences #14

Closed KerensaMcElroy closed 7 years ago

KerensaMcElroy commented 7 years ago

Hi again,

I'm now having trouble with the fixedDifferences function. I have tried it both on my own gtypes object, and on the example dloop.g data. In both cases it returns no differences.

I'm using strataG compiled from the source code today. My gtypes object is attached.

Thanks! gtypes.RData.zip

EricArcher commented 7 years ago

Did you sort this out, or do you still need help?

KerensaMcElroy commented 7 years ago

I think it is fine; it looks like it was an issue with the gtypes sequence object. I'll test it out with sequence2gtypes instead and see if there is still an issue. The test data still gave 0 differences, though, so I'm not sure about that.


From: Eric Archer notifications@github.com Sent: Thursday, 23 March 2017 2:21 AM To: EricArcher/strataG Cc: McElroy, Kerensa (NCMI, Crace); State change Subject: Re: [EricArcher/strataG] fixedDifferences function returns 0 differences (#14)


EricArcher commented 7 years ago

The test data (dloop.g) will give 0 fixed differences because all of the samples are from the same species. If you test only a subset of the first samples in dloop.g, for example:

> dloop.g[1:30, , ]

<<< dolphin dLoop >>>

Contents: 30 samples, 1 locus, 3 strata
Stratification schemes: broad, fine

Strata summary:
               num.samples num.missing num.alleles prop.unique.alleles heterozygosity
Coastal                  5           0           3           0.3333333      0.8000000
Offshore.North          10           0           8           0.7500000      0.9555556
Offshore.South          15           0          13           0.8461538      0.9809524

There are two fixed differences between Coastal and Offshore.North:

> fixedDifferences(dloop.g[1:20, , ])
$sites
$sites$`Coastal v. Offshore.North`
               293 294
Coastal        "c" "c"
Offshore.North "t" "t"

$sites$`Coastal v. Offshore.South`

Coastal       
Offshore.South

$sites$`Offshore.North v. Offshore.South`

Offshore.North
Offshore.South

$num.fixed
        strata.1       strata.2 num.fixed
1        Coastal Offshore.North         2
2        Coastal Offshore.South         0
3 Offshore.North Offshore.South         0
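
For reference, a fixed difference here is a site at which two strata share no bases, as at sites 293 and 294 above. A minimal base-R sketch of that idea follows. It runs on a made-up alignment, the names `seqs`, `strata`, and `is_fixed` are hypothetical, and it is not strataG's implementation of fixedDifferences:

```r
# Toy alignment (hypothetical data): rows are samples, columns are sites.
seqs <- rbind(
  s1 = c("a", "c", "c", "g"),
  s2 = c("a", "c", "c", "a"),
  s3 = c("a", "t", "t", "g"),
  s4 = c("g", "t", "t", "a")
)
colnames(seqs) <- c("1", "2", "3", "4")

# Stratum assignment for each sample, in the same order as the rows of seqs.
strata <- c(s1 = "Coastal", s2 = "Coastal", s3 = "Offshore", s4 = "Offshore")

# A site counts as fixed between strata a and b when the sets of bases
# observed in each stratum do not overlap at that site.
is_fixed <- function(site, strata, a, b) {
  length(intersect(site[strata == a], site[strata == b])) == 0
}

# Apply the check to every column (site) of the alignment.
fixed <- apply(seqs, 2, is_fixed, strata = strata, a = "Coastal", b = "Offshore")
fixed_sites <- names(which(fixed))
print(fixed_sites)
```

In this toy data only sites 2 and 3 qualify. Site 1 is variable but both strata contain an "a", so it is not fixed, and that is the situation across the full dloop.g data set.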

EricArcher commented 7 years ago

I've looked at your data, and I'm wondering if there is something off in the way your samples are stratified. When I look at the base frequencies at variable sites across all of your sequences, and then within each population separately, the within-population frequencies are similar to the overall frequencies.

For example, at site 4 there are 9 As and 13 Gs overall; in exsul there are 3 As and 6 Gs at that site, and in varius 6 As and 7 Gs. This kind of pattern occurs at several sites. I'm wondering if the sample-strata associations got mixed up in the data used to create the gtypes object.

Alternatively, there is an error in my code that I need to fix. Can you attach the files you used to create the gtypes object and the script you used so I can see how you did it? Thanks!
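
The per-stratum check described above for site 4 can be sketched in base R as follows. This is only an illustration: the vectors `site4` and `stratum` are hypothetical and are rebuilt from the counts quoted above, not read from the attached gtypes object.

```r
# Bases at site 4, rebuilt from the quoted counts (assumed, not real data):
# exsul has 3 As and 6 Gs, varius has 6 As and 7 Gs.
site4 <- c(rep("a", 3), rep("g", 6), rep("a", 6), rep("g", 7))
stratum <- c(rep("exsul", 9), rep("varius", 13))

# Cross-tabulate stratum by base, then convert each row to proportions.
counts <- table(stratum, site4)
freqs <- prop.table(counts, margin = 1)

print(counts)
print(round(freqs, 2))
```

Both strata carry both bases at this site, so it cannot be a fixed difference under this stratification. Running the same tabulation with a corrected sample-to-stratum mapping would show whether the assignment is the cause.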