emmanuelparadis / pegas

Population and Evolutionary Genetics Analysis System
GNU General Public License v2.0
28 stars 10 forks source link

Ops.factor(left, right) error #90

Open SimeoneCA opened 5 months ago

SimeoneCA commented 5 months ago

Hi Emmanuel,

I am working with phased loci data in pegas - when I was trying to compute pairwise distances among haplotypes, using dist.haplotype.loci(), I came across the error:

"Error in Ops.factor(left, right) : level sets of factors are different"

The data seem to be imported and haplotype() created an object "haplotype.loci" class. Looking at my imported dataset, I noticed that some positions have a factor of 3 (i.e 0|1, 1|0, 1|1) or 4 (i.e 0|0, 0|1, 1|0, 1|1). I am not sure if this is the reason for the error, but is there any suggestion on how to work with phased data that may have variable factored data at certain positions?

Thank you, Chris

emmanuelparadis commented 5 months ago

Hi Chris,

You seem to have identified the problem correctly. You can try to edit the function yourself. First make a copy in your workspace:

mydist <- pegas::dist.haplotype.loci

You can then either print and copy the code in a a file[1] that you can source() later (a good solution so you add this in your script), or edit the code directly with fix(mydist):

### find this line
            d[k] <- sum(x[, i] != x[, j])
### to become:
            d[k] <- sum(as.character(x[, i]) != as.character(x[, j]))

You should be able to use mydist() or if you prefer copy it in your workspace (so you don't need to change your script):

dist.haplotype.loci <- mydist

You can still use the pegas version in the same session with pegas::dist.haplotype.loci().

Tell me how this works with your data.

Cheers, Emmanuel

[1] Or more directly:

cat(deparse(pegas::dist.haplotype.loci), sep = "\n", file = "mydist.R")
SimeoneCA commented 5 months ago

Hi Emmanuel,

Thank you for your response! The changes above do allow for dist.haplotyp.loci (in this case above, mydist()) to proceed. However, all of the distances calculated across all variant pairwise comparisons is = 1. This command calculates the Hamming distance for pairwise comparisons, right? Is it taking account the differences for a single position, or is the distance calculated across the entire haplotype? Having all of my distances = 1 is unclear, though I may be misinterpreting the calculation.

Thank you, Chris

emmanuelparadis commented 5 months ago

Yes, this function computes the (unscaled) Hamming distance. Maybe each haplotype differs by one SNP?