Open anpefi opened 8 years ago
This bug has been reported again by mail. It happens when the dataset yields a very low number of MSL with a high proportion of NAs. In those cases, when the distance matrix is built, some pairs of individuals cannot be compared because at least one of the two has an NA at every locus, which yields an NA in the matrix.
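To make the failure mode concrete, here is a minimal base-R sketch. The toy matrix and the `pair_dist()` helper are made up for illustration and are not the distance msap actually computes; the only assumption is that a pair is compared over the loci scored in both individuals.

```r
# Toy MSL matrix: rows = individuals, columns = loci, NA = uninformative state.
msl <- rbind(
  ind1 = c(1, 0, NA, NA),
  ind2 = c(1, 1, NA, NA),
  ind3 = c(NA, NA, 0, 1),
  ind4 = c(NA, 1, 1, 1)
)

# Hypothetical helper: proportion of mismatches over loci scored in both individuals.
pair_dist <- function(a, b) {
  shared <- !is.na(a) & !is.na(b)
  if (!any(shared)) return(NA_real_)  # nothing to compare -> NA distance
  mean(a[shared] != b[shared])
}

n <- nrow(msl)
d <- matrix(0, n, n, dimnames = list(rownames(msl), rownames(msl)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- pair_dist(msl[i, ], msl[j, ])
  }
}

print(d)  # ind1/ind3 and ind2/ind3 share no scored locus, so those cells are NA
```

With few loci and many NAs, pairs like ind1/ind3 become common, and a single such pair is enough to put an NA in the matrix.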
Using njs() instead of nj() makes the clustering possible, because njs() is an algorithm designed for incomplete distance matrices. However, the PCoA cannot be done because, as far as I know, there is no algorithm that can work with missing distances.
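A small sketch of the difference, assuming the ape package is installed. The distance matrix is invented, and `cmdscale()` from base R stands in here for the PCoA step (it may not be the exact function msap uses internally).

```r
library(ape)

# Toy distance matrix for five individuals; one pair could not be compared.
labs <- paste0("ind", 1:5)
d <- matrix(c(
  0, 2, 4, 6, 6,
  2, 0, 4, 6, 6,
  4, 4, 0, 6, 6,
  6, 6, 6, 0, 2,
  6, 6, 6, 2, 0
), nrow = 5, byrow = TRUE, dimnames = list(labs, labs))
d["ind1", "ind5"] <- d["ind5", "ind1"] <- NA

# nj() refuses a matrix with missing values; keep the error message instead of stopping.
nj_result <- tryCatch(nj(d), error = function(e) conditionMessage(e))

# njs() is meant for incomplete matrices and should return a tree.
tree <- njs(d)

# Classical scaling (PCoA) also refuses missing distances.
pcoa_result <- tryCatch(cmdscale(d, k = 2), error = function(e) conditionMessage(e))
```

Here `nj_result` and `pcoa_result` should hold error messages, while `tree` is a regular "phylo" object that can be plotted as usual.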
I need to think about the best way to address this issue and then implement it, which will take some time. Probably by using a different distance, or some heuristic that assigns a distance to uninformative states. Suggestions are welcome.
An alternative workaround: if you can assume that there are no (large) genetic differences between individuals across the whole dataset, then 0/0 patterns are much more likely to be caused by hypermethylation of the target than by a mutation that removes the target, so they can be treated as methylated states (1) instead of missing (NA). In this case (no.bands="h") you can run the full analysis.
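The effect of that recoding can be sketched in base R. The toy matrix and the `count_shared()` helper are made up for illustration; this only mimics the idea of treating 0/0 as state 1, not the msap internals.

```r
# Toy state matrix: NA marks a 0/0 pattern treated as uninformative.
states <- rbind(
  ind1 = c(1, 0, NA, NA),
  ind2 = c(1, 1, NA, NA),
  ind3 = c(NA, NA, 0, 1)
)

# Hypothetical helper: number of loci scored in both individuals, for every pair.
count_shared <- function(m) {
  ok <- ifelse(is.na(m), 0, 1)
  ok %*% t(ok)
}

shared_u <- count_shared(states)   # 0/0 kept as NA

states_h <- states
states_h[is.na(states_h)] <- 1     # 0/0 recoded as methylated (1)
shared_h <- count_shared(states_h)

print(shared_u)  # ind1/ind3 share 0 loci, so their distance would be NA
print(shared_h)  # every pair now shares all 4 loci
```

Once no pair has zero shared loci, the distance matrix is complete and both the clustering and the PCoA can run.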
I've been reminded that another workaround that may work for some datasets is to reduce the probability of NAs in the distances by lowering the threshold used to classify a locus as MSL or NML when it shows discordant patterns (option: error.rate.primer=0). By default error.rate.primer is set to 0.05 (the typical error in AFLPs), but it can be set to any other value, including 0. Loci with very few discordant patterns are then considered MSL, which increases their number. In some datasets, setting the threshold to 0 (which assumes that there is no error in the banding) works!
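For reference, a sketch of how the two workarounds might be passed to msap(). The file name and run names are placeholders, and the calls only run if that file exists; check ?msap for the exact arguments in your installed version.

```r
library(msap)

datafile <- "my_msap_data.csv"  # placeholder: replace with your own data file

if (file.exists(datafile)) {
  # Workaround 1: treat 0/0 patterns as methylated instead of uninformative.
  msap(datafile, name = "run_no_bands_h", no.bands = "h")

  # Workaround 2: set the error threshold for classifying loci as MSL/NML to 0.
  msap(datafile, name = "run_error_rate_0", error.rate.primer = 0)
}
```

Neither option is a general fix: the first relies on the genetic-homogeneity assumption and the second on error-free banding, so it is worth checking that the results are robust to the choice.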
This is issue #6262 in the R-Forge support tracker.
Original comment in R-Forge: