dipetkov / eems

Estimating Effective Migration Surfaces
GNU General Public License v2.0
101 stars 28 forks source link

Error: disimilarity matrix is not a full rank matrix #69

Open space-beaver opened 1 year ago

space-beaver commented 1 year ago

Dear Desislava,

I realise this error has been raised before (#12) but I have followed all suggestions and cannot find a solution to this error with my data. My final dataset has 106 individuals and 1294 snps.

From a starting tped file with 106 individuals, from chr19 of squirrel WG shotgun data and 46,490 snps, I have followed this process:

  1. Convert tped to ped and remove sites with missing genotypes plink --tfile chr19_all --recode --allow-extra-chr --out chr19_all_noMiss --geno 0

  2. Transpose to SNP-major and generate bim, bed, fam files plink --bfile chr19_all_noMiss --transpose --make-bed --out chr19_all_noMiss_trans

  3. Recode chromosome names sed -i 's/LR738630.1/19/g' chr19_all_noMiss_trans

  4. Calculate diffs matrix with bed2diffs (done with v1 and v2) ./bed2diffs_v1 --bfile chr19_all_noMiss_trans

  5. Run eems ./runeems_snps --params squirrels.ini --seed 123

Error: [Graph::initialize] Generate population grid and sample assignment Loaded sample coordinates from chr19_all_noMiss_trans.coord There are 42 observed demes (out of 81 demes) The population grid has 81 demes and 198 edges There are 106 samples assigned to 42 observed demes [Graph::initialize] Done.

[Diffs::initialize] Error: The dissimilarity matrix is not a full-rank distance matrix.

I have run both bed2diffsv1 and bed2diffsv2 and have the same problem with both (they are identical). I have checked the matrix properties in R (same for both matrices) and these are shown below. The eigenvalues have 2 positive values, and the rest are negative. The sum is also non-negative, so this may indicate a problem but I can't find a reason why these matrices don't fulfil the requirement of a full rank matrix for the eems program. If anyone can help find the solve, I'd be very grateful. I have attached .diffs, .ini, .coord, .bim, .bed and .fam files.

isSymmetric(mat_v1, check.attributes = FALSE) TRUE isSymmetric(mat_v1) FALSE

rankMatrix(mat_v1) [1] 105 attr(,"method") [1] "tolNorm2" attr(,"useGrad") [1] FALSE attr(,"tol") [1] 2.353673e-14

check that matrix is non-negatibve

a <- min(mat_v1 >= 0) 1L

Check that matrix has a diagonal of 0s by extracting diagonal values

mat_v11[row(mat_v1)==col(mat_v1)] [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [44] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [87] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

calculate eigenvalues, sum and n pos, n neg

eigenvals <- eigen(mat_v1)$values sum(eigenvals) -1.103423e-13 neg <- sum(eigenvals < -tol) 104L pos <- sum(eigenvals > +tol) 2L

svul_eems.zip

JoshuaThieriot commented 1 year ago

Hi, I have encountered the same problem with my distance matrix.

I found this issue mainly ascribed to the linear resemblance of each column in your matrix. Therefore, imputing the missing data will only make some of the columns conform to the requirements.

You may try the following:

  1. use a "jittered" matrix with every number slightly changed
  2. try another way of calculating distances between each individual. As far as I know, the EEMS does not necessarily require a distance matrix computed using PLINK.

Let me know if these methods help. I am working on EEMS as well! :)

space-beaver commented 1 year ago

Hi @JoshuaThieriot, thanks for your reply. In the end I gave up and used feems instead, which was much more straight forward.

https://github.com/NovembreLab/feems

hanjizhe18 commented 4 months ago

Hello, have you solved it yet?