dipetkov / eems

Estimating Effective Migration Surfaces
GNU General Public License v2.0
102 stars 28 forks source link

Error: The population grid is not connected. #24

Closed YanisChrys closed 6 years ago

YanisChrys commented 6 years ago

Hello, I'm trying to use runeems_snp for some data and I get the above error saying the population grid is not connected. I did a search online after looking at the source code and I can't figure out what could cause that error or how I can fix it.

My polygon coordinates start and end at the same point on the map(the locations for the polygon vertices were found by looking up the appropriate coordinates on google maps and are tab-delimited geographical coordinates. The ones for the samples were in the data. )

here's the stderror and eemsrun.txt file:

Using Boost 1_53 (EEMS was tested with version 1_57) Eigen 3.2.2 (EEMS was tested with version 3.2.2)

Will save EEMS output to ./0-0.output/ [Habitat::initialize] Loaded habitat points from ./0-0/0-0.outer [Habitat::initialize] Done.

[Graph::initialize] Generate population grid and sample assignment Loaded sample coordinates from ./0-0/0-0.coord Error: The population grid is not connected.

parameters:

Input parameter values: datapath = ./0-0/0-0 mcmcpath = ./0-0.output prevpath = gridpath = distance = euclidean diploid = 0 nIndiv = 2582 nSites = 1233553 nDemes = 200 seed = 1519055359 numMCMCIter = 2000000 numBurnIter = 1000000 numThinIter = 9999 negBiSize = 10 negBiProb = 0.670000 qVoronoiPr = 0.250000 mrateShape = 0.000500 qrateShape = 0.002000 sigmaShape = 0.001000 qrateScale = 0.500000 mrateScale = 2.000000 sigmaScale = 1.000000 mSeedsProposalS2 = 0.010000 qSeedsProposalS2 = 0.100000 mEffctProposalS2 = 0.100000 qEffctProposalS2 = 0.001000 mrateMuProposalS2 = 0.010000

dipetkov commented 6 years ago

This might be an error but it can also happen as eems tries to fit in a regular triangular grid inside a regularly shaped habitat.

Has runeems generated the following files (before printing the error message and stopping): mcmcpath/demes.txt, mcmcpath/edges.txt, mcmcpath/ipmap.txt where mcmpath is the name of the output folder? If yes, then the program has generated enough information to visualize the disconnected grid. (If not, then indeed there might be a bug.) To understand the situation better, you can plot the (unsuccessfully constructed, disconnected) population grid as follows:

# Use purple and green colors for clarity
eems.population.grid(mcmcpath = mcmcpath,
                     plotpath = plotpath,
                     longlat = FALSE,
                     add.outline = TRUE, col.outline = "purple", lwd.outline = 3,
                     add.grid = TRUE, col.grid = "green", lwd.grid = 2,
                     add.demes = TRUE)

Next, I'll assume that there are no bugs and the population grid is really disconnected. You can either try to increase the density of the grid (use larger nDemes) or enlarge the habitat outline (so that it easier to fit a connected grid).

YanisChrys commented 6 years ago

Hi, First of all, thank you very much for your reply!

I ran eems by increasing the number of demes to 500, 1000 and 1800. My region is eurasia and I'm using the following polygon after sorting the points in R depending on the central angle. Previously I had sorted them sequentially by picking them off the map from left to right. Using both polygons and all deme numbers gives me the Population graph is not connected error

32.10118 -9.49218 38.41055 -12.1289 65.65827 -169.10156 53.22576 -13.53515 66.4431 -23.73046 66.4431 -23.73046 69.7181 13.00781 78.02557 103.88671 33.28461 142.20703 6.88015 104.75842 4.39022 78.39843 27.37176 31.8164 10.6606 42.36328

I then tried to use the eems.population.grid command and it gives out a blank png document along with the error that says that my mcmcpath is neither a datapath nor a mcmcpath.

YanisChrys commented 6 years ago

But, yes the files you mentioned have been generated.

YanisChrys commented 6 years ago

So, after some looking around I found out the outer.txt file does not exist, read.graph can't find it and and that's the reason I get the error. But you didn't mention that file before. I made sure the datapath and mcmc paths are separate and that there are no common files between them. But a "outer.txt" file is not inside the mcmc path (meaning the output folder for EEMS and input folder for eemsplot)

dipetkov commented 6 years ago

Perhaps I didn't give in enough detail before.

# Use purple and green colors for clarity

mcmcpath = "/path/to/eems/output"
plotpath = "/path/to/figs/tobe/generated"

# mcmcmpath should contain at least the following files:
# demes.txt  edges.txt  eemsrun.txt  ipmap.txt  outer.txt

eems.population.grid(mcmcpath = mcmcpath,
                     plotpath = plotpath,
                     longlat = FALSE,
                     add.outline = TRUE, col.outline = "purple", lwd.outline = 3,
                     add.grid = TRUE, col.grid = "green", lwd.grid = 2,
                     add.demes = TRUE)

The outer.txt file inside mcmcpath should be a copy of the input your-datapath.outer.

Is it possible to share your input habitat outline file? With it, I can generate sample coordinates and dissimilarity matrix, and hopefully, reproduce the error that you get.

YanisChrys commented 6 years ago

yes of course, the attached file should be the one i'm currently using(polygon.txt polygon_eurasia.txt ). polygon_eurasia.txt is the one I used before. polygon.txt

dipetkov commented 6 years ago

Let's start with the habitat outline you are currently using: polygon.txt. I generated a small 10x10 dissimilarities matrix and a set of sample coordinates, just so I can execute runeems_snps: polygon.diffs and polygon.coord. The initialization file is also very simple: params.ini.

Note: Rename the files appropriately to drop the txt extension. In the initialization file, vary nDemes to increase the density of the population grid.

Next I tried nDemes = 200, 300, 500, 750, with no success. In all cases, EEMS throws the error The population grid is not connected. To see what exactly happens, I generated the population grids in R as follows:

library("rEEMSplots")
library("glue")

for (nDemes in c(200, 300, 500, 750)) {
  eems.population.grid(mcmcpath = glue("polygon-nDemes", nDemes),
                       plotpath = glue("polygon-nDemes", nDemes),
                       longlat = FALSE,
                       add.outline = TRUE, col.outline = "purple", lwd.outline = 3,
                       add.grid = TRUE, col.grid = "green", lwd.grid = 2) 
}

And here are the population grids themselves, all are obviously not connected.

nDemes = 200

polygon-ndemes200-popgrid01

nDemes = 300

polygon-ndemes300-popgrid01

nDemes = 500

polygon-ndemes500-popgrid01

nDemes = 750

polygon-ndemes750-popgrid01

When you look at the figures, it is clear that one very narrow section of the habitat causes the problem, so I think that the easiest way to deal with this is to make that area of the habitat slightly wider. Here is the updated habitat outline: polygon2.outer.

And then I simply repeat the process and try to run EEMS with nDemes = 200, 300, 500, 750. For nDemes = 200, 300, the population grid is still unconnected but it works for nDemes =500, 750.

nDemes = 200

polygon2-ndemes200-popgrid01

nDemes = 300

polygon2-ndemes300-popgrid01

nDemes = 500

polygon2-ndemes500-popgrid01

nDemes = 750

polygon2-ndemes750-popgrid01

Does this help?

YanisChrys commented 6 years ago

Hello, Yes, it definitely helps and I thank you very much for your time and your reply! Of course this raises a few more questions for me because the habitat you're displaying is the one I was able to plot with the polygon command in R and as I said my desired habitat was Eurasia, so something like this:

image

So why doesn't it look like that? Getting the "true" coordinates from google maps obviously isn't giving me what I want. How do you suggest I pick the polygon coordinates in the future? I see in the paper you do have a habitat that is all of Eurasia so how did you obtain those coordinates yourself?

Thank you very much in advance!

dipetkov commented 6 years ago

I find Google's Maps API v3 Tool useful quite useful for picking the outline off a map.

Google Maps uses the Mercator projection, and on large (continental) scale, the projection really makes a difference to how the plot looks. eems.population.grid doesn't allow you to specify a projection as it only provides a simple way to understand the "grid is unconnected" error. eems.plots allows you to specify a projection with the projection.out argument; see the examples by typing ?eems.plots in the R console.

Note: You can change the projection when plotting the results but the grid construction itself still treats the latitude/longitude coordinates as if they are Euclidean coordinates (since I couldn't figure out how to easily construct triangular grids on the surface of a sphere). It is easiest to see what I mean by showing two plots.

No projection (long/lat without any transformation)

polygon3-ndemes200-mrates01

eems.plots(mcmcpath = glue("polygon3-nDemes", nDemes),
           plotpath = glue("polygon3-nDemes", nDemes),
           longlat = TRUE,
           add.outline = TRUE, col.outline = "purple", lwd.outline = 3,
           add.grid = TRUE, col.grid = "green", lwd.grid = 2,
           projection.in = "+proj=longlat +datum=WGS84",
           projection.out = "+proj=longlat +datum=WGS84",
           add.title = FALSE, add.colbar = FALSE,
           add.map = TRUE)

Mercator projection

polygon3-ndemes200-mercator-mrates01

eems.plots(mcmcpath = glue("polygon3-nDemes", nDemes),
           plotpath = glue("polygon3-nDemes", nDemes, "-mercator"),
           longlat = TRUE,
           add.outline = TRUE, col.outline = "purple", lwd.outline = 3,
           add.grid = TRUE, col.grid = "green", lwd.grid = 2,
           add.title = FALSE, add.colbar = FALSE,
           projection.in = "+proj=longlat +datum=WGS84",
           projection.out = "+proj=merc +datum=WGS84",
           add.map = TRUE)
YanisChrys commented 6 years ago

I am truly grateful for your reply, and I thank you for all your amazing help so far! I can now safely say my problem has been fixed. The far-away point that you had to widen on the polygon before is due to the fact that close to the Bering sea the eastern border of the map meets the western part of the map, the x axis resets as you can see from the following screenshot. Which was unexpected to me, because the dashed line that surrounds the landmass makes it seem like the longitude in those latitudes extends even farther to cover the landmass, but,apparently the reset is done at a straight line. But of course that's the timezone line and x resets at the meridian.

image

Anyway, with this and by filtering out SNPs with a lot of missingness I was able to get the results that I needed today, however, I had to manually add the outer.txt file to the mcmcpath folder because eventhough EEMS ran successfully it didn't add that file in the output directory.