Trimming links with length less than scene pixel resolution

Lvulis commented 4 years ago

I ran into the following issue. There is a link present which is so small that when I rasterize my links json in R that it literally has no points (some kind of interpolation issue probably). It's visible in the image below. So the link is caused because there are an excess number of links here in part due to these small islands. It has a length of 30sqrt(2), so basically 2 nodes which are diagonally adjacent on a 30-m resolution grid. I found 4 total links like this on the scene.

I see two solutions:

better CN mask (remove small islands!) which will prevent this from occurring in the first place
Some kind of link length checker in the prune_nodes? If you can compute length of line features and the length is < 1 or 2 pix or something should be merged with nearest link? Generally the algorithm would go: Compute all link lengths For all link lengths < cutoff (say 2 pix) { remove the link. Combine the two nodes to be one node Connections of new node will be downstream/upstream neighbors (should max be 4 neighbors?)

More examples below:

jonschwenk commented 4 years ago

Yes, there is actually already a function to do this called remove_single_pixel_links here. It is called as the last step when you prune your network. It is currently hard-coded to find only links of exactly 1 pixel length. I agree it would be nice to have this be a flexible function, and might be a good first issue for someone to tackle!

If your examples above are not removing the single-pixel links, please post some data that would allow me to recreate the problem.

Finally, if you use the to_geotiff('directions') method, RivGraph will rasterize the vectorized network for you (and save it to base_class_obj.paths['linkdirs']). Not sure if that's helpful, but might be more convenient than re-doing it in R. Note that the output raster links will be scaled from 0-1, so you might want to binarize the image before further processing.

Lvulis commented 4 years ago

So I turned off the additional notes call in prune_network and recompiled the package. Having done that should have realised that the function call right below does the same thing...

I used the data from issue #10 to produce this. Posted again here: Data: Yukon_GSW_201407.zip

Just a quick observation/stab at the problem: What I get as link id 533 has a length of two of 2 pixels in python before writing, but probably corresponds to 2 diagonally connected points. The solution I found is to identify what links to remove from the line feature lengths. Using R (If I could magically transport my R knowledge into Python without taking time, would do so now!) to compute link length of the line features, I find a length of 42 or (30sqrt[2]) m for link 533. To remove this I basically did delete_links (rewritten in R-GIS speak). Not sure how long the vectorization in Python is taking, so doing a 'distance' like that on the points may be a good solution (similar to how link lengths are computed), and then filtering by distances < pixlen2.

Thanks! I wanted labelled segments so rasterizing the links is better for me in R for now.

jonschwenk commented 4 years ago

This issue stems from the fact that there are many ways to "correctly" represent a topology or skeletonize a mask. The difficulty with this is that we can't simply delete links, as that would break the network connectivity. We have to delete links and reorganize the network to maintain connectivity, which is non-trivial.

One solution could be to simply delete the n-pixels links, and extend the links at its endpoint to cover the deleted portion. However, the issue with this is that we would have two (or more) links occupying n of the same pixel. Perhaps a little difficult to understand with words here, but the short of it is that we have to extend links to maintain connectivity lost by deleting links.

Lvulis commented 4 years ago

I implemented a fix for this:

First, a links terminal coordinates are its nodes. Second, a link with a distance of 1 or sqrt(2) means that 2 nodes are 8-neighbor connected. Let's say there are 2 links, A and B, with A terminating in node 1 and B terminating in node 2. Nodes 1 and 2 are adjacent in the sense above, so link C connects the two. When you rasterize the scene, because C is so short it won't show up in the raster (at least, in my R-rasterization it did not). That is actually how I found this issue in the first place.

We can generate a new node, node 3, as the mean of the coordinates of nodes 1 and 2, and remove nodes 1 and 2. This will arbitrarily put the node on either pixel when rasterizing, but maintain connectivity between the nodes and links. To your point above, the corresponding links don't need to be adjusted in length, but their connectivity does. The length doesn't need to be adjusted because if you maintain a raster with the same resolution as the resolution you generated the links, they'll intersect on the edge of the pixel anyways. I have an "OK" fix for this in R, by no means good, the code is below. links and nodes are "sf" objects (simple features), and the code iterates through length 2 < pixresolution nodes.

links$length <- st_length(links)
bdlnk <- links$id[which(as.numeric(links$length) < (2*pixres))]
nodtocomb <- strsplit(as.character(links$conn[match(bdlnk, links$id)]), ", ")
if(length(nodtocomb) > 0) {
  for(i in 1:length(nodtocomb)) {
    # What node to replace
    nod_ind <- match(nodtocomb[[i]], nodes$id)
    # Interpolate new coordinates
    new_coords <- t(colMeans(st_coordinates(nodes[nod_ind, ])))
    # replace conn list
    og_conn <- unlist(strsplit(as.character(nodes$conn[nod_ind]), ", "))
    new_conn <- setdiff(og_conn, bdlnk[i])

    # generate new SF object inefficiently.
    new_node_sf <- st_sfc(st_point(new_coords))
    st_crs(new_node_sf) <- st_crs(nodes)
    new_node <- st_sf(geometry = new_node_sf)
    new_node$id <- nodtocomb[[i]][1]
    new_node$conn <- paste(sort(new_conn), collapse =", ")
    new_node <- new_node[, c("conn", "id", "geometry")]

    # Update node list
    nodes <- nodes[-nod_ind, ]
    nodes <- rbind(nodes, new_node)

    # Remove bad link
    links <- links[-match(bdlnk[i], links$id), ]
    # Get links that need to be changed
    links_update_ind <- match(new_conn, links$id)

    # Replace the links with new conn values
    for(j in links_update_ind) {
      as.character(links$conn[j])
      # gsub(nodtocomb[[i]], new_node$id, as.character(links$conn[j]))
      conn_torp <- strsplit(as.character(links$conn[j]), ", ")[[1]]
      lnkconntorp <- match(nodtocomb[[i]], strsplit(as.character(links$conn[j]), ", ")[[1]])
      lnkconntorp <- lnkconntorp[!is.na(lnkconntorp)]
      conn_torp[lnkconntorp] <- new_node$id

      links$conn[j] <- paste(sort(conn_torp), collapse =", ")
    }

  }
}

nodes$id <- as.integer(nodes$id)

jonschwenk commented 4 years ago

That's a little hard to digest without a diagram. Could you whip one up real quick? Also, RG keeps track of each pixel of each link (the 'idx' attribute), so no need to find these instances by length.

jonschwenk commented 4 years ago

Oh, I should also mention that there is a slight bug in the to_geotiff('directions') code where some node pixels aren't represented (I think). Not sure if you're using that as a starting point, but that could be a source of errors. It sounds like you're re-skeletonizing. In that case, there should be no missing pixels, so I'm a little unsure about what R could be doing. If the pixel is missing, connectivity is not preserved...

Lvulis commented 4 years ago

Working on it! Rereading see that it's confusing, but basically the short links are artifacts of the skeleton.

I'm not using that as a starting point, I'm rasterizing the links LINESTRING in R, which is calling gdal. Any links that have total length <2*pixres where pixres is the spatial resolution of the target raster won't show in that raster. Waiting for something to finish and will hit you with some pictures.

Lvulis commented 4 years ago

OK, I am still on the version prior to the one you put out last week (so links and nodes are unprojected in EPSG 4326). I ran RivGraph on a Yukon channel mask, and got my links nodes etc. Artificial nodes are disabled because I wanted individual links to be recognized as individual links for an experiment here, so I don't have a directionality map. Data, including the modified links/nodes .json files, is here: RG_issue_11.zip

In QGIS if we take a look at link 1292, we see that it is pretty short. The two nodes are on adjacent pixels in fact.

If we rasterize the links LINEFEATURE in R, which I believe calls gdal to do (see st_rasterize, https://cran.r-project.org/web/packages/stars/vignettes/stars5.html), because that link is so small it doesn't show up. In the image below different colors arbitrarily represent different individual links. The display here is not projected so may be distorted a little.

The corresponding rasterized nodes look like this:

The basic idea of the script I put above is to define a new node in the original POINT object whose location is the midpoint between the original 2 nodes (1, 2) and whose connectivity is the union of links adjacent to nodes 1, 2 minus the short link. The short link is then deleted. Because that node is on the border of the pixel corresponding to two links, I don't change the link length (anyways the link length is discrete based on the mask resolution and this value wouldn't 'fit'). Visually, the orange is the new node:

The resulting rasterized node section loses one of the adjacent nodes, but the rasterized links will look the same.

Looking now, the actual links won't lie on the node but if you need to check what links spatially intersect an intersection with a tolerance of pixres x sqrt(2) will work. Otherwise the connectivity remains the same. The distance from the link end to the node is something like pixres x sqrt(2)/2.

Lvulis commented 4 years ago

I forgot to add here, the way I implemented it was to get out of Python as fast as possible because I had to get some results. There's probably a faster way to do it and a proper place to do it, like the prune_network call, rather than what I did.

jonschwenk commented 4 years ago

Thanks for the detailed write up with figs. My initial thought here is that this would be a post-processing tool, maybe a function called simplify_network() that takes a "tolerance" keyword argument that specifies the minimum link length. I agree that it could be useful.

The reason it wasn't implemented (beyond links of one pixel length) is as described above--it was just too complicated and low priority during the initial development of RG. RG has additional constraints that you don't adhere to in your processing tool--the primary of these being that each link can be explicitly mapped back to the raster space (i.e. links are collection of pixel centroids), and links must connect all nodes (i.e. there can't be a gap as shown in your final figure). The rest of RG functionality is built under the assumption that those constraints are met, so it would be a very big job to try to implement this as a pre-processing tool (or default processing step).

If you (or anyone) ever wanted to take a stab at developing something for Python, that'd be great! Given the current task list, it seems unlikely that I'd get to it in the near future.

VeinsOfTheEarth / RivGraph

Trimming links with length less than scene pixel resolution #11