Closed ManuelSpinola closed 1 year ago
Hi @ManuelSpinola, in short, there is no universally optimal way to decide on the `k` number. (On a side note: the same is true for `compactness`.)
You just need to know that `supercells()` has two alternative arguments that control the resulting number of supercells.
The first, `k`, is the number of supercells desired by the user.
The second, `step`, is the distance, in number of cells, between the initial superpixels' centers (in other words, the initial size of a supercell).
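To make the link between the two arguments concrete, here is a minimal base R sketch. It assumes the usual SLIC setup in which the initial centers sit on a regular grid, so `step` is roughly the square root of the number of cells per supercell. The helper names `approx_step()` and `approx_k()` and the raster dimensions are hypothetical, and the exact rounding used inside `supercells()` may differ.

```r
# Approximate relation between `k` and `step` for a raster with
# nrows x ncols cells. This is an assumption based on a regular grid
# of initial centers, not the exact rounding used by supercells().
approx_step = function(nrows, ncols, k) sqrt(nrows * ncols / k)
approx_k = function(nrows, ncols, step) (nrows * ncols) / step^2

# Hypothetical raster of 1000 x 2000 cells
approx_step(1000, 2000, k = 2000) # about 31.6 cells between centers
approx_k(1000, 2000, step = 3)    # about 222,222 supercells
```

In other words, doubling `k` shrinks the initial supercell side only by a factor of about 1.4, which is why trying values such as 1000, 2000, and 4000 gives a reasonable spread of sizes.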
Let's start by reproducing the code from https://jakubnowosad.com/supercells/articles/motifels.html.
library(supercells) # superpixels for spatial data
library(terra) # spatial raster data reading and handling
library(sf) # spatial vector data reading and handling
library(motif)
landcover = rast(system.file("raster/landcover2015.tif", package = "motif"))
plot(landcover)
comp_output = lsp_signature(landcover, type = "composition", window = 20,
normalization = "pdf", ordered = FALSE)
comp_output = lsp_restructure(comp_output)
comp_output = lsp_add_terra(comp_output)
comp_output2 = subset(comp_output, 3:9)
plot(comp_output2)
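I can think of four possible approaches to decide on the `k` number.
- Create supercells as small as possible to detect a pattern (depending on the input data, it can be as small as `step = 3`). Then you may merge similar supercells using a clustering method (for examples, see https://doi.org/10.1016/j.jag.2022.102935).
- Create supercells based on existing knowledge of the size of patterns/processes you are studying.
- Create supercells based on the spatial scale of interest (e.g., what is the size of regions you want to analyze).
- Create supercells by testing different parameters, and visually deciding on the optimal ones. For this approach, I would suggest disabling the additional process of connectivity enforcement (`clean = FALSE`), then try a few sizes and compare results.
slic1000 = supercells(comp_output2, k = 1000, compactness = 0.1, dist_fun = "jsd", clean = FALSE)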
slic2000 = supercells(comp_output2, k = 2000, compactness = 0.1, dist_fun = "jsd", clean = FALSE)
slic4000 = supercells(comp_output2, k = 4000, compactness = 0.1, dist_fun = "jsd", clean = FALSE)
# visualize only the first three raster layers
library(tmap)
tmap_mode("view")
tm_shape(comp_output2) +
tm_raster() +
tm_facets(as.layers = TRUE) +
tm_shape(slic1000) +
tm_borders(col = "#7553DB") +
tm_shape(slic2000) +
tm_borders(col = "#F2506E") +
tm_shape(slic4000) +
tm_borders(col = "#EBB364")
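For the first approach (many small supercells, then merging), the merging step could look roughly like the base R sketch below. It is only an illustration under stated assumptions: `supercell_vals` is a hypothetical stand-in for the per-supercell average values, and Euclidean distance with Ward clustering is used for simplicity, which is not necessarily the method from the linked paper or the `"jsd"` distance used above.

```r
# Hypothetical per-supercell average values:
# 6 supercells (rows) x 3 composition layers (columns).
supercell_vals = rbind(
  c(0.90, 0.05, 0.05),
  c(0.85, 0.10, 0.05),
  c(0.10, 0.80, 0.10),
  c(0.05, 0.90, 0.05),
  c(0.10, 0.10, 0.80),
  c(0.05, 0.15, 0.80)
)

# Hierarchical clustering of supercells by the similarity of their values
# (Euclidean distance here; an assumption made for this sketch only).
hc = hclust(dist(supercell_vals), method = "ward.D2")

# Cut the tree into a chosen number of groups; supercells sharing
# an id are candidates for merging.
cluster_id = cutree(hc, k = 3)
cluster_id
```

With real data, the cluster ids would be attached to the supercell polygons, and neighboring polygons with the same id could then be dissolved into larger regions.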
Thank you very much Jakub. I will try that.
Manuel
@Nowosad, would these same parameters work for tuning `compactness`? And if so, would you be able to provide some guidance on how to choose the range of values to test? Adjusting the formula from your 2021 paper to be in terms of `supercells` parameters, I believe the distance equation should be
$$D = \sqrt{\left(\frac{d_{\text{spectral}}}{\text{compactness}}\right)^2 + \left(\frac{d_{\text{spatial}}}{\text{step}}\right)^2}$$
(though I'm unsure if `step` should be converted from cell to map units).
From this equation I can see that if the same spectral data were run through the SLIC algorithm but measured in different units, the `compactness` parameter would need to change to get an equivalent result. From some reading and from the equation, I know that larger values emphasize space and bring the result closer to k-means clustering of the coordinates, whereas smaller values emphasize spectral characteristics, and that `compactness` depends on the range of input cell values and the selected distance measure. That being said, given the range of the data and the selected distance measure (Euclidean in my case), I'm unsure how to tell what a small value for `compactness` is, what a large value is, and what value would give approximately equal weight to both terms. Do you have any guidance on that?
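To illustrate the unit dependence, here is a small base R sketch of the equation as written above. It assumes that this is how the two terms are combined. `slic_distance()` and all the numbers are hypothetical and are not taken from the package.

```r
# Combined distance as written in the equation above (an assumption
# about how the spectral and spatial terms are weighed).
slic_distance = function(d_spectral, d_spatial, compactness, step) {
  sqrt((d_spectral / compactness)^2 + (d_spatial / step)^2)
}

# Hypothetical distances between one cell and one supercell center
d1 = slic_distance(d_spectral = 0.2, d_spatial = 5, compactness = 0.1, step = 10)

# Same data expressed in units 100 times larger: compactness has to be
# scaled by the same factor to obtain the same combined distance.
d2 = slic_distance(d_spectral = 20, d_spatial = 5, compactness = 10, step = 10)
all.equal(d1, d2)

# Under this formula the two terms contribute equally when
# d_spectral / compactness == d_spatial / step.
equal_weight = slic_distance(d_spectral = 0.05, d_spatial = 5, compactness = 0.1, step = 10)
```

If the formula holds, "small" and "large" for `compactness` would only have meaning relative to the typical spectral distances in the data, which is what I don't know how to estimate up front.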
The example of New Guinea has a k = 2000. Is there any reason to choose that value?