Closed micha-silver closed 2 years ago
@micha-silver could you share your code and data with me? (email is fine)
Sure, I'll send the data privately. Here's the test code I used:
library(supercells)
library(terra)
library(sf)
library(future)
# Set path to data directory
Data_dir = "/media/micha/Storage_8TB/Data/EO/Test"
tif_file <- file.path(Data_dir, "namibia_full.tif")
#tif_file <- file.path(Data_dir, "israel_cropped.tif")
stk <- rast(tif_file)
# Suggested chunk size
chunk_size <- ceiling(min(dim(stk)[1:2]) / 2) + 1
print(paste("Suggested chunk size:", chunk_size))
#---------------------------
# Change here to use future
#---------------------------
use_future = FALSE
if (use_future == TRUE) {
library(future.apply)
library(parallel)
# Use 1/2 of the cores
ncores <- detectCores() / 2
cl <- makeCluster(ncores)
}
#-------------------------
# Repeat supercells() three times, save each sf output
#---------------------------
sp_list <- lapply(1:3, function(ii) {
t1 = Sys.time()
cat("\n")
print(paste(t1, "-- Starting, ", "chunks =", chunk_size ))
if (use_future == TRUE) {future::plan(cluster, workers=cl)}
capture.output(superpixels <- supercells(x=stk, step=20,
compactness=2, dist_fun="jsd",
minarea=24, chunks=chunk_size,
future = use_future),
file = "/dev/null")
future::plan(sequential)
# Print some details to console
print(st_bbox(superpixels))
print(paste("Num features:", length(superpixels$geometry)))
t2 = Sys.time()
elapsed = difftime(t2, t1, units = "mins")
print(paste(t2, "-- Finished in", round(elapsed, 2), "mins"))
# cleanup /tmp before next loop
tmp_dir <- tempdir()
tmp_files <- list.files(tmp_dir, full.names = T, pattern = "^spat")
file.remove(tmp_files)
# Save geopackage
gpkg_file <- paste0("superpixels_", as.character(ii), ".gpkg")
gpkg_path <- file.path(Data_dir, gpkg_file)
st_write(superpixels, gpkg_path, append=FALSE)
return(superpixels)
})
if (use_future == TRUE) {stopCluster(cl)}
Could you also send me one of the output superpixels vectors (e.g., as a gpkg
file)?
Hi @micha-silver - thanks a lot for this discussion and the code.
I ran your code on two computers (Red Hat and Fedora) with and without the future option, but (sadly) was unable to reproduce your issue. My overall comments:
slic[[1]]
is a matrix output of the C++ code, I convert it into SpatRaster
, next I replace all -1 values with NA, and finally force this object to have the same extent as the input chunk. My best guess is that the last line above is the source of your problem. Somehow (?), the proper extent is not added (extent 0-1, 0-1 is the terra default). terra::ext(x)
code later - see https://github.com/Nowosad/supercells/blob/754c83ebf19652d2f2eb9a75117590990079d377/R/supercells.R#L156-L157. The role of this code is to update coordinates of the superpixels centroids, from column and row numbers to coordinates. Strangely, x
and y
columns were correct in your .gpkg
files with incorrect extensions...Ad 7 -> I have added a tiny update to the chunks branch of the repo.
Again thanks for the thorough testing and explanations.
I'll get your latest update and redo some testing. I want to also investigate some changes to virtual memory settings. Maybe some stuff is getting swapped out and then lost somehow. (That might explain the random nature of these missed extents).
I'll also raise an issue on the terra
github page with Robert Hijmans. Maybe he will have some insight.
Cheers,
Micha
I believe I can reproduce the issue:
library(supercells)
library(terra)
#terra version 1.4.21
library(sf)
#Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1
vol = rast(system.file("raster/volcano.tif", package = "supercells"))
s1 = supercells(vol, k = 50, compactness = 1)
terraOptions(todisk=TRUE)
s2 = supercells(vol, k = 50, compactness = 1)
st_bbox(s1)
# xmin ymin xmax ymax
#2667400 6478705 2668010 6479575
t_bbox(s2)
#xmin ymin xmax ymax
# 0 0 1 1
That would explain why this happens with large files only, and cannot always be reproduced between computers as the amount of RAM available varies. terraOptions(todisk=TRUE)
essentially emulates the behavior you get with SpatRaster
objects that are deemed too large to entirely load the cell values in RAM.
Thank you @rhijmans - this would be very helpful in debugging the problem.
Let me know if you want me to dig further, but it is probably easier for you to look at this now than it is for me.
Yes - now as I should be able to reproduce this issue - I will look into this issue in the next few days.
Hi @micha-silver, this bug seems to be fixed:
library(supercells)
library(terra)
#> terra version 1.4.21
library(sf)
#> Linking to GEOS 3.9.2, GDAL 3.3.2, PROJ 8.2.0
vol = rast(system.file("raster/volcano.tif", package = "supercells"))
s1 = supercells(vol, k = 50, compactness = 1)
terraOptions(todisk=TRUE)
s2 = supercells(vol, k = 50, compactness = 1)
st_bbox(s1)
#> xmin ymin xmax ymax
#> 2667400 6478705 2668010 6479575
st_bbox(s2)
#> xmin ymin xmax ymax
#> 2667400 6478705 2668010 6479575
Could you now try to reinstall {supercells} from the master branch and rerun your calculations to see if the {supercells} problem is fixed?
Yes, I'll rerun the test tomorrow.
On Mon, Nov 22, 2021, 11:45 Jakub Nowosad @.***> wrote:
Hi @micha-silver https://github.com/micha-silver, this bug seems to be fixed:
library(supercells) library(terra)#> terra version 1.4.21 library(sf)#> Linking to GEOS 3.9.2, GDAL 3.3.2, PROJ 8.2.0vol = rast(system.file("raster/volcano.tif", package = "supercells"))s1 = supercells(vol, k = 50, compactness = 1)
terraOptions(todisk=TRUE)s2 = supercells(vol, k = 50, compactness = 1)
st_bbox(s1)#> xmin ymin xmax ymax #> 2667400 6478705 2668010 6479575 st_bbox(s2)#> xmin ymin xmax ymax #> 2667400 6478705 2668010 6479575
Could you now try to reinstall {supercells} from the master branch and rerun your calculations to see if the {supercells} problem is fixed?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Nowosad/supercells/issues/17#issuecomment-975338465, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA5J52FTCSFDPIXZ7B24IH3UNIGL7ANCNFSM5H6TDOCA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
Pleased to report: the problem seems to be solved. I tested 4 cycles on two different machines, with results as expected. Many thanks.
I am using the new
chunks
option ofsupercells()
on a somewhat large raster to overcome out of memory problems. I have found that after the function completes sucessfully, some of the chunks have "disappeared" from the extent of the original raster stack. The lost chunks are actually left at coordinates (0,0) to (1,1), so if I zoom way down to scale 1:2 and to coord (0,0) I can see them, all overlayed.What's more, if I run
supercells()
repeated with the same parameters, and save the result each time, the set of correctly placed chunks changes each time! See image below with three sample runs. The light colored areas are the superpixel boundaries in each case. And all three runs produced the same number of polygons in the resultsf
object.This phenomena occurred with two different satellite images, and on two separate computers (both Debian, R versions 4.0.5 and 4.1.2).
I did some tests with various parameter values to try to isolate what is causing this. I ran
supercells()
both as a single process and parallelized (future=TRUE
andfuture=FALSE
) but the problem persisted in both cases (Withfuture
enabled the run completed in 1/3 the time). Disabling the connectivity (clean=FALSE
) did not change anything. And various values ofminarea
andstep
also did not solve the problem.However I did notice that when I choose a small value for
chunks
then usually the superpixels vector is created correctly, with all chunks in the right place. "Small" chunks were less than 1/2 of the smallest dimension of the raster. i.e.cluster_size <- min(dim(stk)[1:2])/2
But if I do a repeat run with the small chunk size, sometimes the issue with chunks "left behind" in the (0,0) to (1,1) region happens again. :-(Any suggestions where to go from here? I don't have any knowledge of C++, but it looks like something weird is happening in the
run_slic
function.