Open Baldl opened 3 years ago
actually it would be of interest to know whats the rest of the error message - if exist.
Does anyone else have a problem with the accuracy? My confusion matrix reveals an accuracy of 99%. A look at the prediction map shows that this cannot be the case.
My Code:
test_data = readRDS(file.path(envrmt$path_model_training_data, "extr_test_extent_dop.RDS"))
model = readRDS(file.path(envrmt$path_models, "model_extent_dop.RDS"))
predicted = stats::predict(object = model, newdata = test_data)
val_df = data.frame(ID = dplyr::pull(test_data, "OBJ_ID"),
Observed = dplyr::pull(test_data, "class"),
Predicted = predicted)
val_cm = confusionMatrix(table(val_df[,2:3]))
@FloFranz well observed and and i am firmly convinced that this problem occurs very often without perhaps being perceived as a problem. If you google it you will find tons of controvers explanations to this question. If you remember, it also came up in the exercise in the previous unit. In fact, this is somewhat the essence of the title of the unit "randomly correct is not predicted"although it might be better to say "randomly correct and of high performance is not necessarily a good spatial prediction".
In particular, the contingency table shown, shows how high accuracy values can occur. In general, it is a problem of variable selection, overtraining and spatial autocorrelation. The overfitted model reproduces the training data well because it has learned it by heart, but this does not necessarily mean that the prediction is valid. More in-depth and detailed explanations are available in the further readings section.
When I run "pol = sf::st_transform(pol, crs(rasterStack))" the lists in the "geom" column in my pol file are empty afterwards. Does anybody know, why this is happening?
EDIT to a more common answer.
@GeoKL a probably reason that your data set is providing a crs in the WKT2 crs convention. Like on osm data set buildings
. The crs
of WKT2 crs objects like is by default something like:
User input: EPSG:4326
wkt:
GEOGCRS["WGS 84",
ENSEMBLE["World Geodetic System 1984 ensemble",
MEMBER["World Geodetic System 1984 (Transit)"],
MEMBER["World Geodetic System 1984 (G730)"],
MEMBER["World Geodetic System 1984 (G873)"],
MEMBER["World Geodetic System 1984 (G1150)"],
MEMBER["World Geodetic System 1984 (G1674)"],
MEMBER["World Geodetic System 1984 (G1762)"],
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ENSEMBLEACCURACY[2.0]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
USAGE[
SCOPE["Horizontal component of 3D system."],
AREA["World."],
BBOX[-90,-180,90,180]],
ID["EPSG",4326]]
Using the command st_crs(pol) = 4326
should reset this to
+proj=longlat +datum=WGS84 +no_defs
After doing so the st_transfom
command should work.
The buildings file is my training data (I chose the name buildings for it). I don't knwo what you mean my osm data set?
anyway, st_crs(buildings) = 4326 dos not solve the problem
I was confused because of our conversation in the course chat. you were working on the pol
data set so you need to run st_crs(pol) = 4326
I did
@GeoKL what I just tried to clarify is that thecurrent and future format of CRS is not PROJ4 but WKT2. The geopackage as well as the OSM data from Lesson 1 are using by default WKT2. This is not compatible and one otion is to re-define ist by st_crs(x) = ESPG-NUMBER
When I try to extract the data by
extr = exact_extract(rasterStack, pol, include_cols = c( "class", "OBJ_ID"))<< I get the error message "In CPP_exact_extract(x, weights, wkb, default_value, default_weight, : Error getting geometry extent."
I already tried to use >>st_crs(pol) = 4326<< before transforming the CRS but it didn't work. It just shows the warning "st_crs<- : replacing crs does not reproject data; use st_transform for that ".
@Berkenkn This may caused by a lot of reasons. As a first guess try the following assuming your stack is called (rgbStack)
projection(rgbStack) <- 25832
if (!compareCRS(train_areas,rgbStack))
train_areas = sf::st_transform(train_areas, crs(rgbStack))
train_areas$OBJ_ID = 1:nrow(train_areas)
extr = exact_extract(rgbStack, train_areas, include_cols = c( "class","OBJ_ID"))
rawtrainDF = dplyr::bind_rows(extr)
But to tackle it down:
@gisma
Before trying your solution I redid the training polygons (digitizing) in QGIS and with the new GeoPackage Layer it works. So maybe I did a mistake while setting the properties of the GPKG or saving it, thanks anyway for your help.
Hello, whenever I run
model <- caret::train(predictors,
response,
method = "ranger",
trControl = ctrl,
tuneGrid = tgrid,
num.trees = 100,
importance = "permutation")
I get the Error message
Warning in for (a in args) { : closing unused connection 8 (<-KL:11631)
Warning in for (a in args) { : closing unused connection 7 (<-KL:11631)
Warning in for (a in args) { : closing unused connection 6 (<-KL:11631)
Warning in for (a in args) { : closing unused connection 5 (<-KL:11631)
Warning in for (a in args) { : closing unused connection 4 (<-KL:11631)
Warning in for (a in args) { : closing unused connection 3 (<-KL:11631)
Error in { : task 1 failed - "package ranger is required"
``
So I installed the ranger package and ran `ibrary (ranger)`. After that the above chunk works, but I get in trouble when I try to run `stopCluster(cl)` afterwards with the message
```r
Error in stopCluster(cl) : could not find function "stopCluster"
so I ran parallel::stopCluster(cl) and got the message
Error in summary.connection(connection) : invalid connection
And now I don't know what else to do..?
Okay, it seems like I solved the problem by cropping my dop_indices.RDS (I did not do this earlier).. but honestly I don't understand the connection..
@GeoKL Actually I do not exactly know what the problem has been. The error message according to the cluster is typically thrown if you start reserving CPU nodes for parallell processing but you miss to stop the parallel mode and vice versa. So you must always stop this parallel session after starting it with stopCluster(cl)
especeeially if an erro occurs. That means you should call the stopCluster(cl)
function manually. After the restarting of the rsession etc. this CPU reservation is deleted automatically.
Actually I do not know what you mean with cropping the dop_indices.rds
@gisma I tried to extract my training data, but the result was all observations were away. As you can see here: pol = sf::read_sf(paste0(envrmt$path_data,"/Training_Data.gpkg"))
View(pol) pol = sf::st_transform(pol, crs(rasterStack))
add IDs to your polygons, if they don't have them
pol$ID<-c(1:20)
to extract the data use raster::extract or the much faster package exactextractr
extr = exact_extract(rasterStack, pol, include_cols = c( "class", "ID")) |=====================================================================| 100% extr = extr[sapply(extr, is.data.frame)] extr = do.call(rbind, extr) saveRDS(extr, file.path(envrmt$path_modelling, "model_training_data/extraction.RDS")) extr = readRDS(file.path(envrmt$path_modelling, "model_training_data/extraction.RDS")) extr = na.omit(extr) buildings = extr[extr$class == "building",] other = extr[extr$class == "other",] nrow(buildings) [1] 0 What do you think is the reason for this?
@gisma I have found the reason for my trouble: In the extracted file of QGIS, the names "building" and "other" were written with capitals. So the script could not find them. I have changed this mistake and now it works.
If I´m trying to use the dop_indices.RDS file as Rasterstack I cannot crop the DOP or extract the rasterdata (
rextr = exact_extract(rasterStack, pol, include_cols = c( "class", "OBJ_ID"))
). I am getting the error messageFehler in .local(.Object, ...) :
. However, if I use the marburg_dop.tif file (so only teh 4 bands without the indices) both operations work well. Does anyone have an idea what causes the error?