GeoMOER / geoAI

Module on geospatial data prediction and remote sensing of the Honour Degree Programm „AI und Entrepreneurship"
MIT License
0 stars 0 forks source link

GeoAI_2021_unit_03_EX_A_randomly_good_model #4

Open Baldl opened 2 years ago

katharinakiem commented 2 years ago

If I´m trying to use the dop_indices.RDS file as Rasterstack I cannot crop the DOP or extract the rasterdata (rextr = exact_extract(rasterStack, pol, include_cols = c( "class", "OBJ_ID"))). I am getting the error message Fehler in .local(.Object, ...) :. However, if I use the marburg_dop.tif file (so only teh 4 bands without the indices) both operations work well. Does anyone have an idea what causes the error?

gisma commented 2 years ago

actually it would be of interest to know whats the rest of the error message - if exist.

FloFranz commented 2 years ago

Does anyone else have a problem with the accuracy? My confusion matrix reveals an accuracy of 99%. A look at the prediction map shows that this cannot be the case.

My Code:

test_data = readRDS(file.path(envrmt$path_model_training_data, "extr_test_extent_dop.RDS"))
model = readRDS(file.path(envrmt$path_models, "model_extent_dop.RDS"))

predicted = stats::predict(object = model, newdata = test_data)

val_df = data.frame(ID = dplyr::pull(test_data, "OBJ_ID"),
                    Observed = dplyr::pull(test_data, "class"), 
                    Predicted = predicted)

val_cm = confusionMatrix(table(val_df[,2:3]))
gisma commented 2 years ago

@FloFranz well observed and and i am firmly convinced that this problem occurs very often without perhaps being perceived as a problem. If you google it you will find tons of controvers explanations to this question. If you remember, it also came up in the exercise in the previous unit. In fact, this is somewhat the essence of the title of the unit "randomly correct is not predicted"although it might be better to say "randomly correct and of high performance is not necessarily a good spatial prediction".

In particular, the contingency table shown, shows how high accuracy values can occur. In general, it is a problem of variable selection, overtraining and spatial autocorrelation. The overfitted model reproduces the training data well because it has learned it by heart, but this does not necessarily mean that the prediction is valid. More in-depth and detailed explanations are available in the further readings section.

GeoKL commented 2 years ago

When I run "pol = sf::st_transform(pol, crs(rasterStack))" the lists in the "geom" column in my pol file are empty afterwards. Does anybody know, why this is happening?

gisma commented 2 years ago

EDIT to a more common answer. @GeoKL a probably reason that your data set is providing a crs in the WKT2 crs convention. Like on osm data set buildings. The crs of WKT2 crs objects like is by default something like:

 User input: EPSG:4326 
  wkt:
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]

Using the command st_crs(pol) = 4326 should reset this to

+proj=longlat +datum=WGS84 +no_defs 

After doing so the st_transfom command should work.

GeoKL commented 2 years ago

The buildings file is my training data (I chose the name buildings for it). I don't knwo what you mean my osm data set?

GeoKL commented 2 years ago

anyway, st_crs(buildings) = 4326 dos not solve the problem

gisma commented 2 years ago

I was confused because of our conversation in the course chat. you were working on the poldata set so you need to run st_crs(pol) = 4326

GeoKL commented 2 years ago

I did

gisma commented 2 years ago

@GeoKL what I just tried to clarify is that thecurrent and future format of CRS is not PROJ4 but WKT2. The geopackage as well as the OSM data from Lesson 1 are using by default WKT2. This is not compatible and one otion is to re-define ist by st_crs(x) = ESPG-NUMBER

Berkenkn commented 2 years ago

When I try to extract the data by

extr = exact_extract(rasterStack, pol, include_cols = c( "class", "OBJ_ID"))<< I get the error message "In CPP_exact_extract(x, weights, wkb, default_value, default_weight, : Error getting geometry extent."

I already tried to use >>st_crs(pol) = 4326<< before transforming the CRS but it didn't work. It just shows the warning "st_crs<- : replacing crs does not reproject data; use st_transform for that ".

gisma commented 2 years ago

@Berkenkn This may caused by a lot of reasons. As a first guess try the following assuming your stack is called (rgbStack)

projection(rgbStack) <- 25832
if (!compareCRS(train_areas,rgbStack))
   train_areas = sf::st_transform(train_areas, crs(rgbStack))

train_areas$OBJ_ID = 1:nrow(train_areas)
extr = exact_extract(rgbStack, train_areas, include_cols = c( "class","OBJ_ID"))
rawtrainDF = dplyr::bind_rows(extr) 

But to tackle it down:

Berkenkn commented 2 years ago

@gisma

Before trying your solution I redid the training polygons (digitizing) in QGIS and with the new GeoPackage Layer it works. So maybe I did a mistake while setting the properties of the GPKG or saving it, thanks anyway for your help.

GeoKL commented 2 years ago

Hello, whenever I run

model <- caret::train(predictors,
               response,
               method = "ranger",
               trControl = ctrl,
               tuneGrid = tgrid,
               num.trees = 100,
               importance = "permutation")
I get the Error message
Warning in for (a in args) { : closing unused connection 8 (<-KL:11631)
Warning in for (a in args) { : closing unused connection 7 (<-KL:11631)
Warning in for (a in args) { : closing unused connection 6 (<-KL:11631)
Warning in for (a in args) { : closing unused connection 5 (<-KL:11631)
Warning in for (a in args) { : closing unused connection 4 (<-KL:11631)
Warning in for (a in args) { : closing unused connection 3 (<-KL:11631)
Error in { : task 1 failed - "package ranger is required"
``

So I installed the ranger package and ran `ibrary (ranger)`. After that the above chunk works, but I get in trouble when I try to run `stopCluster(cl)` afterwards with the message
```r
Error in stopCluster(cl) : could not find function "stopCluster"
so I ran parallel::stopCluster(cl) and got the message 
Error in summary.connection(connection) : invalid connection

And now I don't know what else to do..?

GeoKL commented 2 years ago

Okay, it seems like I solved the problem by cropping my dop_indices.RDS (I did not do this earlier).. but honestly I don't understand the connection..

gisma commented 2 years ago

@GeoKL Actually I do not exactly know what the problem has been. The error message according to the cluster is typically thrown if you start reserving CPU nodes for parallell processing but you miss to stop the parallel mode and vice versa. So you must always stop this parallel session after starting it with stopCluster(cl) especeeially if an erro occurs. That means you should call the stopCluster(cl) function manually. After the restarting of the rsession etc. this CPU reservation is deleted automatically.
Actually I do not know what you mean with cropping the dop_indices.rds

Muenchj4 commented 2 years ago

@gisma I tried to extract my training data, but the result was all observations were away. As you can see here: pol = sf::read_sf(paste0(envrmt$path_data,"/Training_Data.gpkg"))

View(pol) pol = sf::st_transform(pol, crs(rasterStack))

add IDs to your polygons, if they don't have them

pol$ID<-c(1:20)

to extract the data use raster::extract or the much faster package exactextractr

extr = exact_extract(rasterStack, pol, include_cols = c( "class", "ID")) |=====================================================================| 100% extr = extr[sapply(extr, is.data.frame)] extr = do.call(rbind, extr) saveRDS(extr, file.path(envrmt$path_modelling, "model_training_data/extraction.RDS")) extr = readRDS(file.path(envrmt$path_modelling, "model_training_data/extraction.RDS")) extr = na.omit(extr) buildings = extr[extr$class == "building",] other = extr[extr$class == "other",] nrow(buildings) [1] 0 What do you think is the reason for this?

Muenchj4 commented 2 years ago

@gisma I have found the reason for my trouble: In the extracted file of QGIS, the names "building" and "other" were written with capitals. So the script could not find them. I have changed this mistake and now it works.