EMODnet / EMODnet-Biology-Reef-Forming-Species-NorthSea

scripts for habitat suitability prediction of reef forming species in the North Sea
0 stars 0 forks source link

Issues on final product [EMODNETBIO-181] #1

Open salvafern opened 3 months ago

salvafern commented 3 months ago

Hi @wstolte,

Here some issues Im finding in the file with the final product. Writing here for documentation. Notice how the values of the variables present some aberrations.

e.g. I would not expect such high max and low min values of latitude, longitude or habitat suitability.

Longitude, latitude and aphiaid must be in ascendant order. Longitude must not have duplicated values.

The aphiaIDs in the file do not exist. The taxon_name and taxon_lsid seem corrupt.

Could you have a look? Maybe something went wrong when creating the file?

library(ncdf4)
library(dplyr)
is.sorted = Negate(is.unsorted)

nc <- nc_open("./product/foo.nc")
nc
#> File ./product/foo.nc (NC_FORMAT_CLASSIC):
#> 
#>      4 variables (excluding dimension variables):
#>         char taxon_name[string80,aphiaid]   
#>             standard_name: biological_taxon_name
#>             long_name: Scientific name of the taxa
#>         char taxon_lsid[string80,aphiaid]   
#>             standard_name: biological_taxon_lsid
#>             long_name: Life Science Identifier - World Register of Marine Species
#>         char crs[]   
#>             long_name: Coordinate Reference System
#>             geographic_crs_name: WGS 84
#>             grid_mapping_name: latitude_longitude
#>             reference_ellipsoid_name: WGS 84
#>             horizontal_datum_name: WGS 84
#>             prime_meridian_name: Greenwich
#>             longitude_of_prime_meridian: 0
#>             semi_major_axis: 6378137
#>             semi_minor_axis: 6356752.31424518
#>             inverse_flattening: 298.257223563
#>             spatial_ref: GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]
#>             GeoTransform: -180 0.08333333333333333 0 90 0 -0.08333333333333333 
#>         double habitat_suitability[lon,lat,aphiaid]   
#>             _FillValue: -99999
#>             long_name: Modelled habitat suitability for a species.
#> 
#>      4 dimensions:
#>         lon  Size:4361 
#>             units: degrees_east
#>             standard_name: longitude
#>             long_name: Longitude
#>         lat  Size:4209 
#>             units: degrees_north
#>             standard_name: latitude
#>             long_name: Latitude
#>         aphiaid  Size:4 
#>             long_name: Life Science Identifier - World Register of Marine Species
#>         string80  Size:80 (no dimvar)

# check lat/lon
lat <- ncvar_get(nc, "lat")
summary(lat)
#>        Min.     1st Qu.      Median        Mean     3rd Qu.        Max. 
#> -1.448e+307   5.300e+01   5.400e+01 -3.477e+303   5.600e+01  2.509e+288

is.sorted(lat)
#> [1] FALSE

any(duplicated(lat))
#> [1] FALSE

lon <- ncvar_get(nc, "lon")
summary(lon)
#>        Min.     1st Qu.      Median        Mean     3rd Qu.        Max. 
#>  -1.000e+05   0.000e+00   3.000e+00  1.308e+278   6.000e+00  5.705e+281

is.sorted(lon)
#> [1] FALSE

any(duplicated(lon))
#> [1] TRUE

# Check taxon
aphiaid <- ncvar_get(nc, "aphiaid")
is.sorted(aphiaid)
#> [1] FALSE

aphiaid
#> [1] -2073396244  1078560851   379889075  1078560798

ncvar_get(nc, "taxon_name")
#> [1] "\xa8\xde\xcf{@I\x87\xea;\030\xf9B@I\x87\xb5\xcdS#\n@I\x87\x81_\x8dL\xd1@I\x87L\xf1\xc7v\x98@I\x87\030\x84\001\xa0`@I\x86\xe4\026;\xca(@I\x86\xaf\xa8u\xf3\xef@I\x86{:\xb0\035\xb6@I\x86F\xcc\xeaG~@I\x86\022"
#> [2] "_$qE@I\x85\xdd\xf1^\x9b\r@I\x85\xa9\x83\x98\xc4\xd4@I\x85u\025\xd2\xee\x9c@I\x85@\xa8\r\030c@I\x85\f:GB+@I\x84\xd7́k\xf2@I\x84\xa3^\xbb\x95\xba@I\x84n\xf0\xf5\xbf\x81@I\x84:\x83/\xe9I@I\x84\006"            
#> [3] "\025j\023\020@I\x83ѧ\xa4<\xd8@I\x83\x9d9\xdef\x9f@I\x83h\xcc\030\x90f@I\x834^R\xba.@I\x82\xff\xf0\x8c\xe3\xf6@I\x82˂\xc7\r\xbd@I\x82\x97\025\0017\x84@I\x82b\xa7;aL@I\x82.9u\x8b\023@I\x81\xf9"              
#> [4] "˯\xb4\xdb@I\x81\xc5]\xe9ޢ@I\x81\x90\xf0$\bj@I\x81\\\x82^21@I\x81(\024\x98[\xf9@I\x80\xf3\xa6҅\xc0@I\x80\xbf9\f\xaf\x88@I\x80\x8a\xcbF\xd9O@I\x80V]\x81\003\027@I\x80!\xef\xbb,\xde@I\177\xed"

ncvar_get(nc, "taxon_lsid")
#> [1] "\x81\xf5V\xa6@I\177\xb9\024/\x80m@I\177\x84\xa6i\xaa5@I\177P8\xa3\xd3\xfc@I\177\033\xca\xdd\xfd\xc4@I~\xe7]\030'\x8b@I~\xb2\xefRQR@I~~\x81\x8c{\032@I~J\023Ƥ\xe1@I~\025\xa6"
#> [2] "8:\xf8p@I}\xac\xcau\"8@I}x\\\xafK\xff@I}C\xee\xe9u\xc7@I}\017\x81#\x9f\x8e@I|\xdb\023]\xc9V@I|\xa6\xa5\x97\xf3\035@I|r7\xd2\034\xe5@I|=\xca\fF\xac@I|\t\\Fpt@I{\xd4"        
#> [3] ";@I{\xa0\x80\xba\xc4\003@I{l\022\xf4\xed\xca@I{7\xa5/\027\x92@I{\0037iAY@Iz\xceɣk @Iz\x9a[ݔ\xe8@Ize\xee\027\xbe\xb0@Iz1\x80Q\xe8w@Iy\xfd\022\x8c\022>@Iy\xc8"              
#> [4] "\xa4\xc6<\006@Iy\x947"

# Check habitat suitability
habitat_suitability <- ncvar_get(nc, "habitat_suitability")
summary(habitat_suitability)
#>        Min.     1st Qu.      Median        Mean     3rd Qu.        Max. 
#> -2.682e+154   0.000e+00   0.000e+00 -1.519e+152   0.000e+00  2.682e+154

Created on 2024-08-09 with reprex v2.1.0

Session info ``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.3.3 (2024-02-29) #> os Ubuntu 22.04.4 LTS #> system x86_64, linux-gnu #> ui X11 #> language (EN) #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz Europe/Brussels #> date 2024-08-09 #> pandoc 3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date (UTC) lib source #> cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.2) #> digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.2) #> dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.2) #> evaluate 0.23 2023-11-01 [1] CRAN (R 4.3.2) #> fansi 1.0.6 2023-12-08 [1] CRAN (R 4.3.2) #> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.2) #> fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.2) #> generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.2) #> glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.2) #> htmltools 0.5.7 2023-11-03 [1] CRAN (R 4.3.2) #> knitr 1.45 2023-10-30 [1] CRAN (R 4.3.2) #> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.2) #> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.2) #> ncdf4 * 1.22 2023-11-28 [1] CRAN (R 4.3.2) #> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.2) #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.2) #> purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.2) #> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.3.2) #> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.3.2) #> R.oo 1.26.0 2024-01-24 [1] CRAN (R 4.3.2) #> R.utils 2.12.3 2023-11-18 [1] CRAN (R 4.3.2) #> R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.2) #> reprex 2.1.0 2024-01-11 [1] CRAN (R 4.3.2) #> rlang 1.1.3 2024-01-10 [1] CRAN (R 4.3.2) #> rmarkdown 2.26 2024-03-05 [1] CRAN (R 4.3.3) #> rstudioapi 0.16.0 2024-03-24 [1] CRAN (R 4.3.3) #> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.2) #> styler 1.10.2 2023-08-29 [1] CRAN (R 4.3.2) #> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.2) #> tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.3.3) #> utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.2) #> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.2) #> withr 3.0.0 2024-01-16 [1] CRAN (R 4.3.2) #> xfun 0.43 2024-03-25 [1] CRAN (R 4.3.3) #> yaml 2.3.8 2023-12-11 [1] CRAN (R 4.3.2) #> #> [1] /home/localadmin/R/x86_64-pc-linux-gnu-library/4.3 #> [2] /usr/local/lib/R/site-library #> [3] /usr/lib/R/site-library #> [4] /usr/lib/R/library #> #> ────────────────────────────────────────────────────────────────────────────── ```
salvafern commented 2 months ago

New iteration on 2024-08-27. Some of the issues seem solved, thanks @wstolte !

$ ncdump -h foo.nc 
netcdf foo {
dimensions:
    lon = 4361 ;
    lat = 4209 ;
    aphiaid = 4 ;
    string80 = 80 ;
variables:
    double lon(lon) ;
        lon:units = "degrees_east" ;
        lon:standard_name = "longitude" ;
        lon:long_name = "Longitude" ;
    double lat(lat) ;
        lat:units = "degrees_north" ;
        lat:standard_name = "latitude" ;
        lat:long_name = "Latitude" ;
    int aphiaid(aphiaid) ;
        aphiaid:long_name = "Life Science Identifier - World Register of Marine Species" ;
    char taxon_name(aphiaid, string80) ;
        taxon_name:standard_name = "biological_taxon_name" ;
        taxon_name:long_name = "Scientific name of the taxa" ;
    char taxon_lsid(aphiaid, string80) ;
        taxon_lsid:standard_name = "biological_taxon_lsid" ;
        taxon_lsid:long_name = "Life Science Identifier - World Register of Marine Species" ;
    char crs ;
        crs:long_name = "Coordinate Reference System" ;
        crs:geographic_crs_name = "WGS 84" ;
        crs:grid_mapping_name = "latitude_longitude" ;
        crs:reference_ellipsoid_name = "WGS 84" ;
        crs:horizontal_datum_name = "WGS 84" ;
        crs:prime_meridian_name = "Greenwich" ;
        crs:longitude_of_prime_meridian = 0. ;
        crs:semi_major_axis = 6378137. ;
        crs:semi_minor_axis = 6356752.31424518 ;
        crs:inverse_flattening = 298.257223563 ;
        crs:spatial_ref = "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563]],PRIMEM[\"Greenwich\",0],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AXIS[\"Latitude\",NORTH],AXIS[\"Longitude\",EAST],AUTHORITY[\"EPSG\",\"4326\"]]" ;
        crs:GeoTransform = "-180 0.08333333333333333 0 90 0 -0.08333333333333333 " ;
    double habitat_suitability(aphiaid, lat, lon) ;
        habitat_suitability:_FillValue = -99999. ;
        habitat_suitability:long_name = "Modelled habitat suitability for a species." ;
}

Indeed I can see that the data is not correctly plot (this case in panoply)

habitat_suitability_in_foo

I have faced this before and fixed by changing the order of the data, not the latitude/longitude. Lat and Lon must be in ascendent order. You read the data as a matrix and via transpose t() or changing order apply(x, 1, rev) you can rotate the data. I have tried quickly but don't have time now to continue, but the solution must be something similar.

library(terra)

array <- terra::rast("foo.nc") |> 
  terra::as.array()

matrix <- array[, , 1] |>
  terra::as.matrix()

dim(matrix)

# Transpose dimensions
matrix <- t(matrix)
# dim(matrix)

# Reverse order of lon
matrix <- apply(matrix, 1, rev)
# dim(matrix)

# Traspose one more time
matrix <- t(matrix)

raster <- terra::rast(matrix)
crs(raster) <- "+proj=lcc +lat_1=48 +lat_2=33 +lon_0=-100 +ellps=WGS84"

plot(raster)

image

salvafern commented 2 months ago

One more version sent on 2024-08-27. Product looking good!

ncdump -h foo.nc 
netcdf foo {
dimensions:
    lon = 4361 ;
    lat = 4209 ;
    aphiaid = 4 ;
    string80 = 80 ;
variables:
    double lon(lon) ;
        lon:units = "degrees_east" ;
        lon:standard_name = "longitude" ;
        lon:long_name = "Longitude" ;
    double lat(lat) ;
        lat:units = "degrees_north" ;
        lat:standard_name = "latitude" ;
        lat:long_name = "Latitude" ;
    int aphiaid(aphiaid) ;
        aphiaid:long_name = "Life Science Identifier - World Register of Marine Species" ;
    char taxon_name(aphiaid, string80) ;
        taxon_name:standard_name = "biological_taxon_name" ;
        taxon_name:long_name = "Scientific name of the taxa" ;
    char taxon_lsid(aphiaid, string80) ;
        taxon_lsid:standard_name = "biological_taxon_lsid" ;
        taxon_lsid:long_name = "Life Science Identifier - World Register of Marine Species" ;
    char crs ;
        crs:long_name = "Coordinate Reference System" ;
        crs:geographic_crs_name = "WGS 84" ;
        crs:grid_mapping_name = "latitude_longitude" ;
        crs:reference_ellipsoid_name = "WGS 84" ;
        crs:horizontal_datum_name = "WGS 84" ;
        crs:prime_meridian_name = "Greenwich" ;
        crs:longitude_of_prime_meridian = 0. ;
        crs:semi_major_axis = 6378137. ;
        crs:semi_minor_axis = 6356752.31424518 ;
        crs:inverse_flattening = 298.257223563 ;
        crs:spatial_ref = "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563]],PRIMEM[\"Greenwich\",0],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AXIS[\"Latitude\",NORTH],AXIS[\"Longitude\",EAST],AUTHORITY[\"EPSG\",\"4326\"]]" ;
        crs:GeoTransform = "-180 0.08333333333333333 0 90 0 -0.08333333333333333 " ;
    double habitat_suitability(aphiaid, lat, lon) ;
        habitat_suitability:_FillValue = -99999. ;
        habitat_suitability:long_name = "Modelled habitat suitability for a species." ;
}

habitat_suitability_in_foo

There are only some details to be tackled:



* Could you rename the long_name attribute of "habitat_suitability" to "Probability of occurrence of biological entity specified elsewhere per unit area of the bed"
* Could you rename the variable "habitat_suitability" to "probability_of_occurrence"?
* Could you add the attribute "units" = "level" to the variable "aphiaid"?
* Could you fill the global attributes?
wstolte commented 1 month ago

Thanks. Rethinking, I am not sure whether "probability_of_occurrence" is correct. It is not based on occurence alone, but it is using environmental information to find other areas (where they are not found at the moment) where they might be able to grow/live. Is it possible to stick to "habitat suitability"?

salvafern commented 1 month ago

Apologies for the belated answer. Yes you can stick to "habitat_suitability".

As I mentioned in the last call, we will have to come up with standardized names for the variables we create. We will have to define both terms. I will need your feedback (and from the other partners) then, but for now we can put a pin on it.