Cidree / rpostgis

rpostgis: R Interface to a 'PostGIS' Database
http://cidree.github.io/rpostgis/
77 stars 14 forks source link

pgGetRast() SSL SYSCALL error: EOF detected #9

Closed balazsdukai closed 4 years ago

balazsdukai commented 7 years ago

I have a raster from here: ftp://ftp.fgdl.org/pub/state/usgsdem.zip

After converting to geotiff I loaded to PostGIS as raster2pgsql -s 4269 -I -t auto -M ./usgsdem/florida_dem.tif | psql -d rpostgisLT

When trying to get it into R from PostGIS with pgGetRast(conn, c("public", "florida_dem")), I get

Error in postgresqlExecStatement(conn, statement, ...) : 
  RS-DBI driver: (could not Retrieve the result : SSL SYSCALL error: EOF detected
)

Warning message:
In postgresqlQuickSQL(conn, statement, ...) :
  Could not create execute: select 
              st_xmax(st_envelope(rast)) as xmx,
              st_xmin(st_envelope(rast)) as xmn,
              st_ymax(st_envelope(rast)) as ymx,
              st_ymin(st_envelope(rast)) as ymn,
              st_width(rast) as cols,
              st_height(rast) as rows
              from
              (select st_union("rast",1) rast from "public"."florida_dem") as a;

plus a huge memory leak.

However, this works:

dsn="PG:dbname='rpostgisLT' host=localhost user='user' password='pw' port=5432 schema='public' table='florida_dem' mode=2"
ras <- readGDAL(dsn) # Get your file as SpatialGridDataFrame
ras2 <- raster(ras,1)
basille commented 7 years ago

Did you try to load the raster file directly in R using raster? Then load it to the DB using pgWriteRast, and then back in R using pgGetRast?

As you know, readGDAL loads the raster as a SpatialGridDataFrame, so the approach using raster (which pgGetRast is using) should be much lighter on memory…

balazsdukai commented 7 years ago

No I didn't try it, because I don't see why would I load the raster first to R and then to the DB if its possible to load it directly to the DB?

basille commented 7 years ago

Just as a way to isolate the source of the problem (file, R, raster, …), and see at which step it arises.

basille commented 7 years ago

I do see the memory leak, but I don't have the error. Steps I tried:

raster2pgsql -s 4269 -I -t auto -M hdr.adf | psql -d test

Then import in R:

library("rpostgis")
(rast1 <- pgGetRast(conn, "hdr"))
class       : RasterLayer 
dimensions  : 8053, 8237, 66332561  (nrow, ncol, ncell)
resolution  : 90, 90  (x, y)
extent      : 52621.16, 793951.2, 56869.21, 781639.2  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0 
data source : in memory
names       : layer 
values      : 0, 114  (min, max)

In this case, there is indeed a huge consumption of memory by PostgreSQL (up to nearly 4GB, but I did not monitor it constantly, could have been even more).

library("raster")
rast2 <- raster("hdr.adf")
pgWriteRast(conn, "rast_test", rast2)    # Note that the function uses tiles of 103x101 pixels
(rast2 <- pgGetRast(conn, "rast_test"))
class       : RasterLayer 
dimensions  : 8053, 8237, 66332561  (nrow, ncol, ncell)
resolution  : 90, 90  (x, y)
extent      : 52621.16, 793951.2, 56869.21, 781639.2  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=aea +lat_1=24 +lat_2=31.5 +lat_0=24 +lon_0=-84 +x_0=400000 +y_0=0 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0 
data source : in memory
names       : hdr 
values      : 1, 114  (min, max)

I still have a high memory usage on importing back in R, but "only" up to 2.5 GB maybe.

In the end, R uses 2.5 GB, after loading one or two of the rasters.

Note that following these steps, the rasters seem identical in R, but don't have the same projections.

dnbucklin commented 7 years ago

I didn't get the error message (when importing into R from rasters sent to the database via both R and psql), but I do have slow execution and a big burden on the memory as well in rpostgis. This isn't surprising, since the entire raster is being processed through memory, both ways. @balazsdukai, I'm guessing that's what your PostgreSQL error is, since that union statement would be the biggest memory hog. Did you try just running that statement in Postgres?

Because of the memory limitation (and/or the need for better methods!) these raster functions are not great for large rasters; I think it's best to always utilize the boundary parameter when using pgGetRast, especially large rasters.

@basille, The mismatch in projection I think is due to the use of "-s 4269" (NAD83) in your psql import statement, which isn't the original projection of the raster (it's in an equal area projection). Balazs must have re-projected to 4269 GeoTiff prior to importing.

balazsdukai commented 6 years ago

@dnbucklin hm what do you mean by "just running that statement in Postgres"?

basille commented 4 years ago

No activity for about two years on this issue, I'm closing it — feel free to reopen if you want to add more information to the issue.