geoalchemy / geoalchemy2

Geospatial extension to SQLAlchemy
http://geoalchemy-2.readthedocs.org
MIT License
636 stars 112 forks source link

How to insert raster data from an array? #87

Closed mangecoeur closed 4 years ago

mangecoeur commented 9 years ago

I'm trying to use geoalchemy to store values in a numpy array as Raster data - I have the data needed to map my 2D array onto a geographical grid (its just wgs84 lat/long) - but the docs don't say anything about how to insert raw data into a Raster type. I tried simply supplying my numpy array as an argument to INSERT but got:

(ProgrammingError) can't adapt type 'numpy.ndarray'

Then I tried converting it to a python list of lists and got:

(ProgrammingError) column "temp" is of type raster but expression is of type numeric[]

How do I insert Raster data using geoalchemy? I don't have geotiff (or anything similar) as source files, just 3million-odd arrays to go in the DB which implies doing a round-trip via flat files is out.

mraspaud commented 9 years ago

I didn't try it myself, but what about converting the numpy array to a RasterElement first ? Like this:

import numpy as np
arr = np.random.rand(100).reshape(10, 10)
from geoalchemy2.elements import RasterElement
RasterElement(arr)

?

ghost commented 8 years ago

same question here, tried:

class My_Raster(Base):
    __tablename__ = 'my_rasters_table'
    rid = Column(Integer, primary_key=True)
    rast = Column(Raster)

new_raster_data = RasterElement(np.array([2,3,1,0]).reshape(2,2))
new_raster = My_Raster(rast=new_raster_data)
session.add(new_raster)
session.commit()

and got:

sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) can't adapt type 'numpy.ndarray' [SQL: 'INSERT INTO netcdf_data (rast) VALUES (%(raster_1)s::raster) RETURNING netcdf_data.rid'] [parameters: {'raster_1': array([[2, 3], [1, 0]])}]

also, just checked what i get when i extract a raster column from postgis and then print the type of the RasterElement data field and i get a string, which is the binary format of the raster as it turns out. so i guess i'm forced to work with that one then...

adrien-berchet commented 4 years ago

The RasterElement class handles rasters in WKB form, as returned by the Postgis function ST_AsBinary(). I think the best solution for this issue is to use the raster2pgsql script from Postgis. Feel free to reopen this issue if needed.

brandonwsims commented 2 years ago

The RasterElement class handles rasters in WKB form, as returned by the Postgis function ST_AsBinary(). I think the best solution for this issue is to use the raster2pgsql script from Postgis. Feel free to reopen this issue if needed.

@adrien-berchet

I know this is topic was closed a very long time ago, but I found myself stumbling upon it when trying to implement geoalchemy2 into a project of mine. After a bit of digging, turns out PostGIS originally had raster2pgsql as a Python script for a prototype of the raster2pgsql binary. This is documented here under section 11.1.1:

https://postgis.net/docs/using_raster_dataman.html

The documentation refers to this Python script, but going down that rabbit hole results in a broken link to the script that no longer exists. Fortunately, I was able to find it archived elsewhere:

https://github.com/greenplum-db/postgis/blob/master/raster/scripts/python/raster2pgsql.py

While I'm sure the various changes from PostGIS versions from then to now would break the SQL portions of that script, I don't think the raster to WKB export functionality has changed much. I'm looking into implementing the WKB export of that script into my current project. If the RasterElement class handles WKB, couldn't we technically export a raster as a GDAL Dataset from the Python GDAL bindings into WKB and load that into RasterElement for inserting into the database? This would get rid of having to rely on a subprocess call to get the job done. Sounds like a much cleaner solution.

Additionally, this might open up some functionality with ST_AsGDALRaster.

I'll report back if I have any success.

adrien-berchet commented 2 years ago

Hi @brandonwsims ! Thanks for this update. Have you seen the example in the gallery to decipher raster data into a numpy array? (here: https://geoalchemy-2.readthedocs.io/en/latest/gallery/test_decipher_raster.html) Starting from this I think it's not so hard to reverse the process.

brandonwsims commented 2 years ago

@adrien-berchet

Thanks for the link, I did not see that yet. I'll keep that bookmarked for my test. I've also found some other libraries which write Rasters to WKB, but those are unfortunately written in Rust.

https://github.com/fschutt/wkb-raster

Still helpful for trying to implement this in Python. I'll continue giving it a try.

adrien-berchet commented 2 years ago

@adrien-berchet

Thanks for the link, I did not see that yet. I'll keep that bookmarked for my test.

You're welcome :-)

I've also found some other libraries which write Rasters to WKB, but those are unfortunately written in Rust.

https://github.com/fschutt/wkb-raster

Still helpful for trying to implement this in Python. I'll continue giving it a try.

Ah interesting! Maybe you can just create a package that wraps this Rust library (or just ask for a Python wrapper in the library)? So it will be fast and it should not need much maintenance effort.

brandonwsims commented 2 years ago

Edit: I'm seeing now that this might be a better discussion under https://github.com/geoalchemy/geoalchemy2/issues/290

I think I might be on to a painfully easier solution, but I hit a bit of a block. Mostly because the WKB format isn't really well defined as a standard, seeing as it isn't one and only PostGIS uses it. I found a mail list from osgeo showing how to convert a GDAL raster in-memory (prefixed with /vsimem) and convert it to bytes. That's simple enough to do (and fast enough since it's using GDAL bindings), and the mail list gives an example.

https://lists.osgeo.org/pipermail/gdal-dev/2016-August/045030.html

f = gdal.VSIFOpenL('/vsimem/output.png', 'rb')
gdal.VSIFSeekL(f, 0, 2) # seek to end
size = gdal.VSIFTellL(f)
gdal.VSIFSeekL(f, 0, 0) # seek to beginning
data = gdal.VSIFReadL(1, size, f)
gdal.VSIFCloseL(f)

From here, you have a Python bytearray (data variable above) which I was able to load into a RasterElement instance. I'm not sure if this was actually handled correctly, but RasterElement didn't throw back any complaints. When trying to insert this RasterElement into a Raster column inside PostGIS however, I ran into this error:

Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\base.py", line 1819, in _execute_context
    self.dialect.do_execute(
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\default.py", line 732, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.InternalError_: rt_raster_from_wkb: WKB version 20048 unsupported
LINE 1: INSERT INTO images (pid, img) VALUES (1, raster('89504e470d0...
                                                        ^

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\brand\soil-simulator\tests\db.py", line 137, in <module>
    create_image_entry(RasterElement(img_bytes))
  File "C:\Users\brand\soil-simulator\tests\db.py", line 116, in create_image_entry
    session.commit()
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\orm\session.py", line 1451, in commit
    self._transaction.commit(_to_root=self.future)
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\orm\session.py", line 829, in commit
    self._prepare_impl()
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\orm\session.py", line 808, in _prepare_impl
    self.session.flush()
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\orm\session.py", line 3383, in flush
    self._flush(objects)
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\orm\session.py", line 3522, in _flush
    with util.safe_reraise():
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\util\langhelpers.py", line 70, in __exit__
    compat.raise_(
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\util\compat.py", line 208, in raise_
    raise exception
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\orm\session.py", line 3483, in _flush
    flush_context.execute()
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\orm\unitofwork.py", line 456, in execute
    rec.execute(self)
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\orm\unitofwork.py", line 630, in execute
    util.preloaded.orm_persistence.save_obj(
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\orm\persistence.py", line 245, in save_obj
    _emit_insert_statements(
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\orm\persistence.py", line 1238, in _emit_insert_statements
    result = connection._execute_20(
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\base.py", line 1631, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\sql\elements.py", line 332, in _execute_on_connection
    return connection._execute_clauseelement(
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\base.py", line 1498, in _execute_clauseelement
    ret = self._execute_context(
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\base.py", line 1862, in _execute_context
    self._handle_dbapi_exception(
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\base.py", line 2043, in _handle_dbapi_exception
    util.raise_(
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\util\compat.py", line 208, in raise_
    raise exception
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\base.py", line 1819, in _execute_context
    self.dialect.do_execute(
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\default.py", line 732, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.InternalError: (psycopg2.errors.InternalError_) rt_raster_from_wkb: WKB version 20048 unsupported
LINE 1: INSERT INTO images (pid, img) VALUES (1, raster('89504e470d0...
                                                        ^

[SQL: INSERT INTO images (pid, img) VALUES (%(pid)s, raster(%(img)s)) RETURNING images.iid]
[parameters: {'pid': 1, 'img': '89504e470d0a1a0a0000000d49484452000000a1000000a00806000000640d0c130000200049444154789cacbd4b962439b225760580aa99b9477e5ebd9a71c00117c005901be026b80e7 ... (64746 characters truncated) ... 86d9dc8f8bef1fcfdc4f579e1fabedcfb9907ec67c3716c960ffb59e698342105046cf35705ef15b457b0c382fffbf5bf5fffdfbffe1ffefae8e0902db5800000000049454e44ae426082'}]
(Background on this error at: https://sqlalche.me/e/14/2j85)

Process finished with exit code 1

Of note in this error is the line reading - WKB version 20048 unsupported. A Google search of WKB version 20048 doesn't return back anything of value. Any ideas?

adrien-berchet commented 2 years ago

I think gdal.VSIFReadL(1, size, f) creates a GDAL dataset, not a WKB.

If you want something easy, maybe you can just try to export your array into a PNG file and load it with ST_FromGDALRaster (see the example here: https://postgis.net/docs/manual-3.2/RT_ST_FromGDALRaster.html)? But I am not sure I understood properly: is your input data a Numpy array of a PNG file? Or something else?

brandonwsims commented 2 years ago

So my application is working in-memory with GDAL. I do my analysis, conversion, etc all in-memory without touching the file system to avoid IO times. Because of that, I'm already using the /vsmimem directive to load rasters in-memory and mimic a file system for other methods that require it as a GDAL Dataset. The code I shared above allows me to export a GDAL Dataset in-memory to a WKB bytearray to play nicely with the raster column in PostGIS. This also gets around having to use the file system and a subprocess call to raster2pgsql without ever leaving my Python script.

I'm trying to avoid reading/writing to files completely to avoid the unnecessary overhead. My idea was to create my raster through GDAL in-memory, perform whatever analysis or conversions I need to on it, convert it to a bytearray without writing the raster to file first (or calling raster2pgsql), and upload it to PostGIS to be used later. The block I hit was the WKB version 20048 is unsupported message I get when trying to insert a raster as a WKB bytearray into a raster column when wrapping the WKB in a RasterElement type.

Here's a SO question that shows someone trying to do more or less what I am.

https://gis.stackexchange.com/questions/94983/import-raster-into-postgis-using-python?rq=1

I think you were right in suggesting that I needed to use ST_FromGDALRaster first though. Is there a clean way to do that within geoalchemy? Or would I need to resort to using psycopg instead and write out the SQL statement

adrien-berchet commented 2 years ago

Ok I see.

To use ST_FromGDALRaster at insert I think you can just write something like:

your_table.insert().values(raster_col_name=func.ST_FromGDALRaster('your_data'))

And if you want to use the ORM you can refer to this example in the gallery: https://geoalchemy-2.readthedocs.io/en/latest/gallery/test_type_decorator.html

And also, I don't know if you saw this but maybe you can check your WKB data is properly formatted according to this: http://trac.osgeo.org/postgis/browser/trunk/raster/doc/RFC2-WellKnownBinaryFormat

brandonwsims commented 2 years ago

Thank you so much!

adrien-berchet commented 2 years ago

You're welcome :) Did you find a complete solution for your problem? If you think your solution can help other people, don't hesitate to open a PR to add an example in the gallery. Or if you don't have time for this you can just paste a snippet here and I will create a complete example from it.

brandonwsims commented 2 years ago

Still working on putting it all together, but I'll absolutely do a PR once I finish!

adrien-berchet commented 2 years ago

Ok cool, thanks! Don't hesitate if you need anything else to make it work.

brandonwsims commented 2 years ago

Ended up not being able to get around my issue with the WKB format being unsupported. Even tried the solution proposed in #290 with no luck. For the time being, I just ended up using raster2pgsql like everyone else. I really don't like this approach as I don't want to write a raster to disk first, but I suppose it works until then. I really wish GDAL had support for writing to WKB just as it does for loading a raster from PostGIS directly by a supplied connection string and SQL query.

I'm pretty sure others have already arrived at this solution, but here's a stripped down version of what I'm doing. This only shows the relevant portion of my code. I took out everything else.

from geoalchemy2 import Raster
from sqlalchemy import create_engine
from sqlalchemy import Column
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from subprocess import PIPE, Popen

Base = declarative_base()

engine = create_engine("<connection_string>")
session = sessionmaker(bind=engine)()

class Images(Base):
    __tablename__ = "images"
    iid = Column(Integer, primary_key=True, autoincrement=True)
    img = Column(Raster())

def raster2pgsql(rast):
    # Convert raster to wkb
    p = Popen(["raster2pgsql", rast], stdout=PIPE)
    sql = str(p.communicate()[0])

    # Extract WKB from raster2pgsql output
    start = sql.find("VALUES (\\'") + len("VALUES (\\'")
    end = sql.rfind("\\'::raster")

    # Return WKB
    return sql[start:end]

def create_image_entry(img):
    image = Images(img=img)
    session.add(image)
    session.commit()

if __name__ == '__main__':
    img = raster2pgsql('<path_to_raster_on_disk>')
    create_image_entry(img)

On my hardware, this entire process takes about .02s for a 32KB PNG, but that's not including file IO time for writing to disk first. Not a great solution, but I suppose I'll revisit this later whenever I get the time.

As for how I'm loading it directly into GDAL after:

from osgeo import gdal

GDAL_CONN = "PG:host=<host> port=5432 dbname=<name> user=<usr> password=<pwd>"
GDAL_PG = "{} schema='{}' table={} column={} where='{}'"

def pull_image(iid):
    ds = gdal.Open(
        GDAL_PG.format(GDAL_CONN, "public", "images", "img", f"iid={iid}")
    )
    gdal.Translate(f"{iid}.png", ds)

Once again, it baffles me that GDAL has support in GDAL.Open for reading WKB, but there's no support for writing a raster to WKB. Only vectors can do this in GDAL.

adrien-berchet commented 2 years ago

Thanks for the snippets! Indeed, it's a bit surprising that GDAL does not provide any clean way to do this.

I searched a bit more a found this: https://github.com/geodesign/django/blob/gdalraster/django/contrib/gis/gdal/raster/rasters.py#L466 As far as I can see it just performs the opposite process of what is shown in the gallery (https://geoalchemy-2.readthedocs.io/en/latest/gallery/test_decipher_raster.html). Maybe one day it could be interesting to include a similar function in GeoAlchemy2, I will think about that (but atm I can't see when I will have time to do it).

Anyway, thank you very much for your work on this!