RDCEP / EDE

MIT License
2 stars 1 forks source link

ST_PixelAsCentroids #20

Closed njmattes closed 8 years ago

njmattes commented 8 years ago

@ricardobarroslourenco @legendOfZelda: Is it possible to make ST_PixelAsCentroids return all bands from a raster? I've been playing with something like the following to return info from the database in a format that can be converted to JSON in the python middleware, so that's useful for drawing the GUI.

CREATE OR REPLACE FUNCTION get_raster(mid int, vid int, band int)
RETURNS TABLE(val double precision, st_astext text) AS $$
BEGIN
  RETURN QUERY
  SELECT r.val, ST_AsText(r.geom)
  FROM (SELECT (ST_PixelAsCentroids(rast, band)).*
        FROM grid_data AS gd
        WHERE gd.meta_id = mid
              AND gd.var_id = vid
       ) AS r;
END;
$$ LANGUAGE plpgsql;

This seems to work alright, but only gives a single band. It's possible to draw a single band, and then reissue a request (or multiple requests) for the rest of the bands.

Any thoughts about the best (and also reasonably fast) way to retrieve this? Bearing in mind that, as far as the GUI is concerned, we don't need to return the globe—only the pixels that fill the current viewport.

ricardobarroslourenco commented 8 years ago

@njmattes is this table grid_data stored at the current database ede? I was trying to look at it. Perhaps we could iterate over the bands. Or dump, if it is just a column...

njmattes commented 8 years ago

@ricardobarroslourenco Looks like the database on the server is a couple steps behind the develop branch. I've been testing locally today (was at a kids chess tourney all day). The netcdf_data table on the server has a similar layout though.

I think the bands are tucked away inside the binary rast column (by tile). Which is what's making things a bit difficult. I think we're trying to stick to the procedures in PostGIS right now, to retrieve the info from the rasters. But ST_PixelAsCentroids doesn't actually return which band you requested. So you can't return all the bands this way, and then sniff out which ones you want.

Is it necessary to write some lower-level functionality to create a function that returns the rasters as eg EDE_PixelAsCentroids with an integer column representing which band each centroid / value pair belongs to?

ghost commented 8 years ago

let's see...first, i'll bring the ede db to the layout state right now, so don't use it right now. i need you to leave your db session(s) for that

njmattes commented 8 years ago

Sorry about that—my sessions should be gone now.

ricardobarroslourenco commented 8 years ago

Yeah. Me either (I've just dropped it)

ghost commented 8 years ago

we've basically got these options:

more precisely:

i suggest we use the last one to avoid looping. of course we could wrap up the loop in a user-defined function but it's not fully clear if we can return that as a single table at the end + even if we can it's cumbersome + not as performant as the last option using ST_DumpValues i strongly think.

speaking now of this last option, we need to know what the conventions of the output of the two queries are...

for the output {{...},{...},{...},...} of SELECT (ST_DumpValues(rast)).* FROM grid_data where meta_id=9 and var_id=2; the convention is that it shows the values row-by row, i.e. {{values of row1}, {values of row2}, {values of row3}, ...}

for the output

 x  | y  |      st_astext       
----+----+----------------------
  7 |  4 | POINT(8.25 63.25)
  8 |  4 | POINT(8.75 63.25)
...

of SELECT x, y, ST_AsText(geom) FROM (SELECT (ST_PixelAsCentroids(rast)).* FROM grid_data where meta_id=9 and var_id=2) centroids; the convention is that x = column and y = row (not the 'more intuitive' x = row and y = column)

understanding these conventions allows us to connect the values to the right (lat,lon)'s.

ricardobarroslourenco commented 8 years ago

@legendOfZelda nice. I was thinking on this last option (without specifying a band) when posted before.

ghost commented 8 years ago

yes, that's what i thought you meant, cool :+1: btw, the ede db now has the most current schema.

njmattes commented 8 years ago

Some more questions:

ricardobarroslourenco commented 8 years ago

The 10x10 tiles I think it is something related to an OGC compliance standard, no? When talking about the rotation order, this always comes on mind.

About the raster type, I think this goes into that discussion of data regularization that Raffaelle brought at the all hands meeting. I think this will need to be done prior to ingestion, and looking into some things such as the downscale methods that Joshua uses.

(Sorry by the weird answer, I'm typing at my phone)

ghost commented 8 years ago

just to get a more clear understanding of your questions above @njmattes:

it took me a while to understand some of your questions above. the problem we're dealing with here (and which i didn't realize first) is that the ST_DumpValues command above returns x, y indexes multiple times (once for each tile), i.e. the logical coordinates x, y are not global for the entire grid but w.r.t. individual tiles, i.e. local logical coordinates, that's why they appear multiple times.

similarly for the ST_PIxelAsCentroids command above where the arrays returned are 10x10, i.e. just tiles, not entire bands.

i am going to answer your questions above (with more appropriate PostGIS queries) but would be glad if you can clarify my two questions here first.

ghost commented 8 years ago

@njmattes @ricardobarroslourenco why would we want to have all timeframes of a raster inside python? why wouldn't we instead write out the raster with all its timeframes as a GeoTIFF as described here: How to export raster from PostGIS enabled DB as GTiff with gdal_translate (it works, just tested it, see also 9.10. Raster Outputs in the PostGIS raster reference for some output functions we probably overlooked a little) and then send that as an ordinary file to the user?

why would we ever want to hold all timeframes, which can be quite large, inside python memory?

PS: i didn't realize until now that ST_PixelAsCentroids alone already does the job for a single timeframe. e.g. if we issue (in the current database)

select ST_AsText(geom), val from (select (ST_PixelAsCentroids(rast)).* from grid_data where meta_id=1 and var_id=1) centroids;

or

select ST_AsGeoJSON(geom), val from (select (ST_PixelAsCentroids(rast)).* from grid_data where meta_id=1 and var_id=1) centroids;

(ignore the NOTICE and CONTEXT messages and wait till the query has finished). we have what we need for the frontend, don't we? again, this is for a specific timeframe. if you want all timeframes that's still going to be trickier and i'm figuring it out. i'm just wondering whether we need all timeframes inside python memory in the first place instead of doing the above GeoTIFF approach.

ghost commented 8 years ago

this problem is now solved, see query (5) in #23