3liz / qgis-pgmetadata-plugin

QGIS Plugin to manage some metadata from PostgreSQL layer
GNU General Public License v2.0
12 stars 10 forks source link

Cant add metadata because ERROR: array size exceeds the maximum allowed (1073741823) #126

Closed zackarno closed 1 year ago

zackarno commented 2 years ago

layers in my postgres DB don't seem to be able to added to my metadata db due to the error:

  Provider errors:
      PostGIS error while adding features: ERROR:  array size exceeds the maximum allowed (1073741823)
    CONTEXT:  SQL statement "
                SELECT ST_Transform(ST_ConvexHull(st_collect(ST_Force2d("geom"))), 4326)
                FROM col.col_lc_2018"
    PL/pgSQL function pgmetadata.calculate_fields_from_data() line 63 at EXECUTE

For my purposes, storing the metadata is more important than having the convex hull representation. Is there a way to get around this?

effjot commented 2 years ago

As a quick workaround, you could change ST_ConvexHull to ST_Envelope in calculate_fields_from_data(). This would calculate the simple bounding box and shouldn’t be affected by your large layer. You can edit it for example with PgAdmin: go to your database → Schema pgmetadata → Trigger Functions → Properties → Code

@Gustry As a long-term solution, maybe you could add a configuration setting on which function to use for the layer outline?

mdouchin commented 2 years ago

I also think we should keep it simple and only use the bounding box with ST_Envelope: the convex hull is probably a bit too much.

effjot commented 2 years ago

I don‘t mind the convex hull and haven’t run into problems yet, even with state-sized layers with a lot if tiny scraggy polygons. (I did realise convex hull doesn’t work well for rasters, however. But for those, a bounding box is almost always good enough.)

Maybe the trigger function could look at the number of features/geometries in the and decide whether to use convex hull or bounding box.

zackarno commented 2 years ago

Thank you very much @effjot for the workaround suggestion.

After playing around with Trigger function it seems that changing ST_ConvexHull to ST_Envelope still produces the same error. However, removing ST_Collect allows the function to run with either envelope or convex hull. I am not sure what the unintended consequences of doing this might be though?

A condition based on the number of geoms seems like a nice option to me

effjot commented 2 years ago

If I understand https://postgis.net/docs/ST_ConvexHull.html correctly, leaving out ST_Collect will give you convex hulls for each feature in your layer instead of one convex hull encompassing all features.

zackarno commented 2 years ago

Thanks @effjot - convex hulls for each feature per layer does not seem desirable. Therefore, do you have any suggestion/ideas for a workaround to remove ST_Collect which seems like the problematic function on layers with high numbers of geometries?

effjot commented 2 years ago

Maybe Box2d() would work (https://postgis.net/docs/Box2D.html).

If it doesn’t, you could try the min/max approach I had used for raster support:

    -- use extent (of whole table) from raster_columns catalog as envelope
    -- (union of convexhull of all rasters (tiles) in target table is too slow for big tables)
    EXECUTE 'SELECT ST_Transform($1, 4326)'
    INTO geom_envelop
    USING test_rast_column.extent;

The view raster_columns already has the extent column, but geometry_columns doesn’t. I think you could base a similar approach for vector layers on how NEW.extent is determined. That yields a text representation, however, so you need to construct a rectangle/polygon from the min/max values instead. I didn’t test that, though.

effjot commented 2 years ago

Could you solve the problem with this approach? @zackarno

zackarno commented 2 years ago

@effjot - it sounds like think that approach would work. However, I'm really not familiar with how objects are stored and passed on in SQL/psql so I am not sure how to fit this into the trigger function.

I think i understand the general geoprocessing sequence and can fit it into an object-oriented language framework, but not so well in sql/psql. I've outlined a rough approach using a hybrid of sql and a more R/python like paradigm (which obviously doesn't make complete sense) that I think would work if I only understood how to store and pass the objects in the context of this trigger function (apologies for my ignorance regarding sql/psql functions)

srid = FIND_SRID(schema, table, geometry)
num_records = select count(*) from schema.table
if(num_records>x){
extent = select st_setsrid(ST_Extent(ST_Force2d("geom")), srid)
ST_Transform(extent, 4326)
}
if(num_records<=x){
execute code as is
}
mdouchin commented 2 years ago

I will have a look before my PgMetadata presentation in Firenze ;) and probably release a new version (with the awesome raster support). After some thinking, I think we should keep it safe and simple and always use a very lightweight method: ST_Rectangle (or even PostgreSQL statistics). You cannot trust the number of features as a reflection of complexity (imagine a table with 5 features of the continents with a high number of nodes per polygon)

effjot commented 1 year ago

Meanwhile, I have encountered the problem, too. My solution was to use ST_Extent() for large tables instead of ST_Convexhull() and ST_Collect() (which I think is the real culprit here).

        -- geom: convexhull from target table
        -- Use extent (bounding box) as envelope; ST_SetSRID to convert Box2D to geometry for very big tables (ST_Collect or ST_ConvexHull cause “array size exceeds maximum” error).
        -- Limit based on: Corine Land Cover Germany 25ha resolution (169415 features) works, but 5ha resolution (658707 features) didn’t.
        if NEW.feature_count < 500000 then
            EXECUTE '
                SELECT ST_Transform(ST_ConvexHull(ST_Collect(ST_Force2d("' || geom_column_name || '"))), 4326)
                FROM ' || target_table
            INTO geom_envelop;
        else
            EXECUTE '
                SELECT ST_Transform(ST_SetSRID(ST_Extent("' || geom_column_name || '"), $1), 4326)
                FROM ' || target_table
            INTO geom_envelop
            USING test_geom_column.srid;
        end if;

@mdouchin, @Gustry Should I prepare a PR for this solution?

mdouchin commented 1 year ago

@mdouchin, @Gustry Should I prepare a PR for this solution?

+1 for me. Thanks a lot !