Closed 0e1 closed 2 years ago
Resumo da ingestão de via e geoaddress, usando a flag:
From file_id=5 inserted type=via_full. +
+
Statistics: +
. +
Before deduplication: +
+
Originals: 4856390 items. +
+
Not Intersecs: 55559 items. +
+
Invalid: 0 items. +
+
Not simple: 1834 items. +
+
Empty: 1671 items. +
+
Small: 3140 items. +
+
Null: 0 items. +
+
Invalid geometry type: 0 items. +
+
Not closed: 0 items. +
+
Inserted in feature_asis: 4795887 items. +
+
Inserted in feature_asis_discarded: 60503 items. +
+
+
After deduplication: +
+
Removed duplicates from feature_asis: 499737 items. +
+
Inserted in feature_asis_discarded (duplicates): 499737 items. +
+
Inserted in feature_asis (aggregated duplicates): 247510 items.+
+
Resulting in feature_asis: 4543660 +
------------------------------------------------------------------------
From file_id=6 inserted type=geoaddress_full. +
+
Statistics: +
. +
Before deduplication: +
+
Originals: 1101237 items. +
+
Not Intersecs: 91705 items. +
+
Invalid: 0 items. +
+
Not simple: 0 items. +
+
Empty: 0 items. +
+
Small: 0 items. +
+
Null: 0 items. +
+
Invalid geometry type: 0 items. +
+
Not closed: 0 items. +
+
Inserted in feature_asis: 1009532 items. +
+
Inserted in feature_asis_discarded: 91705 items. +
+
+
After deduplication: +
+
Removed duplicates from feature_asis: 23696 items. +
+
Inserted in feature_asis_discarded (duplicates): 23696 items. +
+
Inserted in feature_asis (aggregated duplicates): 11662 items.+
+
Resulting in feature_asis: 997498 +
A publicação dos dados de geoaddress ocorreu sem problemas, como pode ser vista em https://github.com/digital-guard/preservCutGeo-BR2021/tree/main/data/_pk0004.03/geoaddress.
No entanto, a publicação de via, retornou:
--- Gerando arquivos em /var/gits/_dg/preservCutGeo-BR2021/data/_pk0004.03/via ---
psql postgres://postgres@localhost/ingest88 -c "SELECT ingest.publicating_geojsons('via','BR','/var/gits/_dg/preservCutGeo-BR2021/data/_pk0004.03/via','1',9,'3',true);"
SSL SYSCALL error: EOF detected
connection to server was lost
make: *** [makefile:186: publicating_geojsons_via] Error 2
Executando manualmente o processo de publicação:
ingest.publicating_geojsons_p1 executa sem problemas:
ingest88=# SELECT ingest.publicating_geojsons_p1(5::bigint,'BR');
publicating_geojsons_p1
-------------------------
p1
(1 row)
ingest.publicating_geojsons_p3 Lembrar que ingest.publicating_geojsons_p2 não é mais executada! Ao executar manualmente o início de ingest.publicating_geojsons_p3:
ingest88=# INSERT INTO ingest.publicating_geojsons_p2distrib
SELECT t.hcode, t.n_items, t.n_keys, t.jj, -- length(t.hcode) AS len,
ST_Intersection(
ST_SetSRID( ST_geomFromGeohash(replace(t.hcode, '*', '')) , 4326),
(SELECT geom FROM ingest.vw01full_jurisdiction_geom WHERE isolabel_ext='BR')
) AS geom
FROM hcode_distribution_reduce_recursive_raw_alt(
((SELECT jsonb_object_agg(kx_ghs9,(CASE (SELECT geomtype FROM ingest.vw03full_layer_file WHERE id=5) WHEN 'point' THEN 1::bigint ELSE ((info->'bytes')::bigint) END) ) FROM ingest.publicating_geojsons_p3exprefix)),
1,
(SELECT length((geohash_cover_list(geom))[1]) FROM ingest.vw01full_jurisdiction_geom WHERE isolabel_ext='BR'),
9,
(SELECT (lineage->'hcode_distribution_parameters'->'p_threshold_sum')::int FROM ingest.donated_PackComponent WHERE id= 5),
(CASE (SELECT geomtype FROM ingest.vw03full_layer_file WHERE id=5) WHEN 'point' THEN 1000::int ELSE 102400::int END)
) t
;
SSL SYSCALL error: EOF detected
The connection to the server was lost. Attempting reset: Failed.
Parece que é com a execução de _hcode_distribution_reduce_recursive_rawalt.
O problema estava relacionado a quantidade de memória sendo usada pelo processo de geração da distribuição. Os ajustes feitos em https://github.com/AddressForAll/pg_pubLib-v1/commit/babc2b339d5411828733fee8ab92e7014021a122 adequaram o uso da memória à realidade do servidor.
O processo de ingestão do layer via (e provavelmente do layer geoaddress quando se fizer union com centróides dos building do OpenStreetMap) de _pk0004.03 retornou o erro
ERROR: array size exceeds the maximum allowed (1073741823)
no contexto
provavelmente quando se aplica _stcollect na função feature_asis_assign_volume: https://github.com/digital-guard/preserv/blob/main/src/ingest-step1-ini.sql#L533
São quase 5 milhões de vias:
Os limites do banco de dados podem serem vistos em: https://www.postgresql.org/docs/current/limits.html
Avaliar como contornar.
Algumas opções:
Como não consideramos as 5 regiões brasileiras como um nível jurisdicional e o problema poderia persistir (especialmente na região sudeste) acho que a opção 2 mais razoável.