NRCan / geo-deep-learning

Deep learning applied to georeferenced datasets
https://geo-deep-learning.readthedocs.io/en/latest/
MIT License
149 stars 49 forks

Postprocessing: some extractions fail at polygonization with error "database or disk is full" #454

Closed remtav closed 1 year ago

remtav commented 1 year ago

An image has caused our polygonization step, which runs in a QGIS container, to fail.

Sbatch file to reproduce on HPC:

#!/bin/bash
#SBATCH --partition=gpu_[...]
#SBATCH --account=[...]
#SBATCH --qos=low
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --gpus=0
#SBATCH --time=06:00:00
#SBATCH --mem=256G
#SBATCH --job-name=postprocessing-OttawaP12022-017145312010_01_P001-WV03_1
#SBATCH --output=[...]/operationalization/geosys-jobs/logs/postprocessing-OttawaP12022-017145312010_01_P001-WV03_1_ISSUE.out
#SBATCH --comment="image=nrcan/nrcan_all_default_ubuntu-20.04-amd64_latest"

export http_proxy=http://webproxy.science.gc.ca:8888/
export https_proxy=http://webproxy.science.gc.ca:8888/

cd operationalization/geo-deep-learning/

source [...]/bin/activate
conda activate geo_deep_env

python GDL.py --config-name=ccmeo_production.yaml mode=postprocess \
inference.input_stac_item=https://datacube[...]/worldview-3-ortho-pansharp/items/OttawaP12022-017145312010_01_P001-WV03 \
inference.state_dict_path=https://datacube-[...]/models/pl_smp_unet_NRG_FORE_20220520.pth.tar \
postprocess.output_name=OttawaP12022-017145312010_01_P001-WV03 inference.output_name=OttawaP12022-017145312010_01_P001-WV03_FORE \
inference.checkpoint_dir=[...]/operationalization/checkpoints/ \
postprocess.root_dir=[...]/operationalization/inferences/ \
postprocess.reg_cont.cont_image=[...]/singularity_images/gdl-cuda11_v2.3.3-prod.sif

Error:

[...]/operationalization/geo-deep-learning/GDL.py:15: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
  @hydra.main(config_path=config_path, config_name=config_name)
[...]/envs/geo_deep_env/lib/python3.10/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
[2023-01-19 15:45:06,446][root][INFO] -
Overwritten parameters in the config:
inference.checkpoint_dir=[...]/operationalization/checkpoints/,inference.input_stac_item=https://datacube-stage[...]/worldview-3-ortho-pansharp/items/OttawaP12022-017145312010_01_P001-WV03,inference.output_name=OttawaP12022-017145312010_01_P001-WV03_FORE,inference.state_dict_path=https://datacube[...]/models/pl_smp_unet_NRG_FORE_20220520.pth.tar,mode=postprocess,postprocess.output_name=OttawaP12022-017145312010_01_P001-WV03,postprocess.reg_cont.cont_image=[...]/singularity_images/gdl-cuda11_v2.3.3-prod.sif,postprocess.root_dir=[...]/operationalization/inferences/
[2023-01-19 15:45:06,447][root][INFO] -
--------------------------------------------
Let's start postprocess for segmentation !!!
--------------------------------------------
[2023-01-19 15:45:06,688][root][INFO] -
Provided path is url. Cannot validate it's existence nor convert to Path object. Got:
https://datacub[...]/models/pl_smp_unet_NRG_FORE_20220520.pth.tar
[2023-01-19 15:45:06,713][postprocess_segmentation][INFO] -
Converting geo-deep-learning checkpoint to pytorch lightning...
[2023-01-19 15:45:07,317][root][INFO] -
=> checking model compatibility...
[2023-01-19 15:45:07,539][root][INFO] -
=> loading model '[...]/operationalization/checkpoints/pl_smp_unet_NRG_FORE_20220520.pth.tar'
[2023-01-19 15:45:07,851][root][INFO] -
Parameters from checkpoint will override inputted parameters.
                         Inputted | Overriden
Model:           {'_target_': 'segmentation_models_pytorch.Unet', 'encoder_name': 'resnext50_32x4d', 'encoder_depth': 4, 'encoder_weights': 'imagenet', 'decoder_channels': [256, 128, 64, 32]} | {'_target_': 'segmentation_models_pytorch.Unet', 'encoder_name': 'resnext50_32x4d', 'encoder_depth': 4, 'encoder_weights': 'imagenet', 'decoder_channels': [256, 128, 64, 32], 'in_channels': 3, 'classes': 1}
Input bands:            ['red', 'green', 'blue', 'nir'] | ['nir', 'red', 'green']
Output classes:         {'FORE': 1, 'WAER': 2, 'ROAI': 3, 'BUIL': 4} | {'FORE': 1}
Normalization means and stds:           {'mean': None, 'std': None} | {'mean': [0.43911179, 0.41455254, 0.41057263], 'std': [0.21790755, 0.20137545, 0.18109507]}
Scale data range:               [0, 1] | [0, 1]
Raster enhance clip limit:              0 | 0
Single class mode:              None | None
[2023-01-19 15:45:07,854][postprocess_segmentation][INFO] - Polygonizing prediction to [...]/operationalization/inferences/OttawaP12022-017145312010_01_P001-WV03_FORE_raw.gpkg...
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-ret000'
Enabling plugin: "grassprovider"
Plugin is already enabled!
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-ret000'
/usr/lib/python3/dist-packages/qgis/utils.py:888: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  mod = _builtin_import(name, globals, locals, fromlist, level)

----------------
Inputs
----------------

input:  /home/OttawaP12022-017145312010_01_P001-WV03_FORE.tif
output: /tmp/rtovect.gpkg
type:   2

g.proj -c wkt="/tmp/processing_zPHxdm/242f62611286481898118bec49f4bb1b/crs.prj"
r.in.gdal input="/home/OttawaP12022-017145312010_01_P001-WV03_FORE.tif" band=1 output="rast_63c9658e00ef32" --overwrite -o
g.region n=5037201.639970779 s=5014223.32221508 e=447776.4541926384 w=433496.6259021759 res=0.33867347240448
r.to.vect  input=rast_63c9658e00ef32 type="area" column="value" output=outputd61292f8906b4835b4f21ef726c2f0ea --overwrite
v.out.ogr type="auto" input="outputd61292f8906b4835b4f21ef726c2f0ea" output="/tmp/rtovect.gpkg" format="GPKG"  --overwrite
Starting GRASS GIS...
Cleaning up temporary files...
Executing </tmp/processing_zPHxdm/grassdata/grass_batch_job.sh> ...
Default region was updated to the new projection, but if you have multiple mapsets `g.region -d` should be run in each to update the region from the default
Projection information updated
Over-riding projection check
Importing raster map <rast_63c9658e00ef32>...
0..3..6..9..12..15..18..21..24..27..30..33..36..39..42..45..48..51..54..57..60..63..66..69..72..75..78..81..84..87..90..93..96..99..100
Extracting areas...
0..3..6..9..12..15..18..21..24..27..30..33..36..39..42..45..48..51..54..57..60..63..66..69..72..75..78..81..84..87..90..93..96..99..100
Writing areas...
0..4..8..12..16..20..24..28..32..36..40..44..48..52..56..60..64..68..72..76..80..84..88..92..96..100
Building topology for vector map <outputd61292f8906b4835b4f21ef726c2f0ea@PERMANENT>...
Registering primitives...
10000..20000..30000..40000..50000..60000..70000..
Building areas...
0..2..4..6..8..10..12..14..16..18..20..22..24..26..28..30..32..34..36..38..40..42..44..46..48..50..52..54..56..58..60..62..64..66..68..70..72..74..76..78..80..82..84..86..88..90..92..94..96..98..100
Attaching islands...
0..2..4..6..8..10..12..14..16..18..20..22..24..26..28..30..32..34..36..38..40..42..44..46..48..50..52..54..56..58..60..62..64..66..68..70..72..74..76..78..80..82..84..86..88..90..92..94..96..98..100
Attaching centroids...
0..2..4..6..8..10..12..14..16..18..20..22..24..26..28..30..32..34..36..38..40..42..44..46..48..50..52..54..56..58..60..62..64..66..68..70..72..74..76..78..80..82..84..86..88..90..92..94..96..98..100
r.to.vect complete.
Exporting 37777 areas (may take some time)...
5..11..17..23..29..35..41..47..53..59..65..71..77..83..89..95..100
ERROR:  ERROR 1: failed to execute insert : database or disk is full
ERROR 1: failed to execute insert : database or disk is full
ERROR:  ERROR: Failed to create OGR feature
ERROR: Failed to create OGR feature
Execution of </tmp/processing_zPHxdm/grassdata/grass_batch_job.sh> finished.
Cleaning up default sqlite database ...
ERROR:  ERROR: Error while executing: 'VACUUM'
ERROR: Error while executing: 'VACUUM'
Cleaning up temporary files...
Starting GRASS GIS...
Cleaning up temporary files...
Executing </tmp/processing_zPHxdm/grassdata/grass_batch_job.sh> ...
ERROR:  ERROR: RTreeWriteNode(): Unable to write (No space left on device)
ERROR: RTreeWriteNode(): Unable to write (No space left on device)
Execution of </tmp/processing_zPHxdm/grassdata/grass_batch_job.sh> finished.
Cleaning up default sqlite database ...
ERROR:  ERROR: Error while executing: 'VACUUM'
ERROR: Error while executing: 'VACUUM'
Cleaning up temporary files...

----------------
Results
----------------

output: /tmp/rtovect.gpkg
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-ret000'
/usr/lib/python3/dist-packages/qgis/utils.py:888: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  mod = _builtin_import(name, globals, locals, fromlist, level)

----------------
Inputs
----------------

FIELD:  value
INPUT:  /tmp/rtovect.gpkg
OPERATOR:       2
OUTPUT: /home/OttawaP12022-017145312010_01_P001-WV03_FORE_raw.gpkg
VALUE:  0

ERROR:  An error was encountered while checking parameter values
        Could not load source layer for INPUT: /tmp/rtovect.gpkg not found
remtav commented 1 year ago

Also see: postprocessing-08MG0X2P6WV32015V2-017139723010_01_P001-WV03_1.out postprocessing-08MG0X2P7WV3V2-017139722010_01_P001-WV03_1.out

CharlesAuthier commented 1 year ago

Will be fixed soon. If you have this error, contact me.

remtav commented 1 year ago

Cause found: the temporary directory where the QGIS container was writing was limited in space. The issue is not related to GDL itself.
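
Since the failure traces back to an undersized temporary directory inside the container (GRASS writes its intermediate GeoPackage and SQLite database under `/tmp` before the "database or disk is full" error surfaces), a pre-flight free-space check can fail fast with a clear message instead. Below is a minimal sketch of such a check; it is not part of geo-deep-learning, and the directory and `min_free_gb` threshold are assumptions to adapt to your cluster:

```python
import shutil


def check_free_space(path: str = "/tmp", min_free_gb: float = 10.0) -> float:
    """Return free space in GiB at `path`; raise if below `min_free_gb`.

    GRASS's r.to.vect / v.out.ogr chain writes its working GeoPackage and
    SQLite database under the container's temp dir, so an undersized /tmp
    only surfaces later as "database or disk is full".
    """
    free_gib = shutil.disk_usage(path).free / 1024**3  # bytes -> GiB
    if free_gib < min_free_gb:
        raise OSError(
            f"Only {free_gib:.1f} GiB free at {path}; at least "
            f"{min_free_gb} GiB recommended before polygonization."
        )
    return free_gib
```

On HPC, one common workaround is to point the container's temp space at a larger node-local scratch directory, e.g. by bind-mounting it into the Singularity container (`singularity run --bind $SLURM_TMPDIR:/tmp ...`); whether that applies depends on how the QGIS container is launched in your setup.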

gevro commented 1 year ago

How did you fix this issue?