Closed: VincentVerelst closed this issue 3 months ago.
This is at least part of the reason:
netCDF assets with a time dimension could be problematic.
FYI: in the batch job from the example above, the netCDFs don't have a time dimension.
The problem I see is that extraction jobs generate many netCDFs in one job, while this method: https://github.com/Open-EO/openeo-python-driver/blob/1d86962102e686de71202a90838c659a53d33170/openeo_driver/views.py#L1267C9-L1267C29
assumes that the job bbox is also the item bbox. I think we need an approach that generates the item JSON as part of the batch job itself.
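Generating per-asset item JSON could look roughly like the sketch below: each asset gets an item whose bbox is derived from that asset rather than from the overall job extent. All names here (build_item, the asset URL, the bbox values) are hypothetical, not the actual openeo-geopyspark-driver API.

```python
import json


def build_item(asset_href: str, bbox, datetime_str: str) -> dict:
    """Build a minimal STAC item whose bbox comes from the asset itself,
    not from the overall job extent. Illustrative sketch only."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": asset_href.rsplit("/", 1)[-1],
        "bbox": list(bbox),
        "geometry": {
            "type": "Polygon",
            # closed ring: first and last coordinate are identical
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        "properties": {"datetime": datetime_str},
        "assets": {
            "data": {"href": asset_href, "type": "application/x-netcdf"},
        },
        "links": [],
    }


item = build_item(
    "https://example.org/jobs/j-123/results/assets/openEO_0.nc",
    bbox=(5.849, 50.742, 5.868, 50.754),
    datetime_str="2024-02-13T00:00:00Z",
)
print(json.dumps(item["bbox"]))
```

The batch job would emit one such item per generated netCDF, so load_stac can use the item bbox instead of falling back to the job bbox.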
Using the datacube extension in items also seems relevant in the case of netCDF: https://github.com/stac-extensions/datacube/blob/main/examples/item.json
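For a netCDF item, the datacube extension's cube:dimensions could describe the spatial, temporal, and band dimensions along these lines. This is a sketch: the spatial extents and band name are taken from the gdalinfo output in this thread, but the temporal value is made up.

```python
import json

# Hypothetical "cube:dimensions" for a single-band netCDF item, following
# the STAC datacube extension. Spatial extents match the gdalinfo output
# elsewhere in this thread (EPSG:32631, 129x129 pixels at 10 m).
cube_properties = {
    "cube:dimensions": {
        "x": {"type": "spatial", "axis": "x",
              "extent": [701040.0, 702330.0], "reference_system": 32631},
        "y": {"type": "spatial", "axis": "y",
              "extent": [5625050.0, 5626340.0], "reference_system": 32631},
        # a netCDF asset may also carry a time dimension; value illustrative
        "t": {"type": "temporal",
              "extent": ["2024-02-13T00:00:00Z", "2024-02-13T00:00:00Z"]},
        "bands": {"type": "bands", "values": ["B04"]},
    },
}

# round-trips through JSON, so it can be merged into item "properties"
print(json.dumps(cube_properties)[:20])
```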
For reference, GeoTIFF equivalent seems to have been implemented in EP-4118.
Related (the bands part): https://github.com/Open-EO/openeo-geotrellis-extensions/issues/259
Not yet available on openeo-dev because the integration tests fail (for unrelated reasons). Fails on CDSE dev/staging because of https://github.com/eu-cdse/openeo-cdse-infra/issues/55.
Using load_stac on those results on openeo-dev raises a GDAL error:
Traceback (most recent call last):
File "batch_job.py", line 1278, in <module>
main(sys.argv)
File "batch_job.py", line 1013, in main
run_driver()
File "batch_job.py", line 984, in run_driver
run_job(
File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/utils.py", line 54, in memory_logging_wrapper
return function(*args, **kwargs)
File "batch_job.py", line 1077, in run_job
result = ProcessGraphDeserializer.evaluate(process_graph, env=env, do_dry_run=tracer)
File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 373, in evaluate
result = convert_node(result_node, env=env)
File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 398, in convert_node
process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1558, in apply_process
args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1558, in <dictcomp>
args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 412, in convert_node
return convert_node(processGraph['node'], env=env)
File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 398, in convert_node
process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1590, in apply_process
return process_function(args=ProcessArgs(args, process_id=process_id), env=env)
File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 2199, in load_stac
return env.backend_implementation.load_stac(url=url, load_params=load_params, env=env)
File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/backend.py", line 1091, in load_stac
pyramid = pyramid_factory.datacube_seq(projected_polygons, from_date.isoformat(), to_date.isoformat(),
File "/opt/spark3_4_0/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
return_value = get_return_value(
File "/opt/spark3_4_0/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o1677.datacube_seq.
: java.io.IOException: Exception while determining data type of collection https://openeo-dev.vito.be/openeo/1.1/jobs/j-2402139ee06e4f088f2cec0cc911339e/results/N2Q1MjMzODEzNzRiNjJlNmYyYWFkMWYyZjlmYjZlZGRmNjI0ZDM4MmE4ZjcxZGI2ZGNmNTc4OGUzYWFlMGFmM0BlZ2kuZXU%3D/7806fa7bc01110e93a16c7d65e599c21?expires=1708522345 and item NETCDF:/vsicurl/https://openeo-dev.vito.be/openeo/1.1/jobs/j-2402139ee06e4f088f2cec0cc911339e/results/assets/N2Q1MjMzODEzNzRiNjJlNmYyYWFkMWYyZjlmYjZlZGRmNjI0ZDM4MmE4ZjcxZGI2ZGNmNTc4OGUzYWFlMGFmM0BlZ2kuZXU%3D/821434835ac34118b66c8da71aa04003/openEO_0.nc?expires=1708522785:B04. Detailed message: Unable to determine NoData value. GDAL Exception Code: 4
at org.openeo.geotrellis.layers.FileLayerProvider.determineCelltype(FileLayerProvider.scala:728)
at org.openeo.geotrellis.layers.FileLayerProvider.readKeysToRasterSources(FileLayerProvider.scala:758)
at org.openeo.geotrellis.layers.FileLayerProvider.readMultibandTileLayer(FileLayerProvider.scala:957)
at org.openeo.geotrellis.file.PyramidFactory.datacube(PyramidFactory.scala:128)
at org.openeo.geotrellis.file.PyramidFactory.datacube_seq(PyramidFactory.scala:91)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: geotrellis.raster.gdal.MalformedDataTypeException: Unable to determine NoData value. GDAL Exception Code: 4
at geotrellis.raster.gdal.GDALDataset$.$anonfun$noDataValue$1(GDALDataset.scala:313)
at geotrellis.raster.gdal.GDALDataset$.$anonfun$noDataValue$1$adapted(GDALDataset.scala:310)
at geotrellis.raster.gdal.GDALDataset$.errorHandler$extension(GDALDataset.scala:422)
at geotrellis.raster.gdal.GDALDataset$.noDataValue$extension1(GDALDataset.scala:310)
at geotrellis.raster.gdal.GDALDataset$.cellType$extension1(GDALDataset.scala:366)
at geotrellis.raster.gdal.GDALDataset$.cellType$extension0(GDALDataset.scala:361)
at geotrellis.raster.gdal.GDALRasterSource.$anonfun$cellType$1(GDALRasterSource.scala:91)
at scala.Option.getOrElse(Option.scala:189)
at geotrellis.raster.gdal.GDALRasterSource.cellType$lzycompute(GDALRasterSource.scala:91)
at geotrellis.raster.gdal.GDALRasterSource.cellType(GDALRasterSource.scala:91)
at org.openeo.geotrellis.layers.BandCompositeRasterSource.$anonfun$cellType$1(FileLayerProvider.scala:92)
at cats.data.NonEmptyList.map(NonEmptyList.scala:87)
at org.openeo.geotrellis.layers.BandCompositeRasterSource.cellType(FileLayerProvider.scala:92)
at org.openeo.geotrellis.layers.FileLayerProvider.determineCelltype(FileLayerProvider.scala:722)
... 16 more
A gdalinfo as well as a GDALRasterSource of that asset URL works on my machine, but not from the web app driver on openeo-dev. To investigate.
gdalinfo with debug output in driver container:
bash-4.4$ CPL_DEBUG=ON gdalinfo NETCDF:/vsicurl/https://openeo-dev.vito.be/openeo/1.1/jobs/j-2402139ee06e4f088f2cec0cc911339e/results/assets/N2Q1MjMzODEzNzRiNjJlNmYyYWFkMWYyZjlmYjZlZGRmNjI0ZDM4MmE4ZjcxZGI2ZGNmNTc4OGUzYWFlMGFmM0BlZ2kuZXU%3D/821434835ac34118b66c8da71aa04003/openEO_0.nc?expires=1708522785:B04
GDAL: CPLIsUserFaultMappingSupported(): syscall(__NR_userfaultfd) failed: insufficient permission. add CAP_SYS_PTRACE capability, or set /proc/sys/vm/unprivileged_userfaultfd to 1
HTTP: libcurl/7.61.1 OpenSSL/1.1.1k zlib/1.2.11 nghttp2/1.33.0
VSICURL: GetFileSize(https://openeo-dev.vito.be/openeo/1.1/jobs/j-2402139ee06e4f088f2cec0cc911339e/results/assets/N2Q1MjMzODEzNzRiNjJlNmYyYWFkMWYyZjlmYjZlZGRmNjI0ZDM4MmE4ZjcxZGI2ZGNmNTc4OGUzYWFlMGFmM0BlZ2kuZXU%3D/821434835ac34118b66c8da71aa04003/openEO_0.nc?expires=1708522785)=45482 response_code=200
VSICURL: Downloading 0-16383 (https://openeo-dev.vito.be/openeo/1.1/jobs/j-2402139ee06e4f088f2cec0cc911339e/results/assets/N2Q1MjMzODEzNzRiNjJlNmYyYWFkMWYyZjlmYjZlZGRmNjI0ZDM4MmE4ZjcxZGI2ZGNmNTc4OGUzYWFlMGFmM0BlZ2kuZXU%3D/821434835ac34118b66c8da71aa04003/openEO_0.nc?expires=1708522785)...
VSICURL: Got response_code=206
ERROR 4: NETCDF:/vsicurl/https://openeo-dev.vito.be/openeo/1.1/jobs/j-2402139ee06e4f088f2cec0cc911339e/results/assets/N2Q1MjMzODEzNzRiNjJlNmYyYWFkMWYyZjlmYjZlZGRmNjI0ZDM4MmE4ZjcxZGI2ZGNmNTc4OGUzYWFlMGFmM0BlZ2kuZXU%3D/821434835ac34118b66c8da71aa04003/openEO_0.nc?expires=1708522785:B04: No such file or directory
gdalinfo failed - unable to open 'NETCDF:/vsicurl/https://openeo-dev.vito.be/openeo/1.1/jobs/j-2402139ee06e4f088f2cec0cc911339e/results/assets/N2Q1MjMzODEzNzRiNjJlNmYyYWFkMWYyZjlmYjZlZGRmNjI0ZDM4MmE4ZjcxZGI2ZGNmNTc4OGUzYWFlMGFmM0BlZ2kuZXU%3D/821434835ac34118b66c8da71aa04003/openEO_0.nc?expires=1708522785:B04'.
gdalinfo with debug output on my machine:
bossie@rastapopoulos:~/opt/gdal-3.7.0/installed/bin$ CPL_DEBUG=ON ./gdalinfo NETCDF:/vsicurl/https://openeo-dev.vito.be/openeo/1.1/jobs/j-2402139ee06e4f088f2cec0cc911339e/results/assets/N2Q1MjMzODEzNzRiNjJlNmYyYWFkMWYyZjlmYjZlZGRmNjI0ZDM4MmE4ZjcxZGI2ZGNmNTc4OGUzYWFlMGFmM0BlZ2kuZXU%3D/821434835ac34118b66c8da71aa04003/openEO_0.nc?expires=1708522785:B04
./gdalinfo: error while loading shared libraries: libgdal.so.33: cannot open shared object file: No such file or directory
bossie@rastapopoulos:~/opt/gdal-3.7.0/installed/bin$ LD_LIBRARY_PATH=$(readlink -f ../lib) CPL_DEBUG=ON ./gdalinfo NETCDF:/vsicurl/https://openeo-dev.vito.be/openeo/1.1/jobs/j-2402139ee06e4f088f2cec0cc911339e/results/assets/N2Q1MjMzODEzNzRiNjJlNmYyYWFkMWYyZjlmYjZlZGRmNjI0ZDM4MmE4ZjcxZGI2ZGNmNTc4OGUzYWFlMGFmM0BlZ2kuZXU%3D/821434835ac34118b66c8da71aa04003/openEO_0.nc?expires=1708522785:B04
HTTP: libcurl/7.81.0 GnuTLS/3.7.3 zlib/1.2.11 brotli/1.0.9 zstd/1.4.8 libidn2/2.3.2 libpsl/0.21.0 (+libidn2/2.3.2) libssh/0.9.6/openssl/zlib nghttp2/1.43.0 librtmp/2.3 OpenLDAP/2.5.16
VSICURL: GetFileSize(https://openeo-dev.vito.be/openeo/1.1/jobs/j-2402139ee06e4f088f2cec0cc911339e/results/assets/N2Q1MjMzODEzNzRiNjJlNmYyYWFkMWYyZjlmYjZlZGRmNjI0ZDM4MmE4ZjcxZGI2ZGNmNTc4OGUzYWFlMGFmM0BlZ2kuZXU%3D/821434835ac34118b66c8da71aa04003/openEO_0.nc?expires=1708522785)=45482 response_code=200
VSICURL: Downloading 0-16383 (https://openeo-dev.vito.be/openeo/1.1/jobs/j-2402139ee06e4f088f2cec0cc911339e/results/assets/N2Q1MjMzODEzNzRiNjJlNmYyYWFkMWYyZjlmYjZlZGRmNjI0ZDM4MmE4ZjcxZGI2ZGNmNTc4OGUzYWFlMGFmM0BlZ2kuZXU%3D/821434835ac34118b66c8da71aa04003/openEO_0.nc?expires=1708522785)...
VSICURL: Got response_code=206
GDAL_netCDF: driver detected file type=3, libnetcdf detected type=4
GDAL_netCDF: setting file type to 4, was 3
GDAL_netCDF: var_count = 5
GDAL_netCDF:
=====
SetProjectionFromVar( 65536, 3)
GDAL_netCDF: got grid_mapping crs
GDAL_netCDF: setting WKT from GDAL
GDAL_netCDF: bIsGdalFile=0 bIsGdalCfFile=0 bSwitchedXY=0 bBottomUp=1
GDAL_netCDF: xdim: 129 dfSpacingBegin: 10.000000 dfSpacingMiddle: 10.000000 dfSpacingLast: 10.000000
GDAL_netCDF: ydim: 129 dfSpacingBegin: -10.000000 dfSpacingMiddle: -10.000000 dfSpacingLast: -10.000000
GDAL_netCDF: set bBottomUp = 0 from Y axis
GDAL_netCDF: bGotGeogCS=0 bGotCfSRS=0 bGotCfGT=1 bGotCfWktSRS=0 bGotGdalSRS=1 bGotGdalGT=0
GDAL_netCDF: netcdf type=5 gdal type=6 signedByte=1
GDAL: GDALOpen(NETCDF:/vsicurl/https://openeo-dev.vito.be/openeo/1.1/jobs/j-2402139ee06e4f088f2cec0cc911339e/results/assets/N2Q1MjMzODEzNzRiNjJlNmYyYWFkMWYyZjlmYjZlZGRmNjI0ZDM4MmE4ZjcxZGI2ZGNmNTc4OGUzYWFlMGFmM0BlZ2kuZXU%3D/821434835ac34118b66c8da71aa04003/openEO_0.nc?expires=1708522785:B04, this=0x55ae81ec3170) succeeds as netCDF.
Driver: netCDF/Network Common Data Format
GDAL: GDALDefaultOverviews::OverviewScan()
Files: /vsicurl/https://openeo-dev.vito.be/openeo/1.1/jobs/j-2402139ee06e4f088f2cec0cc911339e/results/assets/N2Q1MjMzODEzNzRiNjJlNmYyYWFkMWYyZjlmYjZlZGRmNjI0ZDM4MmE4ZjcxZGI2ZGNmNTc4OGUzYWFlMGFmM0BlZ2kuZXU%3D/821434835ac34118b66c8da71aa04003/openEO_0.nc?expires=1708522785
Size is 129, 129
Coordinate System is:
PROJCRS["WGS 84 / UTM zone 31N",
BASEGEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]]],
CONVERSION["UTM zone 31N",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Longitude of natural origin",3,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Latitude of natural origin",0,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Scale factor at natural origin",0.9996,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",500000,
LENGTHUNIT["m",1],
ID["EPSG",8806]],
PARAMETER["False northing",0,
LENGTHUNIT["m",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["easting",east,
ORDER[1],
LENGTHUNIT["m",1]],
AXIS["northing",north,
ORDER[2],
LENGTHUNIT["m",1]],
ID["EPSG",32631]]
Data axis to CRS axis mapping: 1,2
Origin = (701040.000000000000000,5626340.000000000000000)
Pixel Size = (10.000000000000000,-10.000000000000000)
Metadata:
B04#grid_mapping=crs
B04#long_name=B04
B04#units=
B04#_FillValue=nan
crs#crs_wkt=PROJCS["WGS 84 / UTM zone 31N", GEOGCS["WGS 84", DATUM["World Geodetic System 1984", SPHEROID["WGS 84", 6378137.0, 298.257223563, AUTHORITY["EPSG","7030"]], AUTHORITY["EPSG","6326"]], PRIMEM["Greenwich", 0.0, AUTHORITY["EPSG","8901"]], UNIT["degree", 0.017453292519943295], AXIS["Geodetic longitude", EAST], AXIS["Geodetic latitude", NORTH], AUTHORITY["EPSG","4326"]], PROJECTION["Transverse_Mercator", AUTHORITY["EPSG","9807"]], PARAMETER["central_meridian", 3.0], PARAMETER["latitude_of_origin", 0.0], PARAMETER["scale_factor", 0.9996], PARAMETER["false_easting", 500000.0], PARAMETER["false_northing", 0.0], UNIT["m", 1.0], AXIS["Easting", EAST], AXIS["Northing", NORTH], AUTHORITY["EPSG","32631"]]
crs#spatial_ref=PROJCS["WGS 84 / UTM zone 31N", GEOGCS["WGS 84", DATUM["World Geodetic System 1984", SPHEROID["WGS 84", 6378137.0, 298.257223563, AUTHORITY["EPSG","7030"]], AUTHORITY["EPSG","6326"]], PRIMEM["Greenwich", 0.0, AUTHORITY["EPSG","8901"]], UNIT["degree", 0.017453292519943295], AXIS["Geodetic longitude", EAST], AXIS["Geodetic latitude", NORTH], AUTHORITY["EPSG","4326"]], PROJECTION["Transverse_Mercator", AUTHORITY["EPSG","9807"]], PARAMETER["central_meridian", 3.0], PARAMETER["latitude_of_origin", 0.0], PARAMETER["scale_factor", 0.9996], PARAMETER["false_easting", 500000.0], PARAMETER["false_northing", 0.0], UNIT["m", 1.0], AXIS["Easting", EAST], AXIS["Northing", NORTH], AUTHORITY["EPSG","32631"]]
NC_GLOBAL#Conventions=CF-1.9
NC_GLOBAL#description=
NC_GLOBAL#institution=openEO platform - Geotrellis backend: 0.27.0a1
NC_GLOBAL#title=
x#long_name=x coordinate of projection
x#standard_name=projection_x_coordinate
x#units=m
y#long_name=y coordinate of projection
y#standard_name=projection_y_coordinate
y#units=m
Corner Coordinates:
Upper Left ( 701040.000, 5626340.000) ( 5d51' 0.89"E, 50d45'14.29"N)
Lower Left ( 701040.000, 5625050.000) ( 5d50'58.35"E, 50d44'32.58"N)
Upper Right ( 702330.000, 5626340.000) ( 5d52' 6.64"E, 50d45'12.68"N)
Lower Right ( 702330.000, 5625050.000) ( 5d52' 4.09"E, 50d44'30.97"N)
Center ( 701685.000, 5625695.000) ( 5d51'32.49"E, 50d44'52.63"N)
Band 1 Block=129x129 Type=Float32, ColorInterp=Undefined
NoData Value=nan
Metadata:
grid_mapping=crs
long_name=B04
NETCDF_VARNAME=B04
units=
_FillValue=nan
GDAL: GDALClose(NETCDF:/vsicurl/https://openeo-dev.vito.be/openeo/1.1/jobs/j-2402139ee06e4f088f2cec0cc911339e/results/assets/N2Q1MjMzODEzNzRiNjJlNmYyYWFkMWYyZjlmYjZlZGRmNjI0ZDM4MmE4ZjcxZGI2ZGNmNTc4OGUzYWFlMGFmM0BlZ2kuZXU%3D/821434835ac34118b66c8da71aa04003/openEO_0.nc?expires=1708522785:B04, this=0x55ae81ec3170)
GDAL: In GDALDestroy - unloading GDAL shared library.
The netCDF driver doesn't support Virtual IO (lacks the v flag):
bash-4.4$ gdalinfo --formats | grep -i netcdf
netCDF -raster,multidimensional raster,vector- (rw+s): Network Common Data Format
On my machine:
bossie@rastapopoulos:~/opt/gdal-3.7.0/installed/bin$ ./gdalinfo --formats | grep -i netcdf
netCDF -raster,multidimensional raster,vector- (rw+vs): Network Common Data Format
This explains why it's able to read those files from disk just fine, but not with /vsicurl.
For reference, gdalinfo --format netCDF should also report:
Supports: Virtual IO - eg. /vsimem/
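The difference between the two builds boils down to whether the driver's capability flags contain v (Virtual IO). A small helper to check that from a gdalinfo --formats line, using pure string parsing so no GDAL binding is needed (the helper name is made up):

```python
import re


def supports_virtual_io(formats_line: str) -> bool:
    """Check a `gdalinfo --formats` output line for the 'v' (Virtual IO)
    capability flag, e.g. '... (rw+vs): Network Common Data Format'."""
    m = re.search(r"\(([a-z+]+)\)", formats_line)
    return bool(m) and "v" in m.group(1)


# driver container build (no Virtual IO) vs. local build (Virtual IO):
print(supports_virtual_io("netCDF -raster- (rw+s): Network Common Data Format"))   # False
print(supports_virtual_io("netCDF -raster- (rw+vs): Network Common Data Format"))  # True
```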
Bumping into this:
Since GDAL 2.4, and with Linux kernel >=4.3 and libnetcdf >=4.5, read operations on /vsi file systems are supported using the userfaultfd Linux system call. If running from a container, that system call may be unavailable by default. For example with Docker, --security-opt seccomp=unconfined might be needed.
Passing that flag to docker run indeed fixes it.
A more fine-grained way to enable the userfaultfd system call is described here and seems to work: https://github.com/LLNL/umap/blob/develop/README.md#example-running-the-umap-container-with-a-seccomp-whitelist
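As a sketch of that approach (the exact profile in the linked umap README differs): a seccomp profile would need an entry in its syscalls array permitting userfaultfd, along these lines, and the profile is then passed via docker run --security-opt seccomp=profile.json:

```json
{
  "names": ["userfaultfd"],
  "action": "SCMP_ACT_ALLOW"
}
```

Adding this to Docker's default profile (rather than writing an allowlist from scratch) would keep the rest of the default syscall filtering intact.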
I'm not sure what the consequences are; is this an option, @jdries?
I learned that k8s does allow these system calls by default and indeed, load_stac is able to read netCDF assets and the result can e.g. be saved as a GeoTIFF. Unfortunately, the load_stac batch job crashes upon completion and the job status is set to error:
{"message": "Writing results to object storage", "levelname": "INFO", "name": "openeogeotrellis.deploy.batch_job", "created": 1708000014.1691525, "filename": "batch_job.py", "lineno": 1215, "process": 70, "job_id": "j-24021573df2347b4a1c71931f507ecd1", "user_id": "df7ea45d-ecc4-453f-8af9-de8cfb1058b1"}
{"message": "batch_job.py main os.getpid()=70: end 2024-02-15 12:26:56.081776, elapsed 0:00:46.435952", "levelname": "INFO", "name": "openeogeotrellis.deploy.batch_job", "created": 1708000016.0818608, "filename": "util.py", "lineno": 347, "process": 70, "job_id": "j-24021573df2347b4a1c71931f507ecd1", "user_id": "df7ea45d-ecc4-453f-8af9-de8cfb1058b1"}
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 140338640241344:
#000: ../../src/H5T.c line 1754 in H5Tclose(): not a datatype
major: Invalid arguments to routine
minor: Inappropriate type
[1 of 1000] FAILURE(3) CPLE_AppDefined(1) "Application defined error." netcdf error #-101 : NetCDF: HDF error .
at (/home/jenkins/rpmbuild/BUILD/gdal-3.7.0-fedora/frmts/netcdf/netcdfdataset.cpp,Close,2964)
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fa273752844, pid=15, tid=15
#
# JRE version: OpenJDK Runtime Environment 18.9 (11.0.14+9) (build 11.0.14+9-LTS)
# Java VM: OpenJDK 64-Bit Server VM 18.9 (11.0.14+9-LTS, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C [libgdalwarp_bindings.so+0x4a844] std::_Rb_tree<int, std::pair<int const, std::tuple<int, std::chrono::duration<long, std::ratio<1l, 1000l> > > >, std::_Select1st<std::pair<int const, std::tuple<int, std::chrono::duration<long, std::ratio<1l, 1000l> > > > >, std::less<int>, std::allocator<std::pair<int const, std::tuple<int, std::chrono::duration<long, std::ratio<1l, 1000l> > > > > >::_M_begin()+0xc
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e" (or dumping to /opt/spark/work-dir/core.15)
#
# An error report file with more information is saved as:
# /opt/spark/work-dir/hs_err_pid15.log
#
# If you would like to submit a bug report, please visit:
# https://bugzilla.redhat.com/enter_bug.cgi?product=Red%20Hat%20Enterprise%20Linux%208&component=java-11-openjdk
#
It's not possible to download the result assets in the normal way (not even with ?partial=true), but they do end up in S3 and look as expected.
Sync requests work fine, so there's that.
As for the fine-grained seccomp whitelist approach mentioned above (https://github.com/LLNL/umap/blob/develop/README.md#example-running-the-umap-container-with-a-seccomp-whitelist): I considered enabling this just for batch jobs, but I can't find a way to pass this --security-opt to spark-submit either, so yeah.
To summarize, at this point:
Internal ref: GDD-3173
Possible solutions:
- On K8s: call GDALWarp#deinit before the batch job ends to clean up
Confirmed: works!
Resolution of GDD-3173:
I won't be able to get the seccomp profile working on our current Hadoop cluster due to the outdated kernel on CentOS 7. I've already implemented the change in the new cluster, but that one is still under development.
When saving the results of a batch job as netCDF, the resulting STAC collection doesn't contain any items and therefore cannot be loaded using load_stac. For example, the following batch job: j-2401190a5a2144868480ccba676ee9db.json will result in the following STAC metadata: job-results.json. Using load_stac on these results raises a NoDataAvailable exception.
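The NoDataAvailable failure mode could be detected up front by inspecting the job-results document for items before calling load_stac. A minimal sketch, assuming the collection links to its items with rel "item" (or embeds them as features); the helper name and document shapes are illustrative:

```python
def has_items(job_results: dict) -> bool:
    """Return True if a STAC collection document links to at least one
    item (rel == "item") or embeds features directly. Heuristic sketch."""
    links = job_results.get("links", [])
    if any(link.get("rel") == "item" for link in links):
        return True
    return bool(job_results.get("features"))


empty = {"type": "Collection", "links": [{"rel": "self", "href": "..."}]}
ok = {"type": "Collection",
      "links": [{"rel": "item", "href": "items/openEO_0.nc.json"}]}
print(has_items(empty), has_items(ok))  # False True
```

A check like this would turn the opaque NoDataAvailable exception into an early, explicit "the collection has no items" diagnosis.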