OSGeo / gdal

GDAL is an open source MIT licensed translator library for raster and vector geospatial data formats.
https://gdal.org

Crash in GDALCopyWords64 when switching to CPL_CPU_REQUIRES_ALIGNED_ACCESS #11253

Closed schwehr closed 1 day ago

schwehr commented 1 day ago

What is the bug?

I hit trouble going to 8f2f8efd89db360811f4aab8c7e6a67b307cc12c, so I'm backing up and setting CPL_CPU_REQUIRES_ALIGNED_ACCESS. I think the crash happens in here: https://github.com/OSGeo/gdal/blob/566204a0969479efccbe7671ba7a40e0375a5ead/gcore/rasterio.cpp#L3326-L3332

            for (decltype(nWordCount) i = 0; i < nWordCount; i++)
            {
                memcpy(static_cast<GByte *>(pDstData) + nDstPixelStride * i,
                       static_cast<const GByte *>(pSrcData) +
                           nSrcPixelStride * i,
                       nDstDataTypeSize);
            }
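For readers less familiar with this path, the loop above reads as a per-word `memcpy` with independent source and destination strides, which avoids typed loads and stores on addresses that may not be suitably aligned. Below is a minimal standalone sketch of that pattern. It is not GDAL code: `CopyWordsStrided` and `RoundTripThroughMisalignedBuffer` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the loop quoted above (not GDAL's implementation).
// Copies wordCount words of wordSize bytes each, with separate byte strides
// for source and destination. Using memcpy means no typed (and possibly
// misaligned) load or store is ever performed on src or dst.
inline void CopyWordsStrided(const void *src, std::ptrdiff_t srcStride,
                             void *dst, std::ptrdiff_t dstStride,
                             std::size_t wordSize, std::size_t wordCount)
{
    const unsigned char *s = static_cast<const unsigned char *>(src);
    unsigned char *d = static_cast<unsigned char *>(dst);
    for (std::size_t i = 0; i < wordCount; ++i)
    {
        std::memcpy(d + dstStride * static_cast<std::ptrdiff_t>(i),
                    s + srcStride * static_cast<std::ptrdiff_t>(i),
                    wordSize);
    }
}

// Packs doubles at byte offset 1 (so every packed word is misaligned for
// double) and then unpacks them again through the same strided copy.
inline std::vector<double>
RoundTripThroughMisalignedBuffer(const std::vector<double> &values)
{
    const std::ptrdiff_t stride = static_cast<std::ptrdiff_t>(sizeof(double));
    std::vector<unsigned char> packed(1 + values.size() * sizeof(double));
    CopyWordsStrided(values.data(), stride, packed.data() + 1, stride,
                     sizeof(double), values.size());
    std::vector<double> out(values.size());
    CopyWordsStrided(packed.data() + 1, stride, out.data(), stride,
                     sizeof(double), values.size());
    return out;
}
```

Note that with a word size of 0 every `memcpy` in such a loop copies nothing, so the result depends entirely on what the caller assumed about the buffers.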

I get a sandbox crash in GDALCopyWords64 when I enable CPL_CPU_REQUIRES_ALIGNED_ACCESS in a test that tries to read this grib. And it works fine from gdalinfo (with both asan and msan builds).

gdalinfo gfs_193.20230720.i0000.f168_nodata.grb2 -stats -mm | grep -v GRIB
Files: gfs_193.20230720.i0000.f168_nodata.grb2
Size is 1440, 721
Coordinate System is:
    DATUM["unnamed",
        ELLIPSOID["Sphere",6371229,0,
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433,
            ID["EPSG",9122]]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]]]
Data axis to CRS axis mapping: 2,1
Origin = (-180.125000000000000,90.125000000000000)
Pixel Size = (0.250000000000000,-0.250000000000000)
Corner Coordinates:
Upper Left  (-180.1250000,  90.1250000) (180d 7'30.00"W, 90d 7'30.00"N)
Lower Left  (-180.1250000, -90.1250000) (180d 7'30.00"W, 90d 7'30.00"S)
Upper Right ( 179.8750000,  90.1250000) (179d52'30.00"E, 90d 7'30.00"N)
Lower Right ( 179.8750000, -90.1250000) (179d52'30.00"E, 90d 7'30.00"S)
Center      (  -0.1250000,   0.0000000) (  0d 7'30.00"W,  0d 0' 0.01"N)
Band 1 Block=1440x1 Type=Float64, ColorInterp=Undefined
  Description = 0[-] MSL="Mean sea level"
    Computed Min/Max=94466.789,105917.992
  Minimum=94466.789, Maximum=105917.992, Mean=100837.204, StdDev=1512.677
  Metadata:
    STATISTICS_MAXIMUM=105917.9921875
    STATISTICS_MEAN=100837.20431843
    STATISTICS_MINIMUM=94466.7890625
    STATISTICS_STDDEV=1512.6766293123
    STATISTICS_VALID_PERCENT=100
Band 2 Block=1440x1 Type=Float64, ColorInterp=Undefined
  Description = 1829[m] GPML="Specific altitude above mean sea level"
    Computed Min/Max=-63.832,58.968
  Minimum=-63.832, Maximum=58.968, Mean=2.858, StdDev=9.001
  NoData Value=9.999000260554009e+20
  Metadata:
    STATISTICS_MAXIMUM=58.967861175537
    STATISTICS_MEAN=2.8578809859788
    STATISTICS_MINIMUM=-63.832141876221
    STATISTICS_STDDEV=9.0006018345128
    STATISTICS_VALID_PERCENT=91.28
Band 3 Block=1440x1 Type=Float64, ColorInterp=Undefined
  Description = 0[-] SFC="Ground or water surface"
    Computed Min/Max=2.000,6.000
  Minimum=2.000, Maximum=6.000, Mean=3.424, StdDev=1.396
  NoData Value=9999
  Metadata:
    STATISTICS_MAXIMUM=6
    STATISTICS_MEAN=3.4238083182467
    STATISTICS_MINIMUM=2
    STATISTICS_STDDEV=1.3956606436979
    STATISTICS_VALID_PERCENT=40.33

Steps to reproduce the issue

This is in a non-standard Linux build environment, with a Bazel-based build inside sandbox2, so I'm hoping my description will ring a bell for someone who has hit something similar.

Versions and provenance

yes, I'm back at 566204a09694 from 2023-09-15, right before CPL_CPU_REQUIRES_ALIGNED_ACCESS goes away in https://github.com/OSGeo/gdal/commit/8f2f8efd89db360811f4aab8c7e6a67b307cc12c. And this is in sandbox2 where I can't pull off debugging or logging easily.

Additional context

Chances are slim, but just maybe someone will have an idea. I'll likely disable the aligned path in GDALCopyWords64 and move on.

rouault commented 1 day ago

gfs_193.20230720.i0000.f168_nodata.grb2

@schwehr can you attach / link to that dataset?

schwehr commented 1 day ago

gfs_193.20230720.i0000.f168_nodata.grb2.zip

rouault commented 1 day ago

How exactly are you reading it? Trying gdalinfo -mm -stats, gdalinfo -checksum, gdal_translate ... out.tif, gdal_translate ... out.tif -co INTERLEAVE=BAND, or Python ReadRaster() on this dataset shows that the code path of https://github.com/OSGeo/gdal/blob/566204a0969479efccbe7671ba7a40e0375a5ead/gcore/rasterio.cpp#L3326-L3332 is not taken.

schwehr commented 1 day ago

I didn't write this code and I haven't been able to follow it all the way through. The data is being fed through a VSI wrapper for the sandbox boundary. The only trace I can get so far is without line numbers:

GDALCopyWords64+0x124(0x5611918b6054)
GDALRasterBand::IRasterIO(GDALRWFlag, int, int, int, int, void*, int, int, GDALDataType, long long, long long, GDALRasterIOExtraArg*)+0x507(0x5611918b1227)
GDALRasterBand::RasterIO(GDALRWFlag, int, int, int, int, void*, int, int, GDALDataType, long long, long long, GDALRasterIOExtraArg*)+0x305(0x561191878e45)

schwehr commented 1 day ago

I think this was because the sandbox code didn't check the data type correctly and, as a result, was working with a pixel size of 0 bytes, since the test injects a data type of 0 (Unknown).
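A minimal sketch of the kind of check that was missing, assuming the wrapper receives the data type as a raw integer from the untrusted side. This is not the actual sandbox code: `PixelSizeBytes` and `RequiredBufferBytes` are hypothetical helpers, and the numeric codes only mirror a few `GDALDataType` values (0 = Unknown, 1 = Byte, 6 = Float32, 7 = Float64). In real code one would presumably consult `GDALGetDataTypeSizeBytes()`, which returns 0 for `GDT_Unknown`, before calling `RasterIO()`.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helper: size in bytes of one pixel for a raw type code.
// Returns 0 for Unknown (0) and for any code this sketch does not handle.
inline std::size_t PixelSizeBytes(int rawType)
{
    switch (rawType)
    {
        case 1:  // Byte
            return 1;
        case 6:  // Float32
            return 4;
        case 7:  // Float64
            return 8;
        default:  // includes 0 (Unknown)
            return 0;
    }
}

// Hypothetical boundary check: computes the buffer size a width x height read
// would need. Returns false, leaving *outBytes untouched, when the type has
// no size, a dimension is zero, or the multiplication would overflow.
inline bool RequiredBufferBytes(int rawType, std::size_t width,
                                std::size_t height, std::size_t *outBytes)
{
    const std::size_t pixelSize = PixelSizeBytes(rawType);
    if (pixelSize == 0 || width == 0 || height == 0)
        return false;
    const std::size_t maxSize = std::numeric_limits<std::size_t>::max();
    if (width > maxSize / height)
        return false;
    const std::size_t pixels = width * height;
    if (pixels > maxSize / pixelSize)
        return false;
    *outBytes = pixels * pixelSize;
    return true;
}
```

With a check like this, a request carrying type 0 is rejected before any buffer is sized or copied, instead of proceeding with a 0-byte pixel size.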

rouault commented 1 day ago

I think this was because the sandbox code didn't check the data type correctly and, as a result, was working with a pixel size of 0 bytes, since the test injects a data type of 0 (Unknown).

ah indeed, a lot / most GDAL methods aren't ready to deal with GDT_Unknown!

rouault commented 1 day ago

ah indeed, a lot / most GDAL methods aren't ready to deal with GDT_Unknown!

ticketed as https://github.com/OSGeo/gdal/issues/11257