OSGeo / gdal

GDAL is an open source MIT licensed translator library for raster and vector geospatial data formats.
https://gdal.org
Other
4.78k stars 2.5k forks source link

OGR PDS Layer is always casting to 8Bytes #570

Closed kidpixo closed 6 years ago

kidpixo commented 6 years ago

Intro

I am working with point spectrometer files, in particular from MESSENGER/MASCS PDS v3 (see example below).

The PDS data have a detached header file (.LBL) describing the data that internally points to a "STRUCTURE" file (FMT), that contains the commons data structure for all the dataset.

See relevant official NASA documentation for PDS3 :

read_mascs_org_dgal.ipynb

Expected behavior and actual behavior.

This particular dataset uses 8 Bytes IEEE_REAL types for different data COLUMNS (in PDS parlance).

I expected OGR/PDS driver to read the data as 8 Bytes, but it just cast them to 4 Bytes ( see code for ogrpdslayer.cpp ) effectively trowing away the last 4 Bytes, without any Warning or Error to the user.

Link to Data

For ogr tools to work, one need to copy the FMT file in the same folder as the data+header.

This is not the case for a standard NASA PDS archive, where the FMT files are all stored in ROOT/label/.

All those data are attached here github_issue.zip

Steps to reproduce the problem.

Data extracted from gdal/ogr cli tool:

$ ogrinfo -ro -al -q -fid 403 virsvd_orb_11187_050618.lbl | grep 'ALONG_TRACK_FOOTPRINT_SIZE\|ACROSS_TRACK_FOOTPRINT_SIZE\|INCIDENCE_ANGLE\|EMISSION_ANGLE\|PHASE_ANGLE\|SOLAR_DISTANCE' | sort
 ACROSS_TRACK_FOOTPRINT_SIZE (Real) = 3.85195541381836
 ALONG_TRACK_FOOTPRINT_SIZE (Real) = 5.43546199798584
 EMISSION_ANGLE (Real) = 2.85403800010681
 INCIDENCE_ANGLE (Real) = 3.18118929862976
 PHASE_ANGLE (Real) = 3.30409026145935
 SOLAR_DISTANCE (Real) = 17.6821403503418

"True" data extracted with the official tool delivered from the team ( MESSENGER MASCS Software) :

$ cat virsvd_orb_11187_050618.idltxt | grep 'ALONG_TRACK_FOOTPRINT_SIZE\|ACROSS_TRACK_FOOTPRINT_SIZE\|INCIDENCE_ANGLE\|EMISSION_ANGLE\|PHASE_ANGLE\|SOLAR_DISTANCE' | sort
 'ACROSS_TRACK_FOOTPRINT_SIZE': 1149.2706403479999,
 'ALONG_TRACK_FOOTPRINT_SIZE': 17048.826443112001,
 'EMISSION_ANGLE': 81.466268350000007,
 'INCIDENCE_ANGLE': 3.5677553799999999,
 'PHASE_ANGLE': 77.913549509999996,
 'SOLAR_DISTANCE': 61770628.950300902,

Both command line tools and python binding relies on the same code ogrpdslayer.cpp thus they share the sam behaviour.

Partial Solution (Workaround)

For the python binding only, I use this workaround:

  1. Modify the header file doubling the number of element for the 8Bytes types ( N > 2N)
  2. Modify the same object as 4Bytes
  3. Read them with osgeo.ogr.Open
  4. "Collapse" the 2N/4Bytes arrays to N/8Bytes arrays via numpy.

This works reasonably well/fast, but it is hacky and only via python.

A notebook summarising my findings is here : read_mascs_org_dgal.ipynb

Real Solution

Implement the 8Bytes Real type in ogrpdslayer.cpp .

An easy test would be to delete the Lines 248-249 that force the type to 4Bytes, but I have no idea if this will impact other parts or if 4Bytes types are assumed/hardcoded somewhere else.

I'll try to compile myself and test it.

GDAL version and provenance

I check until the tag v2.3.0beta1 and the code fragment in ogrpdslayer.cpp (Lines 246-249) are the same.

kidpixo commented 6 years ago

That was the fastest bug fix ever @rouault !!