geo-engine / geoengine

Workspace for the geo engine crates
https://www.geoengine.de/en/
Apache License 2.0

Inconsistent loading of CSV files with empty fields #487

Closed 1lutz closed 2 years ago

1lutz commented 2 years ago

Consider the following CSV file: lonlat_empty_field.csv

Longitude,Latitude,Name
1.1,2.2,foo
1.1,2.2,

This file is loaded using a standard OgrSource. The FeatureDataRef returned by the data method identifies the string in the second row as NULL, as the following test shows:

let column: FeatureDataRef = result.data("Name").unwrap();

assert_eq!(column.nulls(), vec![false, true]);
assert_eq!(column.get_unchecked(0), FeatureDataValue::NullableText(Some("foo".to_string())));
assert_eq!(column.get_unchecked(1), FeatureDataValue::NullableText(None));

At the same time, however, the MultiPointCollection itself stores an empty string:

let pc = MultiPointCollection::from_data(
    MultiPoint::many(vec![vec![(1.1, 2.2)], vec![(1.1, 2.2)]]).unwrap(),
    vec![Default::default(); 2],
    {
        let mut map = HashMap::new();
        map.insert("Name".into(), FeatureData::NullableText(vec![Some("foo".to_owned()), Some("".to_owned())]));
        map
    },
)
    .unwrap();

assert_eq!(result, pc);

This discrepancy can cause the Point in Polygon operator to throw an error.

Possible solution

The GDAL open option "EMPTY_STRING_AS_NULL=YES" can be set (which is done in #486). In that case, the MultiPointCollection stores the second string cell as NULL and the operator works properly:

let pc = MultiPointCollection::from_data(
    MultiPoint::many(vec![vec![(1.1, 2.2)], vec![(1.1, 2.2)]]).unwrap(),
    vec![Default::default(); 2],
    {
        let mut map = HashMap::new();
        map.insert("Name".into(), FeatureData::NullableText(vec![Some("foo".to_owned()), None]));
        map
    },
)
    .unwrap();

assert_eq!(result, pc);

I do not yet know why the data in the FeatureDataRef and the internal data in the MultiPointCollection differ.
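
To make the effect of the option described above concrete, here is a minimal, self-contained sketch of the two loading behaviours for the example file. It does not call GDAL or geoengine; parse_name_column is a hypothetical helper, and its empty_as_null flag only imitates what "EMPTY_STRING_AS_NULL=YES" is described as doing in this issue.

```rust
/// Hypothetical helper for illustration only; not part of geoengine or GDAL.
/// Reads the third column ("Name") of a small CSV and optionally maps
/// empty fields to `None`.
fn parse_name_column(csv: &str, empty_as_null: bool) -> Vec<Option<String>> {
    csv.lines()
        .skip(1) // skip the header row
        .filter(|line| !line.trim().is_empty())
        .map(|line| {
            // A trailing comma yields an empty third field ("").
            let name = line.split(',').nth(2).unwrap_or("");
            if empty_as_null && name.is_empty() {
                None
            } else {
                Some(name.to_owned())
            }
        })
        .collect()
}

fn main() {
    let csv = "Longitude,Latitude,Name\n1.1,2.2,foo\n1.1,2.2,\n";

    // Without the flag the empty field is kept as an empty string.
    println!("{:?}", parse_name_column(csv, false)); // [Some("foo"), Some("")]

    // With the flag the empty field becomes a null.
    println!("{:?}", parse_name_column(csv, true)); // [Some("foo"), None]
}
```

The two results correspond to the two MultiPointCollection expectations shown in the tests above: Some("") without the option and None with it.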

1lutz commented 2 years ago

I noticed that with the small demo dataset, a workflow with Point in Polygon does not fail even if "EMPTY_STRING_AS_NULL=YES" is not set. The option seems to be needed only for the larger dataset Amphi_env.csv that I got from NFDI (the text field "Method/Device" is sufficient to cause the error).

1lutz commented 2 years ago

User uploads can create vector datasets that cause Actix to panic

ChristianBeilschmidt commented 2 years ago

Is the problem that it stores a null bit in the bitmap, or that it stores an empty string in the data buffer?

Maybe NullableText just behaves incorrectly. I guess we just have to check that we either store a null and return a null, or store an empty string and return one (and don't claim that it is null).
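
The consistency rule suggested here could look roughly like the sketch below. NullableTextColumn is a hypothetical stand-in and not geoengine's actual NullableText implementation; the only assumption is a column that keeps one validity flag per row next to the stored strings, so that nulls() and get() can never disagree.

```rust
/// Hypothetical nullable text column for illustration only.
/// One validity flag per row plus the stored strings.
struct NullableTextColumn {
    valid: Vec<bool>,
    values: Vec<String>,
}

impl NullableTextColumn {
    /// Derives the validity flags from the input alone, so an empty
    /// string is stored as a valid (non-null) value.
    fn from_options(data: Vec<Option<String>>) -> Self {
        let valid: Vec<bool> = data.iter().map(|v| v.is_some()).collect();
        // Null rows get a placeholder string that is never exposed to callers.
        let values: Vec<String> = data.into_iter().map(|v| v.unwrap_or_default()).collect();
        Self { valid, values }
    }

    /// Reports nulls from the validity flags only, never from string contents.
    fn nulls(&self) -> Vec<bool> {
        self.valid.iter().map(|v| !v).collect()
    }

    /// Returns `None` exactly for the rows that `nulls()` reports as null.
    fn get(&self, row: usize) -> Option<&str> {
        if self.valid[row] {
            Some(self.values[row].as_str())
        } else {
            None
        }
    }
}

fn main() {
    let column = NullableTextColumn::from_options(vec![
        Some("foo".to_owned()),
        Some(String::new()), // empty string, not null
        None,                // null
    ]);

    assert_eq!(column.nulls(), vec![false, false, true]);
    assert_eq!(column.get(1), Some("")); // stored as empty, returned as empty
    assert_eq!(column.get(2), None); // stored as null, returned as null
}
```

Because both accessors read the same flags, a stored empty string is never reported as null, and a stored null is never returned as an empty string.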

ChristianBeilschmidt commented 2 years ago

Closed by #486