TheDataStation / ver

Data Discovery Tools and Systems
MIT License

Column Semantic Type Set to Null if First Chunk of Data is All Null #73

Open ogiorgil opened 7 months ago

ogiorgil commented 7 months ago

In the implementation of checkSpatialTemporal (once #71 is merged), we determine a column's spatial/temporal-ness (semantic type) from the first ProfilerConfig.NUM_RECORD_READ records. If all of these values are null, the column's semantic type is set to NONE, even though non-null values may appear later in the table.

We could either drop all null values before passing the data into the PreAnalyzer, or modify the estimateSemanticType function to retry determining a column's semantic type when all of the values read were null.
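
To make the two options concrete, here is a rough, language-agnostic sketch in Python. It is not the project's code: `NUM_RECORD_READ` here is a local stand-in for ProfilerConfig.NUM_RECORD_READ, the helper names (`is_null`, `first_non_null_sample`, `first_informative_chunk`) are hypothetical, and treating blank strings as null is an assumption about the data.

```python
from itertools import islice
from typing import Iterable, List, Optional

# Hypothetical stand-in for ProfilerConfig.NUM_RECORD_READ.
NUM_RECORD_READ = 3


def is_null(value: Optional[str]) -> bool:
    """Treat None and blank strings as null (an assumption about the data)."""
    return value is None or value.strip() == ""


def first_non_null_sample(values: Iterable[Optional[str]],
                          n: int = NUM_RECORD_READ) -> List[str]:
    """Option 1: drop nulls first, then take the first n remaining values."""
    non_null = (v for v in values if not is_null(v))
    return list(islice(non_null, n))


def first_informative_chunk(values: Iterable[Optional[str]],
                            n: int = NUM_RECORD_READ) -> List[str]:
    """Option 2: read chunks of n values and retry while a chunk is all null."""
    it = iter(values)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            # Column exhausted: every value was null, so NONE is justified.
            return []
        non_null = [v for v in chunk if not is_null(v)]
        if non_null:
            return non_null


column = [None, "", None, "2021-01-01", None, "2021-01-02", "2021-01-03"]
sample_option_1 = first_non_null_sample(column)
sample_option_2 = first_informative_chunk(column)
```

With either option, the semantic type would only fall back to NONE when the returned sample is empty, i.e. when the whole column is null. Option 1 always yields up to n usable values but may scan further into the table; option 2 keeps the chunked read and only reads more when a chunk turns out to be all null.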

luthfibalaka commented 6 months ago

Has this issue been solved already? I tried running it on a CSV file with NUM_RECORD_READ set to 1 (all values in the first row of the CSV file are null), but the column was still labeled without any problem. Do you perhaps have a way to reproduce the issue?