IsEmail could return true when there were no non-null values in a column.
Null values were implicitly included in the counts for distinct values, but not in the list of distinct values returned in the column metrics.
Errors in grouping sub-queries didn't always cause the exploration to fail; now they do. It is better to throw and make the failure explicit (and report it to the API consumer) than to fail silently and return computations based on incomplete data.
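For illustration, here is one way the IsEmail problem can arise, as a minimal Python sketch rather than the project's actual code. It assumes the check is "every non-null value looks like an email"; the function name `is_email_column` and the simplistic pattern are hypothetical. An "all values match" test over an empty set is vacuously true, so the empty case needs an explicit guard.

```python
import re
from typing import Iterable, Optional

# Deliberately simplistic pattern, for illustration only (an assumption,
# not the pattern used by the project).
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_email_column(values: Iterable[Optional[str]]) -> bool:
    """Return True only if there is at least one non-null value
    and every non-null value looks like an email address."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        # all() over an empty list is True, so without this guard a
        # column containing only nulls would be reported as an email column.
        return False
    return all(EMAIL_PATTERN.match(v) is not None for v in non_null)
```

With the guard in place, a column that contains only nulls is no longer classified as an email column.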
And one not-so-minor issue:
If more than 31 columns were requested, it was not possible to group over the whole table (#342).
Solution:
Remove any invariant columns to reduce the total column count.
Only consider the first 31 columns for correlation analysis; any further columns will be sampled as independent variables.
This has the downside that if a table has more than 31 useful columns, those beyond the first 31 will not be considered for correlation analysis. Hopefully this is rare enough not to be a major issue. An exhaustive fix would involve splitting the correlation analysis queries into sub-queries of no more than 31 columns each, and then recombining the results for the correlation analysis (probability matrices). I'll create an issue for this for future reference.
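The column selection described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the code in this PR: the name `split_columns` is hypothetical, the input is assumed to be a mapping from column name to its distinct-value count, and a column is treated as invariant when it has at most one distinct value.

```python
from typing import Dict, List, Tuple

MAX_GROUPING_COLUMNS = 31  # the limit described above


def split_columns(
    distinct_counts: Dict[str, int],
    limit: int = MAX_GROUPING_COLUMNS,
) -> Tuple[List[str], List[str], List[str]]:
    """Return (correlated, independent, dropped) column names.

    distinct_counts maps each column to its number of distinct values,
    in the order the columns were requested.
    """
    # Assumption: "invariant" means at most one distinct value.
    dropped = [c for c, n in distinct_counts.items() if n <= 1]
    varying = [c for c, n in distinct_counts.items() if n > 1]
    # The first `limit` varying columns go to correlation analysis;
    # the rest are sampled as independent variables.
    return varying[:limit], varying[limit:], dropped
```

Dropping invariant columns before applying the limit means they do not use up any of the 31 slots.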
Fixes a few minor issues.
Fixes #342 Fixes #346