Correlation analysis is limited to 31 useful columns, where "useful" means the column is not invariant. For example, a column with all `null` values would not be counted here.
The downside is that if a table has more than 31 useful columns, not all of them will be considered for correlation analysis. Hopefully this is rare enough not to be a major issue, but in the long run it will need to be fixed.
An exhaustive fix involves breaking the correlation analysis queries up into sub-queries of no more than 31 columns each, and then recombining the results for the correlation analysis (probability matrices). The trickiest part of the implementation will be managing the `grouping_id` and the associated column groupings for each partial query, because the current implementation assumes a single `grouping_id` that can be used as a proxy for each grouping of columns.
(See #344)
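
As a rough illustration of the bookkeeping the sub-query fix would need, here is a minimal sketch, not the project's implementation. It assumes `grouping_id` is a bitmask in the style of SQL's `GROUPING()`: the rightmost column is the least-significant bit, and a set bit means the column is not part of the grouping. The names `batch_columns`, `decode_grouping_id`, and `grouping_key` are hypothetical. Once queries are split, a grouping would have to be identified by the pair (batch index, `grouping_id`) rather than by `grouping_id` alone.

```python
from typing import List, Sequence, Tuple

MAX_COLUMNS = 31  # the per-query column limit described above


def batch_columns(columns: Sequence[str], size: int = MAX_COLUMNS) -> List[List[str]]:
    """Split the useful columns into consecutive batches of at most `size`."""
    return [list(columns[i:i + size]) for i in range(0, len(columns), size)]


def decode_grouping_id(batch: Sequence[str], grouping_id: int) -> List[str]:
    """Return the columns of `batch` that take part in the grouping.

    Assumed convention (hypothetical for this project): the rightmost column
    maps to the least-significant bit, and a bit value of 1 means the column
    is NOT included in the grouping.
    """
    n = len(batch)
    return [
        col
        for i, col in enumerate(batch)
        if not (grouping_id >> (n - 1 - i)) & 1
    ]


def grouping_key(batch_index: int, grouping_id: int) -> Tuple[int, int]:
    """Composite key: the same grouping_id means different columns per batch."""
    return (batch_index, grouping_id)


# Example: 70 useful columns are split into batches of 31, 31 and 8.
columns = [f"col_{i}" for i in range(70)]
batches = batch_columns(columns)
batch_sizes = [len(b) for b in batches]

# Example: decode a 3-column batch where only the middle column is excluded.
decoded = decode_grouping_id(["a", "b", "c"], 0b010)
```

This sketch only covers groupings within one batch. A pair of columns that lands in two different batches would still need a sub-query containing both, so a real fix would also have to choose batches with the required column pairings in mind.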