yoid2000 closed this issue 5 years ago
With X = (max(col) - min(col))/10 -> 4112.8, the select query based on this X value gives six buckets. The average count is 1311 and the min is 753, which is greater than half of the average (655.5). So the column is marked as enumerative. Can you please confirm?
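For reference, the check discussed here can be sketched in a few lines of Python. This is only an illustration of the rule as it is described in this thread, not the actual script: the function names `bucket_width` and `label_from_counts` are hypothetical, and the only inputs are the bucket statistics quoted in the comments.

```python
def bucket_width(col_min, col_max, num_buckets=10):
    """Bucket size X = (max(col) - min(col)) / 10, with explicit parentheses."""
    return (col_max - col_min) / num_buckets


def label_from_counts(min_count, avg_count):
    """Rule as described in this thread (assumed, not taken from the script):
    enumerative if the smallest bucket holds more than half the average
    bucket count, otherwise continuous."""
    return "enumerative" if min_count > avg_count / 2 else "continuous"


# Six buckets: average 1311, min 753 -> 753 > 655.5
print(label_from_counts(753, 1311))
# Five buckets: average 1573, min 753 -> 753 < 786.5
print(label_from_counts(753, 1573))
```

The same minimum of 753 lands on either side of the threshold depending on the average it is compared against, which is why the two sets of numbers in this thread lead to different labels.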
Ok, you are right...I used the wrong numbers.
But wow, that function is insanely sensitive to bucket size. If I change the bucket size to 4113, then I get seven buckets, and the minimum bucket has a count of 7! I guess what is happening is that there are gaps in the number space, and the output varies a lot depending on whether a bucket overlaps into a gap or not.
Ok, let's leave it as it is. (We could probably play with this more, since acct_date is in fact continuous and we are mis-labeling it, but I already said that we'll sometimes get this wrong, so that's fine.)
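The sensitivity to bucket size can be reproduced with a small synthetic example. The sketch below does not use the acct_date data; it assumes buckets start at multiples of the bucket width and that empty buckets are simply absent from the result, as they would be in a GROUP BY. The helper name `bucket_counts` is hypothetical.

```python
from collections import Counter


def bucket_counts(values, width):
    """Rows per non-empty bucket; each bucket is keyed by its lower bound."""
    return Counter((v // width) * width for v in values)


# Synthetic column with a gap in the number space between 99 and 200.
values = list(range(0, 100)) + list(range(200, 300))

aligned = bucket_counts(values, 100)  # bucket edges line up with the gap
shifted = bucket_counts(values, 99)   # bucket edges drift into the clusters

print(sorted(aligned.items()))  # [(0, 100), (200, 100)]
print(sorted(shifted.items()))  # [(0, 99), (99, 1), (198, 97), (297, 3)]
```

With a width of 100 both buckets hold 100 rows, so the minimum equals the average. Changing the width by one creates a bucket with a single row, far below half of the average of 50, even though the data is unchanged.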
The column amount on table orders in database raw_banking has column_label enumerative. It should be continuous. This may be related to #26. (Since this column is a real, it should be continuous no matter what.)
In addition, the column acct_date should also be continuous. This one would not be related to #26.
Ok, I see that the original issue has this statement:
Regarding acct_date, if I make this query:
Then I get 5 buckets back. (I guess there are gaps in the number space which prevent 10 buckets?) The average count of these 5 is 1573, and the min is 753, which is less than half of the average (786.5). So I think this should have been labeled as continuous.
This is based on running the script in https://gist.github.com/srnb/f7679c432a87af88ed957318fe8815bb