This is mainly the long-overdue release of version v0.4.0. It is the first release that no longer ships the built-in dataframe implementation and instead depends on Datamancer.
Aside from that, there are big improvements regarding some of the most annoying errors encountered previously, namely those related to determining the types and the discreteness of columns.
The defaults for discreteness have changed. By default the following column kinds are now considered discrete:
string
constant
bool
whereas float is now considered continuous by default (closing issue #91).
For integer-based columns we look at a subset of 100 elements and determine the discreteness based on whether the uniqueness in that sample exceeds 12.5%.
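The integer rule can be illustrated with a small, language-neutral sketch in Python. This is not the library's code. The function name is hypothetical, taking the first 100 elements as the sample is an assumption, and so is the direction of the comparison (a uniqueness above 12.5% is read here as "continuous").

```python
def is_discrete_int_column(values, sample_size=100, threshold=0.125):
    """Guess whether an integer column is discrete from a small sample.

    Hypothetical helper for illustration only.
    """
    # Assumption: the sample is simply the first `sample_size` elements.
    sample = values[:sample_size]
    if not sample:
        # Assumption: an empty column is treated as discrete.
        return True
    # Fraction of distinct values within the sample.
    uniqueness = len(set(sample)) / len(sample)
    # Assumption: many distinct values (> threshold) means continuous.
    return uniqueness <= threshold
```

With this reading, a column that cycles through three category codes is classified as discrete, while a column of running indices is classified as continuous.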
For object columns we use the same rules, except that we first check the kinds encountered in a subset (if any of the purely discrete kinds are present, the column is discrete). This is particularly useful for object columns that have missing values, as these should now be handled correctly (no more crashing with cryptic "guessType" messages).
Overrides in the form of scale_x/..._continuous/discrete and aes = factor("foo") still exist, of course.
In addition, when discreteness is determined via uniqueness, we now output an info message so the user is aware that this is happening.
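A rough Python sketch of the object-column check, under stated assumptions: `None` stands in for a missing value and is skipped, `str` and `bool` stand in for the string and bool kinds (the constant kind is left out), and the handling of floats and the direction of the 12.5% comparison are assumptions as well. The function name is hypothetical and this is not the library's implementation.

```python
# Assumption: Python types standing in for the purely discrete column kinds.
DISCRETE_KINDS = (str, bool)


def is_discrete_object_column(values, sample_size=100, threshold=0.125):
    """Guess the discreteness of a mixed-kind column (hypothetical helper)."""
    # Assumption: sample the first `sample_size` elements, skip missing values.
    sample = values[:sample_size]
    present = [v for v in sample if v is not None]
    # Any purely discrete kind in the sample makes the column discrete.
    if any(isinstance(v, DISCRETE_KINDS) for v in present):
        return True
    # Assumption: a float in the sample makes the column continuous.
    if any(isinstance(v, float) for v in present):
        return False
    if not present:
        # Assumption: nothing but missing values counts as discrete.
        return True
    # Only integers remain: fall back to the uniqueness rule.
    return len(set(present)) / len(present) <= threshold
```

The point of checking kinds first is that a missing value no longer has to be forced into a single guessed type before the column can be classified.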