Receiver data types - when to allow non standard column types

@africanmathsinitiative/developers We have discussed this issue before: when should receivers accept logical, Date and other non-standard column types?

For example, some of the frequency table dialogs have factor receivers. But a logical column is like a factor with two levels, TRUE and FALSE, and it is sometimes sensible to use a logical column as one of your table's factors.

We also use Date columns. They are not numeric, but you can add a number to them or subtract one from them, take their min, max, mean, etc. Similarly, logical columns are internally stored in R as 0 and 1, so you can do almost all the same operations on them as you do on numeric columns.

So it can be very frustrating for users who want to use these different types of columns (which we produce) to find them excluded from receivers. When we set a receiver's type to numeric or factor, logical and Date columns (and everything else) are always excluded, even when they do not need to be.

This has become more urgent with the procurement data work because these data sets have logical and Date columns and we need to be able to analyse them sensibly.

So what I have just done (https://github.com/africanmathsinitiative/R-Instat/pull/4003) is change how we set data types for a receiver. There is now an optional parameter to SetDataType and SetIncludedDataTypes called bOnlyExcludeOppositeType. The default is True. When it is True and the data type is set to numeric, for example, then instead of only including numeric columns, the receiver excludes character and factor columns. The reverse applies for character and factor (numeric is excluded). This means that setting the type to numeric or factor now allows logical and Date (and other) column types by default.

This change doesn't affect setting to other types, so setting the type to Date still includes only Date columns, because I think that is still what we want. Similarly, if you set multiple included types, like factor and Date, it will include just those, because it's not clear what to exclude.

When bOnlyExcludeOppositeType = False, it does what used to be done and only includes that type. We still need this option. For example, the Levels/Labels dialog shouldn't include logical columns because it only works on factor columns. I have already corrected this for the Factor menu dialogs but there may be others.
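
To make the rule above easier to check against, here is a minimal Python sketch of the filtering logic as I have described it. It is not the actual R-Instat code (which is VB.NET): `column_allowed`, `OPPOSITE_TYPES` and `only_exclude_opposite` are made-up names for illustration, and the mapping of opposite types is only my reading of the description above.

```python
# Hypothetical model of the receiver filter, not R-Instat's VB.NET implementation.
# Assumed "opposite" types, following the description in this issue:
# numeric excludes character and factor; character and factor exclude numeric.
OPPOSITE_TYPES = {
    "numeric": {"character", "factor"},
    "factor": {"numeric"},
    "character": {"numeric"},
}


def column_allowed(column_type, included_types, only_exclude_opposite=True):
    """Return True if a column of column_type may be entered into the receiver."""
    included = list(included_types)
    # The relaxed rule applies only when a single type is set and that type
    # has a defined opposite (numeric, factor or character).
    if (
        only_exclude_opposite
        and len(included) == 1
        and included[0] in OPPOSITE_TYPES
    ):
        return column_type not in OPPOSITE_TYPES[included[0]]
    # Otherwise fall back to the old behaviour: include only the listed types.
    return column_type in included


if __name__ == "__main__":
    print(column_allowed("logical", ["numeric"]))         # numeric receiver, logical column
    print(column_allowed("factor", ["numeric"]))          # numeric receiver, factor column
    print(column_allowed("logical", ["factor"], False))   # strict factor receiver
    print(column_allowed("logical", ["factor", "Date"]))  # multiple types set
```

Under these assumptions, a numeric receiver accepts logical and Date columns but rejects factor and character columns, while a Date receiver, a receiver with several included types, or any receiver with the flag set to False accepts only the listed types.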

This has introduced some instability, because there are likely other dialogs which will now give errors when used with these other types.

I would really like everyone to test this out on all our dialogs which set specific data types so it quickly becomes stable again. If a command only works with a specific data type then we should change it back to only allowing this.

And some might be less obvious and need discussion. For example, the Canonical Correlations command (cancor) works with logical columns, but not Date columns. So should this be set back to only allowing strictly numeric columns? Or keep the new setting so that logical is allowed, but so are others which give an error?

It would be good for us to decide what our rules are in these cases: whether we go for more caution, to prevent errors, or more flexibility, to allow more columns to be used. I think we sort of wanted to go for a bit more flexibility. And we do want users to be aware of the column types they have and what is and isn't sensible, especially when they have unusual types. If there are sensible errors when you use an unusual type, are we happy with allowing that?

Receiver data types - when to allow non standard column types #4004