apjanke / octave-tablicious

Table (relational, tabular data) implementation for GNU Octave
https://apjanke.github.io/octave-tablicious/
GNU General Public License v3.0
28 stars 11 forks source link

Improve main string class doco #114

Open apjanke opened 6 months ago

apjanke commented 6 months ago

The main string class doco needs some work. The main intro sentence, "A string array of Unicode strings.", is really clumsy, the whole thing focuses too much on Unicode, and it's not clear who the audience is. (Like, I'm basically talking to coders who already understand basic computer text representation and the standard Octave text types.)

Update it.

This is expanded from https://github.com/apjanke/octave-tablicious/issues/110.

pr0m1th3as commented 6 months ago

The implementation of string class you merged in main branch is missing the <missing> values as it is . As a result

>> sz = [4 3];
>> varTypes = {'double','datetime','string'};
>> T = table('Size',sz,'VariableTypes',varTypes)
T =
table: 4 rows x 3 variables
  VariableNames: Var1, Var2, Var3
>> prettyprint(T)
----------------------
| Var1 | Var2 | Var3 |
----------------------
| 0    | NaT  |      |
| 0    | NaT  |      |
| 0    | NaT  |      |
| 0    | NaT  |      |
----------------------

instead of

>> prettyprint(T)
---------------------------
| Var1 | Var2 |   Var3    |
---------------------------
| 0    | NaT  | <missing> |
| 0    | NaT  | <missing> |
| 0    | NaT  | <missing> |
| 0    | NaT  | <missing> |
---------------------------

I really think you should reconsider adopting my approach on this issue.

apjanke commented 5 months ago

I don't think my string is missing missing values. It's just that string({''}) is not how you create them. You need to use string(missing) or NaS instead. (string(missing) is the Matlab way of doing it; NaS is my own extension.)

There was a bug in string.missing that caused it and NaS to return "" instead of missings; find a fix here: https://github.com/apjanke/octave-tablicious/commit/92602102a6913b13b4dc091e4e02bb2df9ab16d9.

I agree there's a case for making a "preallocated" table like you're referring to, including one populated with missings. I just don't think a change to the string constructor semantics is required for that.

I took a stab at my own implementation of the table "preallocation" constructor over here on branch WIP/preallocation-ctor-form. Have a look and see what you think.

One thing is that Matlab's table() constructor, for the Size/VariableTypes preallocation form, defines the fill values for variables differently from array expansion, and they're not the missings you're asking for here. Here's the doco in the table() help page. In particular, the fill value used for string is "", not string(missing). And the default for double is 0, not NaN.

image

For Matlab compatibility, I made my table('Size', 'VarTypes') constructor match Matlab's fill behavior.

I think there's a case for making preallocated tables that use NaN and string(missing) as fill values instead of 0 and "". I think that way actually makes more sense, and is less error-prone. To support that, Tablicious could provide a static method or extension to the constructor arguments to support that. My implementation on that branch has the code to do so, but it's not part of the public interface yet; it's in tblish.table.internal.blankTable. (Packages named +internal are by convention (at least in Matlab) internal-use stuff that's not part of the public API of a piece of code, even though they have public access at the language level.)