apjanke / octave-tablicious

Table (relational, tabular data) implementation for GNU Octave
https://apjanke.github.io/octave-tablicious/
GNU General Public License v3.0
28 stars 11 forks source link

element deletion in string array #100

Closed gllmflndn closed 5 months ago

gllmflndn commented 1 year ago

From #98, removing an entry in a string array raises an error:

>> a = string ("octave");
>> a(1) = [];
error: =: nonconformant arguments (op1 is 1x1, op2 is 0x0)
error: called from
    subsasgnParensPlanar at line 1093 column 33
    subsasgn at line 1052 column 16
apjanke commented 1 year ago

Thanks! I may have time to work on this over this coming weekend, but don't hold your breath. If this is breaking an actual use case for you, let me know specifically what it is, and I'll see if I can come up with a workaround for you.

gllmflndn commented 1 year ago

I only came across this when trying to create an empty string array - as I was unaware of string ([]) so there is no emergency.

Looking at your code, it seems that a special case needs to be made if rhs is [] in subsasgnParensPlanar, e.g.:

     function this = subsasgnParensPlanar (this, s, rhs)
       %SUBSASGNPARENSPLANAR ()-assignment for planar object
-      if ~isa (rhs, 'string')
-        rhs = string (rhs);
+      if ~isa (rhs, "string")
+        if (isa (rhs, "double") && isequal (size (rhs), [0 0]))
+          rhs = struct ("strs", [], "tfMissing", []);
+        else
+          rhs = string (rhs);
+        endif
       endif
       this.strs(s.subs{:}) = rhs.strs;
       this.tfMissing(s.subs{:}) = rhs.tfMissing;
     endfunction
apjanke commented 1 year ago

Yup, I suspect you're right here. Looks like the special-case x(...) =[] deletion operation is sensitive to rhs type and size. From Matlab:

>> strs = ["foo", "bar", "baz", "qux"];
>> a = strs; b = strs;
>> a(2) = []
a = 
  1×3 string array
    "foo"    "baz"    "qux"
>> a(2) = string([])
Unable to perform assignment because the left and right sides have a different number of elements. 
>> a(2) = reshape([], [0 0 2 3])
Unable to perform assignment because the left and right sides have a different number of elements. 

Don't bother making a PR for this – I want to do some in-depth futzing around with this myself before I apply any change. I'll need to make corresponding changes in some of my other code bases, too. Historically, everything about subsasgn and it's friends have been super fiddly and a huge pain to work with, with subtle edge cases and tricky bits all over the place. Probably need to slap some unit tests on this.

(But if you find any documentation about the exact interface and semantics of the = [] deletion case and classdef overrides for it, please do drop it here, though. :) )

apjanke commented 1 year ago

Bumping this issue to High priority because element deletion is a pretty basic operation.

mgcooper commented 1 year ago

Just mentioning that this same issue applies to element deletion from tablicious datetime arrays. Any plans to address this?

apjanke commented 1 year ago

Not so much "plans" as "hopes": this is a fussy low-level thing that would take substantial work to Do Right and make sure it's correct.

But, given that Tablicious has a low user count and (AFAIK) isn't used in any critical areas, I'm leaning towards "just merge @gllmflndn's fix even without serious testing, and see how it goes". I fiddled around with it a bit, and it "seems" to work, so that's probably better than what we've got now.

What do y'all think?

mgcooper commented 1 year ago

I can't honestly say that I have any idea what is required to address this specific issue, but I think datetime, tables, and strings are three of the best things to come out of matlab for day to day productivity (not performance though!). Categorical really comes in handy too. Would be great to have them fully supported in Octave (but very grateful for the support already offered...). Your package may not have a big user count but I think it might grow. Regarding the fix, again I'm not really in a position to comment, but if there's a working fix I hope you give it a look.

apjanke commented 1 year ago

I think datetime, tables, and strings are three of the best things to come out of matlab for day to day productivity

Thanks! So do I. They've been a core of my Matlab coding work since 2004. (When I implemented my own versions of them years before Matlab introduced theirs. ;) That's what all this Tablicious code comes from, basically.)

So, the fix that gllmflndn proposed above looks like it should work. But unfortunately this - and anything involving subsref/subsasgn and especially type conversion - is fussy low-level stuff that has subtle behavior and error cases, and requires some actual testing.

Your package may not have a big user count but I think it might grow.

Thank you! That's sure my hope.

apjanke commented 5 months ago

Fixed in https://github.com/apjanke/octave-tablicious/commit/95a4ff44f67f5822ce867d65e9cc7b708e606719, I think. It's on the main branch and should go out in v0.4.0 within a couple weeks.

Closing as Fixed.