Refactor `zeroioni` - Githubissues

codereport commented 3 years ago

In jsrc/verbs/monadic/tally.cpp:

// TODO: refactor me
[[nodiscard]] auto
refactorme_zeroionei(int64_t n) {
    return reinterpret_cast<array>(Bnum + (n));
}

herwinw commented 3 years ago

Just to save the next person some time: this code is currently in jsrc/array.hpp.

To add a bit of documentation: the current implementation (without the typecase) is:

Bnum + (n)

The definition of Bnum can be found in j.c:

I Bnum[22][8] =                                                                                                  
  {  // the numbers we keep at hand.  0 and 1 are B01, the rest INT; but the first 2 are integer forms of 0 and 1
    CBAIVAL(INT, 0),  CBAIVAL(INT, 1),  CBAIVAL(INT, -10), CBAIVAL(INT, -9), CBAIVAL(INT, -8), CBAIVAL(INT, -7), 
    CBAIVAL(INT, -6), CBAIVAL(INT, -5), CBAIVAL(INT, -4),  CBAIVAL(INT, -3), CBAIVAL(INT, -2), CBAIVAL(INT, -1), 
    CBAIVAL(B01, 0),  CBAIVAL(B01, 1),  CBAIVAL(INT, 2),   CBAIVAL(INT, 3),  CBAIVAL(INT, 4),  CBAIVAL(INT, 5),  
    CBAIVAL(INT, 6),  CBAIVAL(INT, 7),  CBAIVAL(INT, 8),   CBAIVAL(INT, 9)};

Related to this we have two constants in j.h:

#define NUMMAX 9          // largest number represented in num[] 
#define NUMMIN (~NUMMAX)  // smallest number represented in num[]

The value of NUMMIN is actually -10 (see Godbolt, but there must be a better way to visualize this). This matches the min/max values of Bnum. This looks to me like these are some kind of static/singleton values: instead of creating a new A every time you need these number, one could just get a pointer to a static value.

Possible refactors I could think of:

The parentheses around n are copied from the original C macro, and are not needed here.
Bnum is an array, Bnum + n is actually an array index, writing it like Bnum[n] makes that a bit cleaner
We might want to do a boundary check: there are 22 items, so n should never be bigger than 21
The index is based on the internal bit representation of J. To get the value of 2, you'd have to input 14.
Do we want to combine this with refactorme_num? Our caller written as return !zero_or_one(k) ? refactorme_num(k) : zeroionei(k), which means we use Bnum for 0 and 1, but not for the other integers available in Bnum. And we would have to do this at every call site. It might be easier to make refactorme_num check if the input is in the min/max boundaries of Bnum and return these static As, or create a new A otherwise.
It kind of looks to me like Bnum has an incorrect type, and should actually be A[22] instead of I[22][8]. This would get rid of the casting, but we would have to validate all users of Bnum.

herwinw commented 3 years ago

After reading some more code: the definition of num and refactorme_num is:

#define num(n) ((A)(Bnum + 2 + (n)-NUMMIN))

[[nodiscard]] inline auto                                 
refactorme_num(int64_t n) {                               
    return reinterpret_cast<array>(Bnum + 2 + n - NUMMIN);
}

These two definitions behave the same, I'm going to use num from here, that's easier to type, but this would apply to our C++ function as well.

Our caller is (slightly rewritten to show the relevant lines more clearly):

// this is for "creating an integer atom with value k"
template <typename T>
auto make_scalar_integer(J jt, T k) -> array {
    if (xor_replicate_sign(k) <= NUMMAX) {
      return !zero_or_one(k) ? num(k) : zeroionei(k);
    }
    array z = make_array<T, copy_shape_0>(jt, 1, 0);
    set_value_at(z, 0, k);
    return z;
}

As we've seen before, we can only use num and zeroionei if we're in the range of [-10, 9], or [NUMMIN, NUMMAX]. The line with xor_replicate_sign decides whether we use a value from Bnum, or create a new A. This means that check can only mean "is k in the range [NUMMIN, NUMMAX]. If it isn't, we have to create the scalar ourself. It looks like num(k) does not work with the values 0 and 1, so we have to check that as well, and decide whether we need to call zeroionei or num. This also means the boundary check of zeroionei described above is incorrect, it should only accept two values (which is kind of implied by the name, which I guess is supposed to be read as zero I one I).

The lookup of num now actually starts to make sense. Some examples:

#define num(n) ((A)(Bnum + 2 + (n)-NUMMIN))
#define num(n) ((A)(Bnum + 2 + (n) + 10)) // We've already established that `NUMMIN` equals `-10`
#define num(n) ((A)(Bnum + 12 + (n)) // The most simplified version

The negative numbers are -10 to -1 map to [2, 11], that fits the indexes of Bnum. Values 0 and 1 are special, those are the first two items, the values at index 12 and 13 are the booleans instead (there is probably a reason for that, but I can't think of anything that would make sense). Last are the positive values 2 to 9, mapped to indexes [14, 21], and once again we have a match.

herwinw commented 3 years ago

Refactors I would suggest:

In make_scalar_integer, replace the check xor_replicate_sign(k) <= NUMMAX with something more descriptive. The check itself can be reduced to NUMMIN <= k && k <= NUMMAX, this can be changed into a function if needed
Move the logic of the next line to refactorme_num, callers should not have to worry about that weird boolean exception
Check the code for calls with num(0) and num(1), those are actually booleans. I would prefer to move those out of Bnum and into Bbool, but this would require us to update all current C callers as well. I would define a separate macro to get those booleans.
If that boolean change is feasible, we could update num to work with the values 0 and 1.
Once we've finished that, calls to zeroionei can simply be updated to use num instead.
Then apply the ultimate refactoring: Remove Dead Code

juntuu commented 3 years ago

should actually be `A[22]` instead of `I[22][8]`

A is typedef for AD*, so this would be array of 22 pointers, without room for the data. The correct type would be AD[22]

Also, just a note that this relies on sizeof(AD) == sizeof(I[8])

I'd guess one reason for I[8] instead of AD has been the initialisation: first has always 8 members to initialise, the latter has 10 now, but one seems to be 64 bit specific filler, so it would've been 9 on some platforms.

Now, that 64 bit is only supported platform the initialisation of the struct would be just fine, although it needs some more braces around the subobjects as well.

herwinw commented 3 years ago

Yeah, I messed up AD and A in that one. I think an issue here is the values are generated by a macro:

#define CBAIVAL(t, v) \
    { 7 * SZI, (t)&TRAVERSIBLE, 0, (t), ACPERMANENT, 1, 0, (v) }

I guess we're using this to construct the AD ourselves, I guess this one could directly produce an AD as well.

herwinw commented 3 years ago

I guess this case is done, that function no longer exists. The case does include some scribbles of documentation that may be worth saving, but I guess the wiki would be a better place to save that (but the wiki is currently disabled, I'm not sure whether that's on purpose)

codereport / jsource

Refactor `zeroioni` #87