codereport / jsource

J Language Source Code. Livestream links ⬇️
https://www.youtube.com/playlist?list=PLVFrD1dmDdvfVhYLU_iKkV67X9XqCJLWe
Other
38 stars 20 forks source link

Refactor `zeroioni` #87

Open codereport opened 3 years ago

codereport commented 3 years ago

In jsrc/verbs/monadic/tally.cpp:

// TODO: refactor me
[[nodiscard]] auto
refactorme_zeroionei(int64_t n) {
    return reinterpret_cast<array>(Bnum + (n));
}
herwinw commented 3 years ago

Just to save the next person some time: this code is currently in jsrc/array.hpp.

To add a bit of documentation: the current implementation (without the typecase) is:

Bnum + (n)

The definition of Bnum can be found in j.c:

I Bnum[22][8] =                                                                                                  
  {  // the numbers we keep at hand.  0 and 1 are B01, the rest INT; but the first 2 are integer forms of 0 and 1
    CBAIVAL(INT, 0),  CBAIVAL(INT, 1),  CBAIVAL(INT, -10), CBAIVAL(INT, -9), CBAIVAL(INT, -8), CBAIVAL(INT, -7), 
    CBAIVAL(INT, -6), CBAIVAL(INT, -5), CBAIVAL(INT, -4),  CBAIVAL(INT, -3), CBAIVAL(INT, -2), CBAIVAL(INT, -1), 
    CBAIVAL(B01, 0),  CBAIVAL(B01, 1),  CBAIVAL(INT, 2),   CBAIVAL(INT, 3),  CBAIVAL(INT, 4),  CBAIVAL(INT, 5),  
    CBAIVAL(INT, 6),  CBAIVAL(INT, 7),  CBAIVAL(INT, 8),   CBAIVAL(INT, 9)};                                                                                                                                                

Related to this we have two constants in j.h:

#define NUMMAX 9          // largest number represented in num[] 
#define NUMMIN (~NUMMAX)  // smallest number represented in num[]

The value of NUMMIN is actually -10 (see Godbolt, but there must be a better way to visualize this). This matches the min/max values of Bnum. This looks to me like these are some kind of static/singleton values: instead of creating a new A every time you need these number, one could just get a pointer to a static value.

Possible refactors I could think of:

herwinw commented 3 years ago

After reading some more code: the definition of num and refactorme_num is:

#define num(n) ((A)(Bnum + 2 + (n)-NUMMIN))

[[nodiscard]] inline auto                                 
refactorme_num(int64_t n) {                               
    return reinterpret_cast<array>(Bnum + 2 + n - NUMMIN);
}                                                         

These two definitions behave the same, I'm going to use num from here, that's easier to type, but this would apply to our C++ function as well.

Our caller is (slightly rewritten to show the relevant lines more clearly):

// this is for "creating an integer atom with value k"
template <typename T>
auto make_scalar_integer(J jt, T k) -> array {
    if (xor_replicate_sign(k) <= NUMMAX) {
      return !zero_or_one(k) ? num(k) : zeroionei(k);
    }
    array z = make_array<T, copy_shape_0>(jt, 1, 0);
    set_value_at(z, 0, k);
    return z;
}

As we've seen before, we can only use num and zeroionei if we're in the range of [-10, 9], or [NUMMIN, NUMMAX]. The line with xor_replicate_sign decides whether we use a value from Bnum, or create a new A. This means that check can only mean "is k in the range [NUMMIN, NUMMAX]. If it isn't, we have to create the scalar ourself. It looks like num(k) does not work with the values 0 and 1, so we have to check that as well, and decide whether we need to call zeroionei or num. This also means the boundary check of zeroionei described above is incorrect, it should only accept two values (which is kind of implied by the name, which I guess is supposed to be read as zero I one I).

The lookup of num now actually starts to make sense. Some examples:

#define num(n) ((A)(Bnum + 2 + (n)-NUMMIN))
#define num(n) ((A)(Bnum + 2 + (n) + 10)) // We've already established that `NUMMIN` equals `-10`
#define num(n) ((A)(Bnum + 12 + (n)) // The most simplified version

The negative numbers are -10 to -1 map to [2, 11], that fits the indexes of Bnum. Values 0 and 1 are special, those are the first two items, the values at index 12 and 13 are the booleans instead (there is probably a reason for that, but I can't think of anything that would make sense). Last are the positive values 2 to 9, mapped to indexes [14, 21], and once again we have a match.

herwinw commented 3 years ago

Refactors I would suggest:

juntuu commented 3 years ago
should actually be `A[22]` instead of `I[22][8]`

A is typedef for AD*, so this would be array of 22 pointers, without room for the data. The correct type would be AD[22]

Also, just a note that this relies on sizeof(AD) == sizeof(I[8])

I'd guess one reason for I[8] instead of AD has been the initialisation: first has always 8 members to initialise, the latter has 10 now, but one seems to be 64 bit specific filler, so it would've been 9 on some platforms.

Now, that 64 bit is only supported platform the initialisation of the struct would be just fine, although it needs some more braces around the subobjects as well.

herwinw commented 3 years ago

Yeah, I messed up AD and A in that one. I think an issue here is the values are generated by a macro:

#define CBAIVAL(t, v) \
    { 7 * SZI, (t)&TRAVERSIBLE, 0, (t), ACPERMANENT, 1, 0, (v) }

I guess we're using this to construct the AD ourselves, I guess this one could directly produce an AD as well.

herwinw commented 3 years ago

I guess this case is done, that function no longer exists. The case does include some scribbles of documentation that may be worth saving, but I guess the wiki would be a better place to save that (but the wiki is currently disabled, I'm not sure whether that's on purpose)