Negative division for Byte->Word calculations

This issue relates to how relative offsets specified as immediate values are handled after my recent PR (#18).

After my recent PR (#18) was merged - thank you btw - I noticed that we can now have "negative 0" in the JUMP{R,S} instructions (i.e. offset=0 and sign=1).

This happens when the offset in bytes is between -1, -2 or -3 and is converted to words. During the conversion (divide by 4) the result is effectively "rounded" to 0 (correct), but the sign bit in the jump instruction is still set as negative.

I assume this does not matter to the ULP, but it felt imperfect to me anyway.

Digging deeper I noticed that while my PR handles immediate offsets exactly how I would expect (except for the -0 possibility), I got to this behaviour "by accident".

The accident/mistake was to not wrap stp in brackets in the I_JUMP_REL{R,S} macros (see here). When the macro gets passed step_val>>2 (see here), -stp in the macro actually expands to -step_val>>2 and the unary minus has higher operator precedence than the shift-right (i.e. it is evaluated before the shift-right). This results in a positive number being right-shifted, instead of a negative.

The matters because with negative numbers, shift-right and division behave differently (apparently right-shift on negative numbers is not even well defined, rather it's "implementation specific"). This all raises the question of "how should it work"?

To show the issue (difference between divide and shift):

Divide          Shift-Right
5 / 4 = 1       5 >> 2 = 1
4 / 4 = 1       4 >> 2 = 1
3 / 4 = 0       3 >> 2 = 0
1 / 4 = 0       1 >> 2 = 0

-1 / 1 =  0    -1 >> 2 = -1  #different
-3 / 4 =  0    -3 >> 2 = -1  #different
-4 / 4 = -1    -4 >> 2 = -1
-5 / 4 = -1    -5 >> 2 = -2  #different

The current behaviour matches my expectation, which is based on how the ULP interprets the offset field. The ULP uses bit 7 from the offset field to determine direction (sign), while bits 0-6 are the absolute offset, so I would expect positive and negative offsets to result in the same absolute offset (magnitude), with only the sign bit being different.

So, how should it work? I notice that all places in the code, which convert offsets from bytes to words, use the shift-right approach, rather than divide by 4. The documentation describes what the assembler does as "converts bytes to words", so it appears the intent would be to "divide by 4" rather than to "shift something".

I did find some other code, which explicitly checks for "multiple of 4" (here. That could be another solution: simply disallow values that are not multiples of 4.

As I write and test this, I realise that this all might not be very important. When a negative input is exactly divisible by 4, the result will be identical (correct) for both the shift-right and division approach. It's only for not-a-multiple-of-4 values, where the "rounding" is done differently. And since only word-aligned addresses are actually addressable, real programs should only use offsets that are actually multiples of 4. So perhaps leaving this case entirely undefined is perfectly fine? I dislike undefined though.

So, to summarise: This is mostly about understanding (deciding) how the assembler should behave with negative offsets. The options I see:

Leave things as they are. It works (at least as I would expect it - except for to possibility of -0).
Change all places where bytes are converted to words from >>2 to /4 operations. This should express the intent better, and the compiler can optimise these as necessary/possible.
Add a check to functions, which generate jump instructions (such as esp32ulp_cmd_jump_relr_esp32), to only accept step_val inputs that are multiples of 4.
"Fix" my recent PR (#18), by wrapping stp correctly in brackets, so that negative numbers are right-shifted (resulting in a slightly different result compared to divide-by-4, as shown above). Perhaps there is a reason why this would be more correct?
Do option 2 and option 5 above. This would get the divide-by-4 behaviour, and might avoid surprises to others down the line, with what the macro does.

@dmitry1945 - since you merged my recent PR (#18), perhaps you have this freshest in memory and I could ask you to comment? I am happy to create another PR to adjust the behaviour if it should be different to what it is now.

espressif / binutils-esp32ulp

Negative division for Byte->Word calculations #21