Invalid optimisation or bug in DLB/DAB?

ZornsLemma commented 6 years ago

If I tweak hello.pla to look like this:

include "inc/cmdsys.plh"
byte foo
puts("Hello, world.\n")
foo = 255
foo = foo + 1
puti(foo)
done

and then run it using the portable VM:

$ make hello
./plasm -AMOW < samplesrc/hello.pla > samplesrc/hello.a
acme --setpc 4094 -o HELLO#FE1000 samplesrc/hello.a
./plvm HELLO
Load module HELLO
Hello, world.
256

note that it outputs 256, not 0 , even though foo is a byte variable. The generated code looks like this:

    !BYTE   $2A,$FF         ; CB    255
    !BYTE   $7C         ; DAB   _D028+0
_F001   !WORD   _D028+0     
    !BYTE   $0C         ; INCR
    !BYTE   $7C         ; DAB   _D028+0

I am not sure what's the right fix:

disable the optimisations which rely on replacing SLB [n]:LLB [n] with DLB [n] (and same for other variants e.g. DAB)
alter DLB (DAB etc) so it does have the same effect as SLB:LLB by zeroing the high byte

Let me know what you think; if you'd rather change the optimiser I'm happy to try preparing a pull request for that.

dschmenk commented 6 years ago

I'm actually torn about this. Personally, I'm Ok with it as an optimization artifact. In fact, there is another subtle issue when using byte values as FOR/NEXT iterators:

byte i

for i = 0 to 1000
  puti(i)
next

actually iterates the proper number of times. The reason is that the iteration count is left on the stack during the FOR/NEXT and is tested for the termination value, not the value of i in memory.

Instead of disabling it altogether, it could be one of the optimizations that gets a documented caveat. Perhaps making it part of the try_dupify() optimizations when -O2 is passed in.

ZornsLemma commented 6 years ago

I'm pretty torn myself. It seems a shame to leave this trap lying around for future developers, but if it's documented it's probably not such a huge problem - I was pretty surprised by this and was wondering why my code wasn't working, but I hadn't had any warning from the documentation. It also seems a shame to lose the optimisation or to force DLB to clear the high byte - the 'for' loop example is both interesting and useful, if it's documented as behaviour which can be relied on.

I've been hoping I'd come up with a fourth option but so far it has eluded me...

(I think I'd missed the fact that the compiler now has multiple optimisation levels - is this just the self-hosted compiler? I guess the cross-compiler could be modified to have this too. That does make it a feel a bit better to include a 'risky' optimisation.)

dschmenk commented 6 years ago

We can try clearing the MSB of the eval stack for DAB/DLB. I never documented the FOR/NEXT behavior of byte values. The more I think about it, the more Ok I am with clearing the MSB.

Oh yeah, I forgot I only added the -O2 to the self hosted compiler. It includes the try_dupify() in the optimization passes. I can add it to the cross-compiler too.

ZornsLemma commented 6 years ago

If you're not worried about losing the FOR/NEXT behaviour then I think clearing the MSB in DAB/DLB is a good solution. The only thing that slightly concerns me - and I haven't tried to test this - is what would happen with a loop like:

byte i
for i = 0 to 255
     // do stuff
next

Would that fail due to wrapping with the proposed truncation, maybe turning into an infinite loop? I think a case could be made that this isn't unreasonable behaviour anyway; it's at least a bit more obvious than the case where the optimiser happens to preserve the MSB for you unexpectedly.

dschmenk commented 6 years ago

I'm fairly sure that will work. I'll make the change to the devel branch (I'm quite sure it will work there) and look for any anomalies.

dschmenk commented 6 years ago

I added the clear MSB for DLB/DAB in the devel branch and only found one instance of bad code - in ROGUE of all places. A FOR/NEXT that used a byte variable but iterated through negative values.

In master, I found some problems with FOR x = 100 DOWNTO 0 cases. That seems like a useful instance. Should I check in the fix to master? I'm thinking we just fix this in the devel branch.

ZornsLemma commented 6 years ago

Thanks Dave, I'm quite happy for this to just go to devel. I've moved my own development work over to a branch off your devel branch, and I don't think the Acorn port has (m)any actual users so I'm not planning to push out an update of the previous release with this change in. It also seems safer to make a change to the VM behaviour only in the new 2.0 version.

dschmenk commented 6 years ago

Ok, I'm closing this out as fixed in the devel branch.

dschmenk / PLASMA

Invalid optimisation or bug in DLB/DAB? #43