Failures in rfunge test suite

tjol commented 3 years ago

Hi! I'm working on a Funge-98 interpreter in Rust (rfunge), and I thought I'd run a few other quality befunge interpreters over my test suite – including yours!

My expected outputs are generally generated with either cfunge or CCBI; I do test a few finer points that Mycology doesn't cover.

Funge++ fails a few tests which I rather think it should pass; you can try it yourself using my testing script if you like.

The issues (IMHO) are:

"BOOL" fingerprint: The docs say these are logic functions, but in every other implementation they’re bitwise operations.
Concurrency: Funge++ gets out of sync at some point in concurrent2.b98 – maybe the ; ... ; takes a tick?
"HRTI" fingerprint: my test starts the timer, checks the number of sub-second microseconds with S, and then busy-waits until the next second (as determined using y). Now we know that at least 1000000 - (result of S) μs have passed. We then read the timer, add a small margin of error, and check that enough time has indeed passed. This isn't the most robust test, but even on interpreters where it sometimes fails (like pyfunge), it usually works. On Funge++, it practically always fails...
k;: While there is disagreement about whether 0k;@;1 should skip over ;@; or only ;, most interpreters agree that 1k;@;1 should skip over ;@; and execute the 1 twice, not execute @ (which is what Funge++ currently does)
"MODU" fingerprint: I happen to think the behaviour you have chosen (which is the same as several other interpreters) is mathematically absurd. (see https://github.com/tjol/rfunge/wiki/Fingerprints#modu-0x4d4f4455), but I fully accept that this is my least defensible complaint here
My "REFC" test fails – I suspect this is a concurrency issue rather than a problem with the fingerprint.

I know this is a lot for one "issue" – just thought it might be of some interest! :-)

cwesson commented 3 years ago

@tjol I am a fan of differential testing for compilers, and I had considered doing something similar for Funge-98, so I appreciate your efforts on this.

That is an annoying interpretation of the BOOL spec, but it appears you are correct about other interpreters, so I'll bend to the will of the masses.
I think you are correct about ; ... ; taking one tick, and after reviewing the spec that is not correct, I'll fix that.
I will need to look in to this, can you point me to your test case? What is the behavior you are seeing on Funge++?
The spec is vague about how 1k; should be handled, I think skipping the entire ; .. ; makes more sense logically, but then again Befunge is an absurd language so just skipping the ; could be equally valid. I'll have to think about this one some more.
I assume you are referring to M and U. I agree these are mathematically absurd. I know Funge++'s implementation of M matches other interpreters, and I have never been able to find a definition of "Sam Holden's unsigned-result modulo", so I'm inclined to leave this as is. Although, if your version passes Mycology's MODU test I could be convinced to switch.
I will need to look in to this, can you point me to your test case? It seems likely there could be a concurrency issue, I don't remember if I implemented this as one mapping for each IP or one shared mapping. Given, that C pointers are valid in any thread, it would seem that a single shared mapping is the correct approach, I'll make sure that is what Funge++ does. I assume you are not using -fthreads=native so you probably won't see any issues related to actual thread concurrency, but its possible those exist as well.

tjol commented 3 years ago

"HRTI": This is the test case: https://github.com/tjol/rfunge/blob/master/tests/test_cases/HRTI.b98. It looks like the lower bound on the number of μs passed based on the clock is greater than the actual number of µs measured using the timer. I had a quick look at your code last night and I suspect you’re using a different definition of S from everybody else.
"MODU": M and R are fine, it’s just U that’s ambiguous. As you say, nobody knows who or what Sam Holden is or was. My definition is the same one that’s used by CCBI. The Mycology test for this instruction lets both variants pass – I suspect @Deewiant did this on purpose.
"REFC": this is the test case: https://github.com/tjol/rfunge/blob/master/tests/test_cases/REFC.b98. I haven’t looked at what’s going on in detail. The D in line 2, col 9 reflects, but it should pick up a reference set by the R in line 1, col 13, before the t instruction. PyFunge fails this test as well for whatever reason, but CCBI, cfunge and RC/Funge 2 pass.

Deewiant commented 3 years ago

That is an annoying interpretation of the BOOL spec, but it appears you are correct about other interpreters, so I'll bend to the will of the masses.

I faintly recall making this mistake myself originally, but RC/Funge-98 is where this fingerprint came from and it implemented it as bitwise operations, so that's just how it is. 🤷🏻

"MODU": M and R are fine, it’s just U that’s ambiguous. As you say, nobody knows who or what Sam Holden is or was. My definition is the same one that’s used by CCBI. The Mycology test for this instruction lets both variants pass – I suspect @Deewiant did this on purpose.

I think you're giving me too much credit there. It was ~15 years ago so I obviously can't be sure, but I suspect I just wrote a simple test without thinking about it too much. At the time I wasn't much good at thinking of tests that could catch incorrect implementations. Mycology saw a lot of updates while cfunge was being developed, to catch more corner cases, but I guess we missed this one.

Nice to see some interest in Befunge, btw!

cwesson commented 3 years ago

HRTI: I see other interpreters use S to be microseconds since the last system second, where as I used it to mean microseconds since the last full second from a mark. The spec is ambiguous, but the language is different enough from T that I'm willing to accept I implemented that wrong.
REFC: I'm working on a fix for this, the spec clearly states that the mapping is shared by all IPs. However, your test case assumes that fingerprints loaded by one IP are carried over into child IPs. The spec has nothing to say on the matter, and I suspect your differential testing will show some disagreement on that as well. For what its worth, Mycology tests for both possibilities.
I also noticed your input_reflects test fails. The spec is quite clear that EOF reflects, so I'll fix that as well.

tjol commented 3 years ago

Right, I didn’t think about fingerprints being inherited. I think the REFC test is the only one that actually combines fingerprints and t at the moment, though. Might have to go and check if that’s what makes the test fail on PyFunge as well.

tjol commented 3 years ago

I just tested on your feature/fingerprints branch. Some observations:

input_reflects, BOOL, concurrent2 and k are indeed fixed. Nice!
The HRTI test still fails. This might warrant some closer investigation...
I added an implementation of FRTH yesterday along with a test, which... segfaults on Funge++.

I hadn't noticed that you have FPDP and FPSP implementations in the other branch. There the tests fail because of formatting. How floats should be printed is not specified and really not important, but just so you know most (all?) other implementations use the defaults for C printf(), which is 6 decimal digits. So that’s what I’m imitating (even though that's not the default for float formatting in Rust... or in C++'s iostream of course)

cwesson commented 3 years ago

I believe that covers all the points brought up here. I've opened issues #3 and #4 to address the MODU and REFC issues.

cwesson / funge-plus-plus

Failures in rfunge test suite #1