Spectesting is a minefield

lizmat commented 1 year ago

Spectesting is a version minefield at the moment:

% rak '^use v6' t/spec --frequencies
1758:use v6;
30:use v6.c;
22:use v6.e.PREVIEW;
14:use v6.d;

A more precise:

% rak '{ "t/spec/$/" if .match(/^ <-[#]+[\S]>+ /) }' --/show-item-number t/spec/spectest.data | rak --files-from=- '^use v6' --frequencies
1332:use v6;
17:use v6.c;
14:use v6.e.PREVIEW;
9:use v6.d;

So this shows several issues that each may need a different problem solving issue.

there are many files in roast that are not actually being part of the spectest. Maybe these should be removed from roast or parked in a "archived" branch?
The use of use v6xxx in test files. I think all of the use v6 should be removed. They served a purpose in the days when there could be confusion with Perl, but nowadays that's not needed. And it's a potential problem.
When we currently run a spectest, it is done with the CURRENT state of v6.d, which is not what the next version will be. We should have an easy way to run the spectests for a specific version of Raku, similar to what we're now doing with the RakuAST grammar (aka an environment variable)
What the above doesn't show, is that roast is still using the old .t test extensions. I think these can and should be changed to use the new .rakutest extensions, as the 6.d release test files already live in a separate branch.

JJ commented 1 year ago

Spectesting is a version minefield at the moment:

% rak '^use v6' t/spec --frequencies
1758:use v6;
30:use v6.c;
22:use v6.e.PREVIEW;
14:use v6.d;

A more precise:

% rak '{ "t/spec/$/" if .match(/^ <-[#]+[\S]>+ /) }' --/show-item-number t/spec/spectest.data | rak --files-from=- '^use v6' --frequencies
1332:use v6;
17:use v6.c;
14:use v6.e.PREVIEW;
9:use v6.d;

So this shows several issues that each may need a different problem solving issue.

1. there are _many_ files in roast that are not actually being part of the spectest.  Maybe these should be removed from roast or parked in a "archived" branch?

Wasn't there a file that stated which ones were actually used?

2colours commented 1 year ago

I think there is a bit of confusion about .t extension being "legacy" or "simply stable" - lately it seems that it's more the former, in which case the last point seems simple and straightforward.

For the rest of the questions, I would refer to our discussion at #raku-dev on IRC: a readable form of a test helper like "skip this file if the version doesn't match a certain criterion" would probably help a lot. If I understand right, it would also mitigate the situation of datatest.spec quite a bit: the files themselves could say what version they belong to, which seems like a sane default for spectesting.

vrurg commented 1 year ago

Spectest must not be ran against a single version. Moreover, each test must have an explicit version. Not just use v6 for which, I agree, there must be no place out there anymore. Otherwise, each particular test is designed for a particular language version. Vast majority of them are covering 6.c, apparently. It wouldn't be right to expect that something with no explicit use v6.c/use v6.d will continue to pass when 6.e becomes the default.

2colours commented 1 year ago

I think @vrurg made a good point, perhaps it would be more accurate to say: "one possible good point".

What didn't seem to be clear in the short discussion on the IRC is how the two-dimensional nature of the question is solved:

a spectest belongs to a set of versions of the language (I'm saying it this way because we need to account for deprecations and removals as well!)
an actual compiler/runtime can implement and run different versions of the language

So in theory, this could be an N:M situation - however, only those combinations make sense where the used runtime version is matched by the spectest, so it's much closer to a linear situation. It's worth thinking about the idea of @lizmat as well, namely that the used language version could be set directly, rather than always coming from the source.

If I remember correctly, the topic of removal came up recently, as something that is "possible in theory but a hassle to actually achieve". Anyway, if there may be breaking changes (other than extensions) in the future, it would be good to at least give the specs the chance to describe that phenomenon.

vrurg commented 1 year ago

an actual compiler/runtime can implement and run different versions of the language

What??? It does, mind me...

For the rest of version-related stuff: that's why there are many individual files. Each one can test a specific aspect using a concrete use v6.x pragma. If something is deprecated in 6.e – test it in a 6.e compunit. Period.

2colours commented 1 year ago

What??? I does, mind me...

Not sure what this even means but it's hard to imagine it was anything constructive...

2colours commented 1 year ago

Anyway, what still seems to be missing: what about those spectests that aren't meant to be run, given a newer version, because their right outcome on the newer version is that they fail? Part of the reason I was trying to recite the "removal topic" (that apparently wasn't as interesting as the existence of compiler versions...) - if something is removed, the whole "language versions are scopes around each other" approach breaks down, or at least needs to be hacked. It would be good if at least the specification could deal with this situation in a clear way, period. And this is where both

the separation of the spectest version and used runtime version
and the "skip/fail/pass this test depending on language version in use" would turn relevant and useful.

vrurg commented 1 year ago

Not sure what this even means

It was a mistype. "It does", not "I does".

what about those spectests that aren't meant to be run, given a newer version, because their right outcome on the newer version is that they fail?

Not sure I get you here, but they're supposed to be run with corresponding use v6.x. What is wrong about this statement?

I wonder if you understand the meaning of use v6.x pragma.

2colours commented 1 year ago

I wonder if you understand the meaning of use v6.x pragma.

As much as you explained during the 2021 conference, I guess.

I'm starting to think that we have completely different interpretations of what a run of a spectest means, what the output is. Frankly, the way liz talked about it is much easier for me to relate to.

As I'm taking right now, a "spectest a la vrurg" tests the compiler, regardless of language version, in one run: version c stuff is guaranteed to work in version c mode, version d stuff is guaranteed to work in version d mode and so on. This test treats "introduced in language version c (and may belong to others as well!)" as "belongs to language version c (and only that)". This can only be sound if it's postulated that all language versions MUST include all the previous versions, with a hard stop - otherwise how could we be sure that the same compiler running in e.g "d" mode can still pass tests introduced in language version "c"?

Now, this sort of spectesting makes actual removal of functionality impossible because in order to implement the language version that removed, say, a dozen of public functions and two types, one would still need to implement those versions that introduced and contained them - or else it couldn't even pass the tests that you are proposing, in the first place.

This doesn't follow from the semantics of "use v6.x" in theory, does it? This "language version hierarchy" is more a pragma than a specification demand, right? Okay, that's how it is done in Rakudo, and one wouldn't want to write the same tests for different language versions either, when most of the time there are just no changes - however, the latter is mitigated by a combination of what @lizmat and I proposed (i.e "version under test" coming from a compiler setting, and all tests individually can check the version if different behaviors are needed).

PS: you could write a little longer than three lines, and please keep the bantering bits (like that "compiler versions, no shit" crap) for yourself.

vrurg commented 1 year ago

I haven't read past " version mode". Give yourself an answer what that does mean. Then think of backward compatibility we provide – and you'd be surprised! I just have nothing more to add here.

2colours commented 1 year ago

Thank you for the constructivity again - the only question at this point is who that "we" might be.

niner commented 1 year ago

@vrurg there's actually one thing missing from the current spec test suite and the way we use it: Let's assume there is a test that tests that in Raku 1 + 1 is indeed 2. Since &infix:sym<+> was already part of the first version, that test uses use v6.c;. There is however no test that ensures that this still holds in a comp unit that has a use v6.d; at the top. I.e. we only test that the intentional changes we make in later language versions are present, but don't test the negative: that there are no unintentional changes. Thus a compiler would be free to provide a working version of + in v6.c mode, but even completely remove it from 6.d.

vrurg commented 1 year ago

What constructiveness do you expect? If I tell you that code written 7 years ago with use v6.c we expect to be ran today with, perhaps, just a couple of changes (some code rely on buggy behaviors), if that code used nothing beyond included into the roast back then or today – what would it mean to you? Let me guess, you knew about this paradigm.

If we think further then it becomes clear that the compiler right now supports all language versions: 6.c, 6.d, and the upcoming 6.e. It means that one could use a 6.e module from their 6.d code and it will work. Perhaps not semantically (but it might happen even when the same language version compunits interact), but technically it will. So, all these language versions are in the compiler now.

Then what is that version mode term means? Your statements are based on the term which has no meaning. How could I respond to something that's meaningless in its premise? The only thing which do make sense to me is a test for a language version. A test with use 6.e tests if something works for 6.e and nothing else. If I want to see that something works equally well for all language versions – I would take care to have the code ran under each version individually. It's just a technical matter and nothing else.

The situation is like this: a compiler to claim full Raku support has to pass the roast and support all three language versions.

If we speak about removal of tests then this is something separate and unrelated to versioning.

vrurg commented 1 year ago

Thus a compiler would be free to provide a working version of + in v6.c mode, but even completely remove it from 6.d.

Just about time! :) I've mentioned this case above. It's a pure technical matter of how to implement that kind of testing. For the moment the cleanest way is to EVAL from within individual test files, like, for example:

use v6.e;
EVALFILE "./must-work-everywhere.t";

Or to use is-run test. See, for example, S11-modules/versioning.t. I always wanted EVAL to let its compunit use own language version, but never got there due to lack of time.

And, of course, the option of letting the compiler to use different default language version is on the table too.

But then, again, these are pure technical matters of how to get a task solved. I don't see a problem here.

2colours commented 1 year ago

The situation is like this: a compiler to claim full Raku support has to pass the roast and support all three language versions.

YES, thank you very much, this is the presumption that I also mentioned, have you bothered to read the comment, rather than boasting with not having read it:

This can only be sound if it's postulated that all language versions MUST include all the previous versions, with a hard stop

Let's keep it simple: where is this interpretation of the specification written down, as normative? All Raku compilers should be able to run all Raku code ever written, no matter how obsolete? This sounds debatable - but more importantly, it's not obvious whatsoever. It should be written down at a visible place, rather than deducible from an upset one-liner response.

niner commented 1 year ago

This can only be sound if it's postulated that all language versions MUST include all the previous versions, with a hard stop Let's keep it simple: where is this interpretation of the specification written down, as normative? All Raku compilers should be able to run all Raku code ever written, no matter how obsolete?

I don't think this has ever been written down anywhere. The reason however is not that we didn't commit to it. Rather the opposite: maintaining eternal backwards compatibility is one of those values that sit so deep, that no one felt the need to codify it. It was pretty much a given, i.e. a common understanding and underlies many of the design principles. It's for example the reason why so many things are lexically scoped in Raku. It's one of the values we inherited from the Perl community. The only difference being that we tried to make maintaining that backwards compatibility less painful.

usev6 commented 1 year ago

Back in 2015 @FROGGS and me used a hackathon to work on adding the ability to fudge tests for a specific version of Perl 6 (as it was called then). The idea was to be able to add a version to one or more tests or to a block with tests like this:

#?v6.0.0
is 2+2, 4;

#?v6.0.5+
{
    is "Life, the Universe, and Everything".WHY, 42;
    is 42.WHAT, Int, 'some reason';
}

#?v6.0.0..v6.0.2 2
is 2-2, 0;
is 3*4, 12;

#?v6.0.1..*
is 2-2, 0;
is 3*4, 12;

The above is taken from an old branch that still lives in the roast repo: https://github.com/Raku/roast/compare/master...fudge

I wonder if this idea could still make sense (maybe in a different form).

lizmat commented 1 year ago

Interesting... hmmm

lizmat commented 1 year ago

I would propose the following:

remove all "use v6" specifications: it is clearly that they are meant to indicate Raku rather than Perl, and thus indicate tests that should run ok on any Raku version, including the current development version.
check out whether the files in roast that have a specific version specification (6.c or 6.d) really indicate they need to be run with that version. If not, then remove those as well.
Implement an environment variable RAKU_DEFAULT_LANGUAGE_VERSION= that would indicate the language version with which Raku should run IF there is no specific use v6.x mention.

This should allow us to run all current non-version specific test files with v6.e.PREVIEW without needing to make any (further) changes to roast.

niner commented 1 year ago

I don't really like making changes to rakudo just to avoid making changes for this single user. Instead of that environment variable, you could also have the the Makefile copy the files and prepend the appropriate use v6 statement (except they already have one). After all make is a really good tool when you need to propagate changes of a source to a target.

This would have the advantage of having all those tests be part of a single test run. We could have this full tests run as a different target akin to stresstest.

lizmat commented 1 year ago

Don't you think such an environment variable would have more general applications?

niner commented 1 year ago

In fact I don't :) Can't think of any

usev6 commented 1 year ago

Basically I'm agreeing with @vrurg (from far above):

Moreover, each test must have an explicit version.

Since roast defines Raku, it has to be clear, which test is part of a specific version of Raku.

I wonder if this could -- at least in theory -- be achieved by "just"

keeping different branches of roast (we already have 6.c, 6.c-errata and 6.d-errata),
removing all version annotations from test files
running spectest for a specific version (a specific branch) with a mechanism like @niner suggested (using make to prepend the appropriate statement before actually running the tests)

To me this sounds like a clean separation of concerns. And in theory the released versions of Raku should be immutable, so there shouldn't be much need to adjust test files for old versions.

The most obvious disadvantage would probably be the need to run different spectests (for all versions -- or probably for the corresponding branch 6.?-errata). But if we want to keep backwards compatibility for Rakudo, maybe that's the price? (As a thought experiment, there could be a different implementation that just aims to be compatible with the current version.)

I'm not sure it this idea really makes sense, but the idea came back to my mind each time I looked at the comments in here. So I thought I should mention it ...

2colours commented 1 year ago

As a thought experiment, there could be a different implementation that just aims to be compatible with the current version.

This is what I thought, too, until the following statement happened:

The situation is like this: a compiler to claim full Raku support has to pass the roast and support all three language versions.

I think it's really about time to, well, specify the specification in this regard. When we talked about this with lizmat, the concept of "supported version" also came up - supported, as in the current language specification supports it, and not as in the compiler supports it; the compiler must support it. (Honestly, all of this sounds vague and fragile, without any precedent of removing a version from support.)

lizmat commented 1 year ago

running spectest for a specific version (a specific branch) with a mechanism like @niner suggested (using make to prepend the appropriate statement before actually running the tests)

Although this would be nice for running spectests, it would make debugging issues in a specific test file much harder and error prone. The alternatives are:

adding a use v6.x manually while testing (and then forgetting to remove it)
using a make file for individual testing as well, which may hide issues

Personally, if a test file is fudged, I much rather run the rakudo.moar version directly, optionally tweaking that, than going through a fudging process every time.

Anyways, that's why I think having an environment variable to indicate the language version to be used in absence of a specific indication in a file, would be preferable.

coke commented 1 year ago

Another wrinkle - spec tests were updated to work with Unicode 15 in January - but this breaks the spec test runs for v6.c when using the 2022.12, which only has Unicode 13 support. (sjn++ for reporting)

sjn commented 1 year ago

For me, it would be nice with a visible warning (and pause) if I somehow run the spectests on code that is significantly older than the tests expect – perhaps noting that tests should be expected to fail, since they may be testing features that haven't been implemented in the currently-checked-out (old) code.

vrurg commented 1 year ago

Can't get too deep into this, but so far I like the idea of fudging the most. So, tests which are not explicitly designed for a particular language version would have #?raku-version line as the start. And that is all we need for reliably testing under various use v6.x pragmas.

Raku / problem-solving

Spectesting is a minefield #365