KSP-KOS / KOS

Fully programmable autopilot mod for KSP. Originally by Nivekk.

(suggest) Adding a syntax regression test script to the github code #39

Open · Dunbaratu opened this issue 10 years ago

Dunbaratu commented 10 years ago

Please let me know what you think of this idea, especially @marianoapp and @erendrake

When I implemented the ELSE operator (issue #6), it was my first attempt to alter the syntax definition, and I was nervous about breaking something, so I checked it afterward with the following kOS script. It's designed to cover every permutation of IFs and ELSEs I could think of, including regression checks for things that shouldn't have changed:

//KOS
//  A script with the purpose of testing that 'if' and 'else' work right:

set a to 10.
set b to 20.
set c to 40.

print "IF,  one line body, true check:".
print "is " + a + " = 10?".
if a = 10
  print "  yes.".
print "IF,  one line body, false check:".
print "is " + b + " = 10?".
if b = 10
  print "  yes.".

print "IF,  two line body requiring braces:".
print "is " + a + " = 10?".
if a = 10 {
  print "  yes.".
  set doNothing to 1. // just to make a reason for the braces.
}.

print "if/else, one-liners, true check:".
print "is " + a + " = 10?".
if a = 10
  print "  yes".
else
  print "  no".

print "if/else, one-liners, false check:".
print "is " + b + " = 10?".
if b = 10
  print "  yes".
else
  print "  no".

print "if/else, multi-line bodies, true check:".
print "is " + a + " = 10?".
if a = 10 {
  print "  yes".
  set doNothing to 1. // just to make the body longer than 1 stmt.
} else {
  print "  no".
  set doNothing to 1. // just to make the body longer than 1 stmt.
}.

print "if/else, multi-line bodies, false check:".
print "is " + b + " = 10?".
if b = 10 {
  print "  yes".
  set doNothing to 1. // just to make the body longer than 1 stmt.
} else {
  print "  no".
  set doNothing to 1. // just to make the body longer than 1 stmt.
}.

print "nested if/else, one-liners, hits first term:".
print "what is " + a + " ?".
if a < 15
  print "  less than 15".
else if a < 20
  print "  in the range [15,20)".
else if a < 30
  print "  in the range [20,30}".
else 
  print "  >= 30".

print "nested if/else, one-liners, hits third term:".
print "what is " + b + " ?".
if b < 15
  print "  less than 15".
else if b < 20
  print "  in the range [15,20)".
else if b < 30
  print "  in the range [20,30}".
else 
  print "  >= 30".

print "nested if/else, one-liners, hits final else:".
print "what is " + c + " ?".
if c < 15
  print "  less than 15".
else if c < 20
  print "  in the range [15,20)".
else if c < 30
  print "  in the range [20,30}".
else 
  print "  >= 30".

print "nested if/else, mix of one-line and braces, hits middle term:".
print "what is " + b + " ?".
if b < 10
  print "  less than 10.".
else if b < 30 {
  print "  in the range [10,30}".
  set doNothing to 1. // pointless line to force need for braces.
} else {
  print "  >= 30".
  set doNothing to 1. // pointless line to force need for braces.
}.

It occurs to me that expanding a thing like that into a long, complex script full of WHENs, LOCKs, FORs, UNTILs, program calls, and so on might be a good way to exercise a bit of everything in the language. After a change you could run the test, compare the output to the output of a previous run, and look at the diff between them to make darn sure that any change you see is a change you meant to see. Furthermore, when a new bit of syntax is added, you could update the regression test script to add checks for the new thing. Then when you diff the output, you expect the changes for your new syntax, but you're also making sure nothing that previously worked became broken.
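For example, a few more blocks in the same style might look something like this (just a rough sketch to illustrate the idea, not part of the original script; the exact trigger and wait behavior of that kOS version may differ):

// Sketch only: extra coverage for UNTIL, LOCK, and WHEN in the same style.

print "UNTIL, counting loop:".
set i to 0.
until i = 3 {
  set i to i + 1.
  print "  iteration " + i.
}.

print "LOCK, expression re-evaluates when its input changes:".
set num to 5.
lock twice to num * 2.
print "  twice is " + twice.
set num to 7.
print "  twice is now " + twice.
unlock twice.

print "WHEN/THEN, trigger that fires once the flag is set:".
set flag to 0.
when flag = 1 then print "  trigger fired.".
set flag to 1.
wait 0.01. // give the trigger a chance to run before the script ends.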

If people think this is a good idea I would be happy to work out a starting version of it. The idea is that it would be added to the GitHub repository, and running it to check any language-feature change would become standard practice that people get in the habit of. It would also be something you could run as part of deciding whether or not to accept a pull request.

Here are a few design bullet points about how I picture it working:

erendrake commented 10 years ago

We NEED regression testing. I have been looking at moving all of the code that doesn't directly depend on KSP into another assembly so we could add a unit testing framework.

Dunbaratu commented 10 years ago

While I like your idea of a separately compilable subset of the code that doesn't depend on KSP for the sake of testing the language, I imagine that would take a lot of work, and a lot of redesign, to set up. What I was thinking of is doable now, even with the KSP API calls intertwined with the rest of the code: just put an SCS module on the launchpad in-game, use it to run the test script, and spit out a LOG in the archive.

Once you have the output as a LOG in the archive, then external programs can perform the diff checking against previous results and prepare a report on what changed.
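Concretely, the test script could LOG each check's outcome instead of only PRINTing it (a rough sketch; the file name here is just illustrative, and the volume handling may need tweaking for the kOS version in question):

// Sketch only: write results to a file on the archive (volume 0) so an
// external tool can diff this run against a previous one.
switch to 0.

set a to 10.
if a = 10
  log "if one-liner, true check: yes" to kostest_syn_out.
else
  log "if one-liner, true check: no" to kostest_syn_out.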

erendrake commented 10 years ago

Sorry, I didn't mean to say that I didn't like your idea in the OP. In fact I think we should do both of these in the long run, and yours can be done now. I always vote YES! for regression testing :)

Dunbaratu commented 10 years ago

I started looking into this yesterday for a short while and I think this is the design I'm going to go with. Let me know what you think:

In the KSP-KOS/KOS project on GitHub, I add a directory called kos_test. It contains this:

KOS
  |
  +-- kos_test
        |
        +-- script
        |     |
        |     +-- kostest_syn_all.txt
        |     +-- kostest_syn_ifelse.txt
        |     +-- kostest_syn_progcalls.txt
        |     +-- kostest_syn_onwhen.txt
        |              (etc for all test scripts I come up with)
        +-- out
              |
              +-- kostest_syn_out.txt

To run a test:

I was wondering how to diff the results versus the previous results and how to track a history of previous results when I had a head-slap moment and realized that everything needed for that is already part of the git tools, if I just make the test results into a git-tracked file.

marianoapp commented 10 years ago

The code was designed to be very modular and as such you can execute a lot of parts from outside of KSP. As a matter of fact, I wrote a small program that emulates the console to debug the compiler and test the grammar. Given that, it wouldn't be too difficult to write test cases using a framework like NUnit to try to compile every instruction and compare the result with the expected output. That alone would prevent most issues resulting from modifications to the compiler. Since you can also execute the CPU from outside of KSP, it's possible to test the execution of more complex structures like loops, lists, waits, etc. And we could also set up a Continuous Integration system (like TravisCI) to run the test suite on every commit.

Of course, a lot of other things have to be tested from inside KSP, and for those we can have a couple of scripts, but I don't really like the idea of committing the results; it would create too much bloat.

Dunbaratu commented 10 years ago

I don't see why it's a bad idea to commit the results. I think it makes sense to use git's own tools to store "these are the results that are expected with this particular version of this particular branch."

erendrake commented 10 years ago

I have two issues with committing results this way.

This is why I split the DOC project into its own thing, for better or worse.

Dunbaratu commented 10 years ago

I do agree that it would put things in the git log that are not actually code. But that very same feature you're seeing as a bad thing is what I see as a good thing. "This edit to the documentation refers to a new thing that didn't exist yet until release 0.13, that's why this particular edit to this file in the documentation is part of release 0.13's files." It also helps enforce the mentality that end-user docs and test data are part of the programming project, not side projects that can be ignored.

Whether there are other tools better suited to testing is a separate issue entirely from whether or not the testing data (regardless of what testing tool the data is for) should be in the git archive under revision control or not.

As for the documentation, the only reason it sort of makes sense to keep it separate is a limitation of the Middleman system it's written on: it insists upon putting docs in branches it wants to name its way. It's not because docs in the code repository are a bad idea in general; they're just bad in this particular case because of that artificial limitation of the tool we're using.

erendrake commented 10 years ago

@marianoapp I have noticed that much of the code could be moved into another project with no dependence on UnityEngine, and that would be pretty cool. How do you think that will work with the CPU, given that the CPU has a reference to SharedObjects?

The biggest issue with TravisCI or any other CI tool is that we cannot redistribute the UnityEngine or Assembly-CSharp DLLs that we need to build it.

@Dunbaratu I now feel silly for conflating the docs and the test results; I was apparently overtired. I wouldn't mind bringing DOCS back into the main project; it was likely a mistake to move it out. The Middleman path/branch issue is not a huge deal to fix :)

A CI tool is not just a good place for testing; it is also a better place for the testing data. Crufting up the repo with hundreds of files is bad enough, but the real trouble with storing the results is that they would almost always end up in a separate commit once we have CI. I would prefer to keep CI out of the business of adding anything other than tags to the repo.

Dunbaratu commented 10 years ago

So how do other collaborators on the project see whether or not their change caused an unexpected difference from how things worked before, when the data for the testing tool isn't included in the repo? That's a fundamental question about the testing, and it doesn't depend on which tool you use.

erendrake commented 10 years ago

A website and/or an email. They could also always run the tests locally.

The tests would always be in the repo. The results would not.

Dunbaratu commented 10 years ago

What good is automating the testing when the testing tool is unaware of what the intended results are because they're not in the files downloaded from the repo?

erendrake commented 10 years ago

I am saying that everything needed to run the tests and determine their fitness is in the repo. The actual pass/fail or scalar result from running the tests would stay on each box, and the CI environment would give you a place to see, in an automated way, whether something failed.

Are we talking about the same thing?

Dunbaratu commented 10 years ago

I think you misunderstood what I was asking for, then. When I was talking about storing the results in the repo, it was because that is how the primitive method I was describing would record "this is the definition of a 'pass' result: it's a pass if the output matches this." The reason for it being in the repo is that the definition of a 'pass' can change when features change in future releases, so you put the definition of a 'pass' under revision control so it can match the software revision it's part of. As long as the software change being merged in wasn't supposed to change the expected test result, the test result files in question remain unaltered and therefore are not part of the pull request. They only get updated when the developer deliberately changes them to match a new expected result, or when a new test case is added to them, just like any other source file.

I don't see how using a different tool makes it a bad idea to store the "this is the expected result of this test" data so that it's recorded alongside the code changes in the repo. The format of such data will be different with a sophisticated automated tool, but it will still need some sort of data that says "you'd better expect this as the result," and that data, whatever form it takes, should be under revision control in the repo so it can be updated and versioned with everything else.
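To make that concrete with the primitive version (an illustrative sketch, not something quoted from this thread): the only real requirement on the test script is that its output is deterministic and labeled, so a committed copy of that output can serve as the machine-checkable definition of a 'pass':

// Sketch only: deterministic, labeled output whose committed copy defines a 'pass'.
set x to 2 + 2.
if x = 4
  print "arithmetic check: pass".
else
  print "arithmetic check: FAIL, got " + x.
// Re-run after a code change and diff the fresh output against the committed
// copy; any line that differs is either a regression or an intentional change
// that should be committed together with the code that caused it.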

erendrake commented 10 years ago

@Dunbaratu do you still have these tests? I was writing a similar test for the new quicksave stuff I'm working on and I remembered this conversation.

Dunbaratu commented 10 years ago

Somewhere. I can look for them late tonight.

abenkovskii commented 9 years ago

Isn't it simpler to use a custom TELNET client?

abenkovskii commented 9 years ago

Using it, we'd be able to test thrown exceptions too.

MaraRinn commented 4 years ago

Did you find the scripts?

Is this something that might be worth pursuing as a regression/integration test for the KSP 1 to KSP 2 migration?