Allow automated testing

mefistotelis commented 9 years ago

Originally reported on Google Code with ID 652

What game elements does the new feature/behavior concern?

Level script, mostly.
The idea is to create a campaign which executes from start to end automatically.
Results of the tests can then be read from keeperfx.log.

Where exactly inside the game should it be accessible?

New script commands should be added to allow execution of automated tests.
- A command which allows passing/failing the test, it is already there (WIN/LOSE_GAME)
- A command which forces the level to end automatically when win/lose state is achieved
- A command which allows determining performance of the test (informs about variables
used as performance factors and their wages)
New command line parameters should be added:
- A parameter which disables any need of user input (--nointeractive)
- A parameter which starts given campaign without menu screen
- A parameter which sets frameskip mode from the very beginning

Under which conditions should it be triggered?

When a test campaign is started.

How exactly should the new feature work?
1. Start KeeperFX with command line to load test campaigns, nointro, nointeractive,
frameskip
2. Allow the tests to be executed without human interaction
3. Summarize data gathered in log files (status(pass/fail) + performance(integer number))

Please provide any additional information below.

Implementing this would be a huge help for:
- avoiding regressions
- improving performance of computer player and creature AI

Reported by mefistotelis on 2015-08-09 23:04:15

mefistotelis commented 9 years ago

Please don't make it a single campaign, but allow for separate test maps and a method
for running several maps(test cases) in a batch. For example by having an ini file
with map-numbers or entering the range of numbers in the command line. Something that
might also work really well with little effort is to have KeeperFX look for a subfolder
‘Testlevels’ when a specific parameter is used and run all the maps there.

Also, you need the option to either abort the whole test suite on the first fail but
more importantly also the option to continue with all the remaining test cases on failure.
 Compare these alternatives:

Abort at first failure:
TC1: Pass
TC2: Fail
TC3: -
TC4: -

Continue at failure:
TC1: Pass
TC2: Fail
TC3: Fail
TC4: Pass

The second option provides additional information at no extra work, perhaps only additional
run time. The first option could lead to a situation where TC3 failure is only found
after fixing TC2 and where you don’t know if this problem is further regression by
fixing TC2 or was there before.

What would also help is to introduce a script command that allows the test engineer
to write data to the log in a way that can easily be parsed.

Reported by Loobinex on 2015-08-09 23:58:23

mefistotelis commented 9 years ago

By just having a a parameter '-runtest abort/continue' to start running all test maps,
and a '-frameskip #' parameter you can much more easily intergrate it in your automated
build process(run regression test on checkin!). Possibly update the '-level' parameter
to only run the testmap with that number to accomodate manual testruns by having batch
files (You'd have to copy the log file between each map to store the test results.)
This way the tester doesn't have to open the game manually and select a campaign. And
you would not need to implement the '-nointeractive' parameter, no setting to indicate
a campaign is a test-campaign and no functionality to skip the landview.

Reported by Loobinex on 2015-08-10 00:30:31

mefistotelis commented 9 years ago

I would prefer to use the "normal" game path whenever possible.

The "runtest" parameter of your description would be a completely separate code path
to normal game menus; it would have to be implemented from scratch (though I admit
it's not a huge code).

Levels in a campaign file can easily be changed to select the tests you want to run
- you only have to remove a level from SINGLE_LEVELS to disable it.

Also, you will be able to use multiple test campaigns referring to the same levels.

Reported by mefistotelis on 2015-08-10 07:07:00

mefistotelis commented 9 years ago

Regression testing KeeperFX with only the level scripts is limited, but to get somewhat
decent coverage you will still have to create a very large number of maps. Campaigns
are currently limited to 25 maps, so you'll have the overhead of having to make a lot
of campaign.cfg files.
And if you want to run the regression test you'd still have to sit there and every
25 maps click another campaign for testing.

The key to automated regression tests is - not surprisingly - automation. The design
you have right now has far to many manual steps which don't scale well. Secondly writing
the tests is very labour intensive and automated testing only becomes valuable if you
have some semblance of coverage. If you don't like my suggestions make something else,
but please allow (at least in the design) for full(!) automation and quick and easy
TC creation above all else.

Reported by Loobinex on 2015-08-10 07:52:14

mefistotelis commented 9 years ago

Actually, the campaign levels limit is now 50 (see https://code.google.com/p/keeperfx/source/browse/trunk/keeperfx/src/config_campaigns.h).
And this can be easily increased.
But we would do fine even with executing the levels separately (that actually has its
advantages too, ie. clean environment for every test and separate log file for each
test case).

It is easy to create a batch scripts which lists all levels in a folder and executes
KeeperFX separately for each level.

I agree testing with level script is limited, but this is the only thing that looks
achievable for me. In case of games, the whole-system tests are easier to maintain
than unit tests, and not many people are interested in writing unit tests for open
project. Not to mention KeeperFX is not fully rewritten, so it doesn't have a proper
separation of internal units.

My idea of measuring performance is still a bit vague. I want to add a command like:

SET_PERFORMANCE_FACTORS(<variable1>,<wage1>,<variable2>,<wage2>,<variable3>,<wage3>...)

Using given script variables and their wages, the game would be able to compute a single
value of performance. Then, for example, if a test is about money, this could allow
measuring if a code change made computer player dig more gold in the test map.

What do you think about this?

Reported by mefistotelis on 2015-08-10 11:35:55

mefistotelis commented 9 years ago

I agree with your assessment that doing automated test like this is the only feasable
approach. I've always had difficulty making my developers write unit tests and they
were getting paid for it. Through level scripts there are still quite a few things
that can be tested and no developers are needed.

In your example, how to you want to configure the variables, and what will be your
output? Is your plan to add some variables specifically for testing to the game script?

Perhaps the easiest way is to keep it simple and on win/lose game and/or player defeat
always automatically write the gameturn to the log file. Have the test-engineer write
the win/lose condition in such a way to account for proper performance. (e.g. lose
game on a gameturn, win game on an amount of gold).
If we run into some things we really want tested but cannot script, adding the script
variable that would enable it would probably be a fine addition for map makers as well.

How do you want to get the results back? Will you create a parser to look through the
log files for pass/fail conditions, or will you create a seperate log file which is
created when a 'runtest' parameter is run which only enters on every row something
like:
Map number - Map name - Pass/Fail - gameturn - error quantity - datetime

Reported by Loobinex on 2015-08-10 13:44:36

mefistotelis commented 9 years ago

How to configure variables - the variables will be standard script variables; the command:
SET_PERFORMANCE_FACTORS(<variable1>,<wage1>,<variable2>,<wage2>,<variable3>,<wage3>)
would log a value to keeperfx.log which is:
LEVEL_PERFORMANCE = variable1*wage1 + variable2*wage2 + variable3*wage3
The value could even be accessible as another variable within the script if it is needed.

New variables - yes, if there is any value missing, new variables will be implemented.
These new variables will work for all standard script commands, so for mapmakers as
well.

Results - these will be written to the standard log file, I'll just make a simple SED
script to extract the results into readable format, ie CSV-like.

Reported by mefistotelis on 2015-08-10 14:54:16

mefistotelis commented 9 years ago

I don't understand 'wage' in this context. If it is the value why do you have to express
it in the script. Could you give a concrete example of how in Keeperscript you'd write
the 'Total Doors' of Player1, the number of knights belonging to Player_good and the
total amount of heroes player_good controls when player1 is destroyed.

But I'd say lets just try it and see how it works. I think adding the gameturn on win/lose
game to the log file by default would be a big help because this would be the key performance
indicator on most tests.

Reported by Loobinex on 2015-08-10 17:57:22

mefistotelis commented 9 years ago

Ok, here's wage explanation:

Let's assume we have a level which tests how computer player gathers gold and builds
traps, ie. boulders. Victory condition is to have 20000 gold and 10 placed traps.

But besides pass/fail result, we want to know the measurable performance - how exactly
KeeperFX improved since 20 builds ago.

We could just sum TOTAL_GOLD_MINED and BOULDER - but that would mean producing another
BOULDER has very little impact on performance. If we want those two to be equally important,
we need to scale BOULDER with a wage, ie.
PERFORMANCE = TOTAL_GOLD_MINED * 1 + BOULDER * 2000

This way producing 10 traps gives the same performance impact as mining 20000 gold.

Reported by mefistotelis on 2015-08-10 18:48:03

mefistotelis commented 9 years ago

Ok, I understand and that should work. I have never heared the term 'wage' here, I'm
used to calling it 'weight'.

In any case, you'll get a performance number, which says 'bigger number is better'.
How I was thinking was record the game_turns it takes to achieve the victory condition
of e.g. 20 boulders+20k Gold. This way you also have a performance indicator: Faster
is better.

I think the result is basically the same, so it is up to you to estimate if the effort
in implementing the weighted performance variables is worth it.

Reported by Loobinex on 2015-08-10 19:52:50

mefistotelis commented 9 years ago

Hm, looks like I was using that term incorrectly, it should be weight.

Reported by mefistotelis on 2015-08-11 01:00:26

PieterVdc commented 2 years ago

https://github.com/SimLV/dktestrunner is a start to this issue, still will need maps etc

dkfans / keeperfx

Allow automated testing #652