IBMStreams / administration

Umbrella project for the IBMStreams organization. This project will be used for the management of the individual projects within the IBMStreams organization.

Proposal: Add script controlled Toolkit Test framework to streamsx.utility #122

Closed joergboe closed 6 years ago

joergboe commented 6 years ago

I would like to add a script-based test utility that allows the parallel execution of SPL test cases and samples without an indirection over the Java or Python topology frameworks. It supports the easy creation of test variants and uses the Streams command line tools. The utility is lightweight and enables the re-use of the many SPL test applications that are created during the toolkit development process. A test complex is composed of test suites and test cases. A set of command-based tools is available which allows the automatic preparation of the required Streams environment. A flexible set of properties allows the adaptation of the tests to different environments. The provided logging and online help allow a fast interactive test case development process. A meaningful summary is printed after test case execution.
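
To make the structure concrete, a test complex might be laid out on disk roughly like this (a sketch only; the directory and file names are purely illustrative):

# illustrative layout: one folder per test suite, one subfolder per test case
mkdir -p tests/ftpSuite/scanCase tests/ftpSuite/readCase
touch tests/ftpSuite/TestSuite.sh            # suite-level preparation/finalization steps
touch tests/ftpSuite/scanCase/TestCase.sh    # steps of one test case
touch tests/ftpSuite/readCase/TestCase.sh    # steps of another test case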

markheger commented 6 years ago

+1

ddebrunner commented 6 years ago

Can you provide more details of how a test is written and how the correct behaviour is verified?

Does it work with standalone, distributed, Streaming Analytics service?

One of the advantages of the existing testing mechanisms is that they allow use of existing test frameworks (e.g. Java-JUnit, Python-unittest). Adding another tool that folks need to learn may not be the best approach.

joergboe commented 6 years ago

Tests are written as short shell scripts. Result codes are examined to check behaviour. Evaluation functions are available to check the content of files or console output against pattern matches. Test scripts are executed in a protected environment; script errors enforce test abort / failure. In general, every test is possible that can be implemented in a shell script. Helper functions to support standalone and distributed tests are available.
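
For illustration, an evaluation step in such a script boils down to a shell function whose return code decides the verdict. A hypothetical fragment (not taken from the framework itself; it uses plain grep rather than a framework evaluation function):

# pass if the expected line shows up in the captured console output,
# otherwise signal a test failure via the framework's error code
function checkConsoleOutput {
    grep -q 'expected result line' console.out || return $errTestFail
}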

cancilla commented 6 years ago

Can you share an example of what a test would look like?

ddebrunner commented 6 years ago

So the way the contents of a stream is checked is by writing it out to a file using FileSink and then basically using diff or regexes on the file?

joergboe commented 6 years ago

I can provide an example: This file

Distributed and standalone Run test for FTP

--variantList:=distributed standalone

--variantList:=distributed

--TTRO_casePrep:=copyOnly myCompile

--TTRO_caseStep:=mySubmit myWaitForCompletion myCancelJob myEvaluate

function myCompile {
    echoAndExecute ${TTRO_splc} "$TTP_splcFlags" -M Main -t "$TTPN_streamsInetToolkit" --data-directory data -j "$TTRO_treads"
}

function mySubmit {
    if [[ $TTRO_caseVariant == "standalone" ]]; then
        if echoAndExecute output/bin/standalone; then
            return 0
        else
            return $errTestFail
        fi
    else
        if submitJob "output/Main.sab" "jobno.log"; then
            return 0
        else
            return $errTestFail
        fi
    fi
}

function myWaitForCompletion {
    until [[ -e data/BINDATA_FINAL ]]; do
        echo "wait for completion of job $jobno"
        sleep 1
    done
    echo "Job $jobno completed"
}

function myCancelJob {
    if [[ $TTRO_caseVariant == "distributed" ]]; then
        cancelJob $jobno
    fi
}

function myEvaluate {
    local tmp=$(wc -c data/BINDATA_.txt | cut -d " " -f1)
    echo "Result has $tmp bytes"
    if [[ $tmp -gt 2000000 ]]; then
        return 0
    else
        return $errTestFail
    fi
}

is a test which runs in two variants:

--variantList:=distributed standalone

it has 2 test preparation steps:

--TTRO_casePrep:=copyOnly myCompile

- copyOnly is a function which copies the data into the workdir without modification
- myCompile is the function that compiles the application

And it has 4 test execution steps:

--TTRO_caseStep:=mySubmit myWaitForCompletion myCancelJob myEvaluate

Every step is a function that is called from the test framework.

The test suite file is:

--TTRO_suitePrep:=cleanUpInstAndDomainAtStart mkDomain startDomain mkInst startInst

--TTRO_suiteFin:=cleanUpInstAndDomainAtStop

which takes care that the domain and instance are running after test suite preparation, and that during finalization the domain and instance are stopped.

ddebrunner commented 6 years ago

What's the purpose of the test (i.e. what operator is it testing) and what's the matching SPL?

joergboe commented 6 years ago

It is a test for the FTPReader operator of the inet toolkit.

brandtol commented 6 years ago

+1

I think this framework is not meant to replace the various existing frameworks, but to provide a simple alternative without external dependencies (apart from shell version/compatibility issues).

Maybe this should go into a separate repository?

ddebrunner commented 6 years ago

@joergboe Could you point to the specific SPL being used, Main is an ambiguous composite name.

I'd be interested to see what the test would look like using Python unittest.

ddebrunner commented 6 years ago

@brandtol I'm trying to understand if it is a simple alternative; given that it's unique to SPL, there will be little communal knowledge of it (compared to existing standard frameworks), so it could be a steep learning curve for folks.

I also find it a little strange that the test contains so much boilerplate stuff, e.g. the compilation, the submission, cancellation etc., but maybe that could be improved.

chanskw commented 6 years ago

I am unsure if using shell scripts for tests is the direction we would like to point our customers in. With a more well-known test framework like Python unittest or JUnit, as Dan said, the learning curve is smaller. Furthermore, these well-known frameworks are usually easier to integrate with build tools, making it easier for people to automate tests and collect test results. I am concerned that using shell scripts may not be the best practice that we should be promoting for testing Streams applications.

Having said that, I think we should have a repository focused on providing utilities for testing and also on documenting and showing how people should test Streams applications. This question keeps coming up, and it would be helpful if the answers and test samples were published somewhere more discoverable. I.e. what are the best practices and strategies for testing Streams applications in general? What are the repeatable patterns? Otherwise, we will keep re-inventing the wheel and coming up with new scripts and tools to test toolkits and Streams applications.

ddebrunner commented 6 years ago

@chanskw See #91

Maybe we should create streamsx.testing with some initial content in mind.

chanskw commented 6 years ago

Yes, I think we should create the repository with some initial content. Even if the initial content is not totally complete as you have documented in #91, it would be a good starting point for us to define what these best practices are.

joergboe commented 6 years ago

@ddebrunner The boilerplate stuff which is common to a number of cases should be moved into a common utility script. The sample that I posted is one of the first complete test samples, and the common steps were not available at that time. This will be improved.
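
For example, with shared steps in place the case script from above could shrink to something like this (the step names compileMain, submitOrRunStandalone and cancelJobIfDistributed are hypothetical shared helpers, not existing framework functions):

--variantList:=distributed standalone
--TTRO_casePrep:=copyOnly compileMain
--TTRO_caseStep:=submitOrRunStandalone myWaitForCompletion cancelJobIfDistributed myEvaluate

# only the case-specific evaluation remains local; compile/submit/cancel
# would come from the common utility script
function myEvaluate {
    local tmp=$(wc -c data/BINDATA_.txt | cut -d " " -f1)
    echo "Result has $tmp bytes"
    if [[ $tmp -gt 2000000 ]]; then
        return 0
    else
        return $errTestFail
    fi
}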

joergboe commented 6 years ago

Main.spl.gz

brandtol commented 6 years ago

@ddebrunner I think the learning curve is not so much related to the test framework itself, but more to the language used and the domain-specific extensions, in our case the SPL-related tasks (how to compile/submit/cancel/etc...) and the available validator methods. Regarding language, I think shell scripting can be handled by everyone.

I spent a day porting an existing test case to the shell framework and found it to be fairly simple, although a lot of common tasks are still missing (too much boilerplate, as you said).

Below is a quick and dirty comparison of how to do things in JUnit and in the shell test framework.

| Task | JUnit | Shell framework |
|------|-------|-----------------|
| Test implementation | Java | Bash script functions |
| Declare test suite | Create class with test methods | Add TestSuite.sh to a folder (script can be empty) |
| Declare test case | Annotate method with @Test | Add TestCase.sh to a subfolder of a suite folder |
| Signal outcome of a test | Use org.junit.Assert methods | Use return codes of bash functions (0 is success) |
| Define multiple test steps for a test | Call other methods in the test method | Use a special comment in the scripts: #--TTRO_caseStep:=compile submit waitForDoneFile validate |
| Define prepare/cleanup actions for tests | Use @Before and @After | Use special comments in the scripts: #--TTRO_casePrep:=copyOnly startRedis and #--TTRO_caseFin:=canceljob stopRedis |
| Use test case variants | Annotate with @Parameters | Add special comments in the scripts: #--variantList:=success failure error scripterror |
| Reuse existing code | Use any available Java classes | Use bash functions defined either in the framework, the suite level, or the test case level scripts |
| Run test cases/suites | Implement a TestRunner, or use an existing runner | Use the provided runTTF shell script |
| Report formats | Provides plain text and XML formats, as far as I know | Probably needs adaptation, as the current format is free-form plain text |
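
Putting the shell-framework column together, a single test case script skeleton would combine those special comments and bash functions roughly like this (step and variant names are copied from the table; only the validate step is sketched, the other steps would come from framework-, suite-, or case-level scripts):

#--variantList:=success failure error scripterror
#--TTRO_casePrep:=copyOnly startRedis
#--TTRO_caseStep:=compile submit waitForDoneFile validate
#--TTRO_caseFin:=canceljob stopRedis

# each step is a bash function; return code 0 means success,
# anything else marks the step (and the case) as failed
function validate {
    [[ -e data/done.file ]] || return $errTestFail
}
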
brandtol commented 6 years ago

@chanskw I think we cannot force customers to use one "blessed" framework anyway (or can we?), so I think we may well advertise multiple options. For example, I can imagine the shell framework being very easy to use when it comes to testing toolkit samples.

Regarding easier integration with build tools, I disagree. With other (more sophisticated) testers, you need to ensure you have certain jars, Python versions and packages available in your environment, whereas the shell tester can be delivered self-contained within a single bin directory and should work on all systems providing a bash.

+1 for the idea of having a repository for this; it might even contain multiple testers, tutorials, etc. I would prefer to have the product JUnit test framework in this repository, but bringing that into the open would require high effort.

joergboe commented 6 years ago

I think the main effort in the learning process for a test environment is understanding how the test components work, how the results are presented, and how to read the logs. The test frameworks themselves are almost straightforward. +1 for the idea to have a repository for this.

ddebrunner commented 6 years ago

@brandtol - Tools like travis and jenkins already understand how to handle junit tests and integrate the results into their reporting, I think that's the type of integration @chanskw was talking about.

All such systems will have no integration with this.

ddebrunner commented 6 years ago

I think there is a significant effort required to build out a complete test framework, including:

My concern is that existing harnesses have already solved these problems, and there exists a wealth of knowledge, i.e. I can search and find a solution. Is it really a good use of time to try to build a new test framework?

ddebrunner commented 6 years ago

Here's an example of a test using Python unittest for the FTPReader operator.

It tests that FTPReader scans at least 18 files from the FTP site and that, if an entry is a file, its file name ends in .zip, which matches the current contents of that site.

It runs successfully with standalone and the Streaming Analytics service. For distributed and the Streaming Analytics service, the tester also automatically verifies that no PEs were restarted during the test, to catch errors where the correct result might be returned but errors caused automatic PE restarts.

import unittest

from streamsx.topology.topology import *
from streamsx.topology.tester import Tester
import streamsx.spl.op as op
import streamsx.spl.toolkit as tk

class TestFTP(unittest.TestCase):
    """ Test invocations of FTP operators with standalone """
    def setUp(self):
        Tester.setup_standalone(self)

    def _add_toolkits(self, topo):
        tk.add_toolkit(topo, '../tk')
        tk.add_toolkit(topo, '/home/streamsadmin/toolkits/com.ibm.streamsx.inet')
    def test_scan(self):
        """ Test scanning of a standard site.  """
        topo = Topology()
        self._add_toolkits(topo)
        scan = op.Source(topo, "ftptest::FTPScanTest", 'ftptest::FileName_t')

        tester = Tester(topo)
        tester.tuple_count(scan.stream, 18, exact=False)
        tester.tuple_check(scan.stream, lambda  fn : fn['fileName'].endswith('.zip') if fn['isFile'] else True)

        tester.test(self.test_ctxtype, self.test_config)

class TestFTPCloud(TestFTP):
    """ Test invocations of FTP operators with streaming analytics """
    def setUp(self):
        Tester.setup_streaming_analytics(self, force_remote_build=True)

It's not the same as the script example above, but is based upon it. It's not the same because the above test (I think) downloads a 1MB file, but then checks that the output (which seems to be modified) is at least 2,000,000 bytes, so it seems like it's not explicitly checking the functionality of the FTPReader operator. It wouldn't be too hard to add an additional test case to this Python test to ensure that the data read from the file was the actual expected size. (I'm also using an older version of the inet toolkit.)

ddebrunner commented 6 years ago

Matching SPL file, a modified subset of the original SPL file, focusing on the FTPReader operator.

namespace ftptest;

use com.ibm.streamsx.inet.ftp::*;

type FileName_t = rstring fileName, uint64 size, rstring date, rstring user, boolean isFile, uint32 transferCount, uint32 failureCount, uint64 bytesTransferred, float64 speed;

type Error_t = rstring errorText, int32 error, uint32 transferCount, uint32 failureCount, uint64 bytesTransferred;

public composite FTPScanTest(output FilenameStream) {
    param
        expression<Protocol> $protocol :      (Protocol)getSubmissionTimeValue("protocol", "ftp");
        expression<rstring> $host :           getSubmissionTimeValue("host", "speedtest.tele2.net");
        expression<rstring> $path :           getSubmissionTimeValue("path", "/");
        expression<rstring> $username :       getSubmissionTimeValue("username", "anonymous");
        expression<rstring> $password :       getSubmissionTimeValue("password", "anon@localhost");
        expression<rstring> $fileToTransfer : getSubmissionTimeValue("fileToTransfer", "1MB.zip");
        expression<boolean> $verbosity :      (boolean)getSubmissionTimeValue("verbosity", "false");

    graph
        stream<int32 a> TriggerStream = Beacon() {
            param
                iterations: 1u;
        }

        //scan the remote directory
        (
            stream<FileName_t> FilenameStream as OUT
        ) = FTPReader(TriggerStream) {
            param
                protocol : $protocol;
                isDirReader : true;
                host : $host;
                path : $path;
                username : $username;
                password : $password;
                useEPSV : false;
                curlVerbose : $verbosity;
            output
                OUT :
                    fileName = FileName(),
                    size = FileSize(),
                    date = FileDate(),
                    user = FileUser(),
                    isFile = IsFile(),
                    transferCount = 0u, // TransferCount(),
                    failureCount = 0u, // TransferFailureCount(),
                    bytesTransferred = NoBytesTransferred(),
                    speed = TransferSpeed();
        }
}
ddebrunner commented 6 years ago

It would be run with:

# All tests
python3 -m unittest test_ftp.py

# A single test
python3 -m unittest test_ftp.TestFTPCloud.test_scan
brandtol commented 6 years ago

@ddebrunner That's a good point about using tools like travis/jenkins for automated testing.

I was more concerned about the occasional contributor, who forks a toolkit repo, makes some fixes/enhancements and wants to run a test suite to ensure nothing broke. In that scenario, they probably would not use tools like jenkins etc., and it should be as easy as possible for them to run an existing suite.

ddebrunner commented 6 years ago

@brandtol I imagine there are thousands of open source projects using JUnit or Python unittest that cope with having occasional contributors run their tests. Using a standard framework would seem to be a benefit in that case.

brandtol commented 6 years ago

@ddebrunner

I think there are two different issues here:

1) Integration with existing/planned CI tools like jenkins/travis/whatever. The external contributor does not care, because he is not using these tools anyway. Handling that integration is the burden of the toolkit team, anyway.

2) The contributor wants to run the existing suite(s) and/or add new test cases. That should be as simple as possible. Using JUnit or python-unittest would be the natural choice for Java or Python projects, of course! For testing SPL applications/operators it is not that simple, IMHO. You need to learn the extensions in the JUnit product tester and/or the topology tester to implement a test case. Compared to that effort, the benefit of using a standard testing framework may be negligible. That said, I only spent two days with the topology tester, so this impression might be wrong.

Instead of focusing on a particular tester, we might try to define (or at least propose) an interface for the public toolkits to adhere to, one that allows integration with external tools and is easy to use for contributors.
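
One possible shape for that interface, purely as a sketch: each toolkit repository ships a fixed-name entry point under tests/ with a simple exit-code contract (0 = success), and what it delegates to internally is up to the toolkit team. The file name and the delegated command below are made up for illustration:

#!/bin/bash
# hypothetical tests/run-tests.sh entry point; the only contract is the exit code
set -e
cd "$(dirname "$0")"
# delegate to whatever tester this toolkit actually uses, e.g. Python unittest
python3 -m unittest discover -s python -p 'test_*.py'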

I think we should continue that discussion on issue #91 and close this one.

ddebrunner commented 6 years ago

@brandtol Note it's not just the github/product toolkits @chanskw & I are thinking about, but customers as well who would like to test their apps/toolkits and use CI tools like Jenkins.

A single solution for testing of IBM and customers' toolkits, apps, etc. should be the goal (including support for the Streaming Analytics service).

Having the IBM team use the same tool we would want (encourage) our customers to use would be the best outcome.

chanskw commented 6 years ago

I agree with @ddebrunner. I think we should lead by example and show customers how to test their applications "properly". I do not think it is good practice to invent a custom scripting framework for testing toolkits or Streams applications. As Dan has said, creating and maintaining this type of framework is a lot of work and expensive, and I do not think this is the direction we want to head in.

As for supporting the "occasional" contributors, how often do we get one of these conributions? Furthermore, do you envision that the contributors will run the tests using the scripting framework, while our product / toolkit formal tests will use Junit / python tests? How do we maintain two sets of tests. I think we should have one set of tests that everybody runs. If we want to make it easier for people, then we maybe better off automating testing for people, perhaps integration with Jenkins or Travis when people check in code. Also, JUnit and Python tests are so common these days, I think most contributors will be able to pick it up and run the tests quickly.

joergboe commented 6 years ago

Again: The main effort during test case implementation and maintenance is to understand how the special test environment works:

The script tool supports -test reports,

brandtol commented 6 years ago

@chanskw Some replies to your concerns.

As for supporting the "occasional" contributors, how often do we get one of these conributions?

Probably my usage of this term was misleading. I am thinking about the IBMer who forks a public toolkit now and then, makes some enhancements/fixes needed for a certain project, and wants to run some regression tests before making a pull request (or wants to add 1-2 test cases for the changes). This happens quite often. For them, running some existing tests in their VM should be as easy as possible (without Jenkins/Travis).

Furthermore, do you envision that the contributors will run the tests using the scripting framework, while our product / toolkit formal tests will use JUnit / Python tests? How do we maintain two sets of tests? I think we should have one set of tests that everybody runs.

I fully agree that having one set of tests for everyone would be the best solution. But as it stands now, we already have to maintain at least two sets of tests. Formal product tests are mostly implemented with the internal JUnit tester (especially for non-public toolkits); these tests will likely not be ported to a different tester, due to the high effort. Contributors cannot run these tests until we bring the product tester into the open and move the tests from GHE to public github.

The product tests are included in the automatic regression procedure (which, as far as I know, will be triggered by Jenkins jobs). Tests located in the public toolkits' tests folders are not automatically included in this regression so far. I think the product regression should use both the internal tests AND the additional tests picked up from the public toolkits' tests folders (implemented using whatever framework).

If we want to make it easier for people, then we may be better off automating testing for people, perhaps integrating with Jenkins or Travis when people check in code.

One problem with this is that you have to accept/merge a pull request first, before automated build/test jobs are kicked off. Then, if something goes wrong, the toolkit team has to notify the contributor that his change broke something (the analysis needs to be done by the toolkit team, as the contributor cannot even see or access the Jenkins logs). I would prefer to have a minimal test harness the contributor can run BEFORE making a pull request.

Also, JUnit and Python tests are so common these days, I think most contributors will be able to pick it up and run the tests quickly.

Ideally the contributor, when doing regression testing, does not even care about how the existing tests are implemented; they just want to do something like:

cd tests
ant test

and check that no errors come up. Of course that does not hold true when they want to add test cases.

chanskw commented 6 years ago

I created streamsx.testing for general discussions about how tests should be done and what tools we should develop. @ddebrunner, @joergboe and myself are the initial committers to the project. Please continue the testing discussions in streamsx.testing. Thanks!