gardners / 2014SE3

Software Engineering 3, Semester 1 2014.

Research Report by zhan0850 #251

Open zhan0850 opened 10 years ago

zhan0850 commented 10 years ago

Software Testing - Role, Principles, and Limitations

1.Introduction The following definition of software testing comes from a 2006 conference keynote by Cem Kaner and is quoted by Wikipedia as a widely accepted definition:

"Software testing is an investigation conducted to provide stakeholders with information about the quality of the product or service under test." [1]

The statement classifies testing as an investigation that shows stakeholders the quality of the software under development. It establishes that testing matters to the product, but it says nothing about how the investigation should be performed or to what extent developers can rely on testing.

Expanding on this definition, this report first analyses the role of testing in the software development process, then introduces some rules of thumb for applying general testing techniques, and finally discusses what testing is not and what cannot be achieved through testing.

2.The Role of Testing in Software Development The importance of testing in the software development process must be understood by every software engineer, even though it may seem self-evident. Testing not only reduces the number of bugs in the code; it also examines whether the implemented functionality meets the corresponding requirements.

2.1 Testing in the Waterfall Model In a traditional waterfall model, testing is the last step before the system is delivered to the customer. All assessments of whether the requirements are met are carried out at this stage. However, with very limited time allowed for the entire project, the time remaining for testing is often squeezed heavily.

Moreover, in a fast-changing environment, it is quite possible that by the time the system is completed, the original requirements have already become outdated. In addition, in a small project, more time may be spent on planning and documentation than on designing, implementing, and testing.

Thus, comprehensive testing is rarely finished. The step that amounts to quality checking of the product system does not receive enough time to proceed. [2]

2.2 Testing in the Incremental Model In contrast with the waterfall model, the incremental model has emerged to suit this rapidly changing environment. Methods based on this model, such as Agile or XP, usually adopt a test-driven development approach. In this approach, the whole project is broken down into several increments or iterations completed in shorter time boxes. In each iteration, tests are written before the actual implementation, and an implementation is considered correct only if it passes the existing test cases designed for it.

As a result, testing is interleaved with analysis, design, and implementation, so developers can spend more effective time on testing and have it carefully planned and carried out.
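
To make the test-first cycle concrete, here is a minimal sketch using Python's built-in unittest module; the discount function and its 10% rule are invented purely for illustration and are not part of any real project discussed in this report.

```python
import unittest

# Step 1: write the tests first. At this point discount() does not exist yet,
# so the suite fails, which is the expected starting state in TDD.
class DiscountTest(unittest.TestCase):
    def test_ten_percent_discount_applied_over_100(self):
        self.assertAlmostEqual(discount(200.0), 180.0)

    def test_no_discount_at_or_below_100(self):
        self.assertAlmostEqual(discount(100.0), 100.0)

# Step 2: write just enough implementation to make the existing tests pass.
def discount(amount):
    """Apply a 10% discount to orders over 100 (hypothetical business rule)."""
    return amount * 0.9 if amount > 100 else amount

if __name__ == "__main__":
    unittest.main()
```

Running the suite before step 2 demonstrates the failing state; running it afterwards shows the implementation satisfies the test cases designed for it.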

3.Principles of Testing Having established the importance of testing, this section discusses how general testing methods and techniques, e.g. static and dynamic methods, black-box and white-box testing, unit testing, integration testing, and acceptance testing, are applied in typical software development projects.

3.1 Test a program to try to make it fail Dijkstra said that "testing shows the presence, not the absence of bugs" [3]. The interpretation is straightforward: except for trivial systems, the test cases a developer can come up with cover only a fraction of all possible scenarios. So testing means executing the program with the intention of finding bugs [4], not with the intention of showing that it is bug free. A good testing practice shows that the program fails at some point and, after debugging, that it succeeds at that same point.
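
The following sketch illustrates this principle with Python's unittest. The binary_search function and its deliberate off-by-one fault are hypothetical; the second test is written precisely with the intention of making the program fail.

```python
import unittest

def binary_search(items, target):
    """Return the index of target in sorted items, or -1 if absent.
    Deliberately faulty for illustration: the loop condition should be 'lo <= hi'."""
    lo, hi = 0, len(items) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

class BinarySearchTest(unittest.TestCase):
    def test_finds_middle_element(self):
        # A "happy path" case that passes and reveals nothing about the fault.
        self.assertEqual(binary_search([1, 3, 5, 7, 9], 5), 2)

    def test_finds_only_element(self):
        # Written to make the program fail: this exposes the off-by-one fault,
        # because the faulty loop never examines the single remaining element.
        self.assertEqual(binary_search([5], 5), 0)

if __name__ == "__main__":
    unittest.main()
```

The first test confirms the happy path; the second is the one that earns its keep by failing and pointing at the fault.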

3.2 Test early and regularly It is not surprising that fixing a bug is more expensive when it is spotted at a later stage, but it is surprising how much more expensive it is. Figure 1 shows that, in a waterfall model, the cost of finding and fixing bugs grows roughly exponentially over time. The same holds in a test-driven approach, where a bug introduced in an early iteration can grow into a bigger problem and propagate to other components.

Figure 1. Cost of Fixing Bugs in the Waterfall Model [5]

Additionally, tests should be run on a regular basis, since newly added components may have unexpected effects on existing ones. Likewise, whenever the requirements change, the test cases should be updated accordingly.
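
As an illustration of keeping such tests in the suite, the sketch below (Python unittest; parse_quantity and the issue number are hypothetical) shows a regression test added after a bug fix so that later changes cannot silently reintroduce the fault.

```python
import unittest

def parse_quantity(text):
    """Parse a quantity field; whitespace-only input counts as zero.
    Hypothetical function used only to illustrate a regression test."""
    text = text.strip()
    return int(text) if text else 0

class QuantityRegressionTest(unittest.TestCase):
    def test_issue_42_whitespace_input_treated_as_zero(self):
        # Regression test recorded when this (hypothetical) bug was fixed;
        # it stays in the suite and runs on every build thereafter.
        self.assertEqual(parse_quantity("   "), 0)

    def test_normal_input_still_parses(self):
        self.assertEqual(parse_quantity(" 7 "), 7)

if __name__ == "__main__":
    unittest.main()
```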

3.3 Testing is context dependent The test methods used should depend on the context of the system and the requirements it must satisfy [6]. Test cases should be composed differently for different components, and even for the same component in the same system, the testing objectives can differ at different points in time. For example, unit testing checks whether each component functions properly in isolation, while integration testing focuses on the interaction between several components as a whole.
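
The distinction between the two objectives can be sketched in Python's unittest; TaxCalculator and Invoice are invented components used only to show the two kinds of test side by side.

```python
import unittest

# Two small hypothetical components.
class TaxCalculator:
    def tax(self, amount):
        return round(amount * 0.10, 2)

class Invoice:
    def __init__(self, calculator):
        self.calculator = calculator

    def total(self, amount):
        return amount + self.calculator.tax(amount)

class TaxCalculatorUnitTest(unittest.TestCase):
    def test_tax_in_isolation(self):
        # Unit test: exercises one component on its own.
        self.assertEqual(TaxCalculator().tax(50.0), 5.0)

class InvoiceIntegrationTest(unittest.TestCase):
    def test_invoice_with_real_calculator(self):
        # Integration test: exercises the interaction between Invoice
        # and TaxCalculator working together.
        invoice = Invoice(TaxCalculator())
        self.assertEqual(invoice.total(50.0), 55.0)

if __name__ == "__main__":
    unittest.main()
```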

3.4 Define a Test Plan A test plan should address the test scope, test strategy, test methods and techniques, how test results are interpreted, and how comprehensively the test cases cover the possible scenarios. Just as importantly, it should state how the tests will exercise the customer requirements and how those requirements are measured [7][8][9]. Furthermore, a risk assessment of the system under development should be considered here. The importance of risk assessment can be seen in software disasters such as the Denver Airport Baggage Handling System [10] and the Fukushima Nuclear Disaster [11]. Missing or underestimating a risk may leave too few or even no tests covering that risk, which in turn results in hidden faults and ultimately in system failure.

3.5 Test for valid as well as invalid inputs Test cases should of course include valid inputs that mimic regular scenarios, which are the most likely real-life situations. Beyond those, developers should consider what can go wrong with invalid inputs such as null pointers, out-of-bound indices, and so on. This matters not only because statistically there is always a chance for exceptions to occur, but also because it is quite common for a system to be crashed by a single component that receives an unexpected input. Interestingly, during testing, invalid input conditions tend to have a higher error-detection yield than test cases with valid input conditions [4].
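
A sketch of this practice in Python's unittest follows; the average function is hypothetical, and None plays the role of a null pointer.

```python
import unittest

def average(values):
    """Return the arithmetic mean of a non-empty list of numbers.
    Hypothetical function used to illustrate valid- and invalid-input tests."""
    if values is None or len(values) == 0:
        raise ValueError("average() requires a non-empty list")
    return sum(values) / len(values)

class AverageTest(unittest.TestCase):
    def test_valid_input(self):
        # Regular scenario with valid input.
        self.assertAlmostEqual(average([2, 4, 6]), 4.0)

    def test_none_input_rejected(self):
        # Invalid input: the Python analogue of a null pointer.
        with self.assertRaises(ValueError):
            average(None)

    def test_empty_list_rejected(self):
        # Boundary/invalid input: an empty collection.
        with self.assertRaises(ValueError):
            average([])

if __name__ == "__main__":
    unittest.main()
```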

4.Limitations of Testing 4.1 Testing cannot show a program is bug-free To return to Dijkstra's famous remark: "Testing shows the presence, not the absence of bugs" [3]. In practice, testing compares the actual outputs of a component to its expected outputs in response to some inputs, and these inputs are only a subset of all possible scenarios. Since there are usually a great many possibilities, exhaustive testing is rarely practical. Hence, a good test case may indicate the existence of a fault within some component, or, after the fault is corrected, show that the component does not produce a particular kind of error for a particular kind of input; but it never shows that the component produces no bugs at all for any kind of input.
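
A small hypothetical example makes the point: both tests below pass, yet the implementation is faulty for inputs outside the tested subset.

```python
import unittest

def is_leap_year(year):
    """Simplified (and faulty) leap-year check used only for illustration:
    it ignores the century rules (divisible by 100 but not by 400)."""
    return year % 4 == 0

class LeapYearTest(unittest.TestCase):
    def test_2016_is_leap(self):
        self.assertTrue(is_leap_year(2016))

    def test_2015_is_not_leap(self):
        self.assertFalse(is_leap_year(2015))

# Both tests pass, yet the implementation is wrong for years such as 1900,
# which is divisible by 4 but is not a leap year. Passing the existing test
# cases only shows correct behaviour on the tested subset of inputs, not the
# absence of bugs.

if __name__ == "__main__":
    unittest.main()
```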

4.2 Tests are no substitute for specifications With test-driven methods, developers write code to pass the tests, and conventionally the code becomes deliverable once it passes all of them. So to some extent, testing has substituted for the requirements specification. However, tests, no matter how many cases they include, are only instances of the requirements, which provide an abstraction of the expected outcome [12]. A requirements specification can be used to generate test cases, but the reverse, using test cases to summarise a requirement, is a practice that should be avoided. A universal statement can be used to prove as many existential statements with the same predicate as desired, but no matter how many existential statements are conjoined together, they cannot prove a single universal statement with the same predicate.
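
The logical point can be illustrated with a deliberately contrived sketch (Python unittest); the absolute function is hypothetical and intentionally wrong.

```python
import unittest

# The specification (a universal statement): for every x, absolute(x) equals
# x if x >= 0 and -x otherwise. The tests below are only existential instances.

def absolute(x):
    # Faulty "implementation" that merely hard-codes the tested instances.
    return {3: 3, -4: 4}.get(x, 0)

class AbsoluteTest(unittest.TestCase):
    def test_positive_instance(self):
        self.assertEqual(absolute(3), 3)    # one existential instance

    def test_negative_instance(self):
        self.assertEqual(absolute(-4), 4)   # another existential instance

# Both instances pass, yet absolute(7) returns 0. Finitely many test cases
# cannot establish the specification's universal claim, which is why tests
# cannot replace the specification itself.

if __name__ == "__main__":
    unittest.main()
```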

4.3 Software testing does not help in finding root causes IEEE standard terminology states: "An unsatisfactory program execution is a 'failure', pointing to a 'fault' in the program, itself the result of a 'mistake' in the programmer's thinking."

As noted in section 3.1, testing means executing the program to make it fail; it is the process of uncovering failures. So testing tells developers that the test subject does not generate satisfactory output in response to certain inputs. However, it does not tell them how those inputs were processed to produce such outputs. In other words, testing does not find root causes, i.e. where, when, and how things went wrong and led the software to malfunction in the first place. That is done through another, distinct activity known as debugging.

4.4 Extra Coding and Extra Chances to Introduce Bugs No matter how automated a project's testing procedure is, it requires some minimal level of human coding in the first place. Obviously, coding for testing means extra work: from instantiating components and initialising test inputs to handling inter-component communication and interpreting results, every piece of work is an additional programming task. Moreover, additional programming means additional chances to introduce bugs. For a failed test case, a developer has to check both the source code and the test code for faults, because the fault may turn out to lie in the test code's own logic or result interpretation.
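
A brief hypothetical example of a fault that lives in the test code rather than the source code (Python unittest; the conversion function itself is correct):

```python
import unittest

def celsius_to_fahrenheit(c):
    """Correct conversion: F = C * 9/5 + 32."""
    return c * 9.0 / 5.0 + 32.0

class ConversionTest(unittest.TestCase):
    def test_boiling_point(self):
        # This test FAILS, but the fault is in the test itself:
        # the expected value should be 212, not 221 (digits transposed).
        self.assertEqual(celsius_to_fahrenheit(100), 221)

if __name__ == "__main__":
    unittest.main()
```

When this suite fails, inspecting only the source code would waste effort; the developer must also check the expected values and logic inside the test.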

5.Recommendation Let us examine Kaner's definition from section 1 again. It is true that passing tests at least shows the software does not produce faulty outputs in the situations covered by the existing test cases, and if those test cases are carefully planned, it is reasonable for stakeholders to gain confidence in the software's correctness. So the connection between testing and quality does exist and is important to keep in mind.

However, passing tests is usually not equivalent to quality assurance for the product system. Test cases are instances of real-life situations; they are elements of a subset of all possible inputs. What testing can demonstrate is merely the behaviour on that subset, and using a property of a subset to draw a conclusion about the full set is logically invalid.

6.References [1] Kaner, C., 2006, Exploratory Testing, Keynote at QAI. http://www.kaner.com/pdfs/ETatQAI.pdf

[2] Sommerville, I, 2011, Software Engineering, 9th Edition, Pearson, Boston

[3] Dijkstra, E. W., quoted in J. N. Buxton and B. Randell (eds.), Software Engineering Techniques, April 1970, p. 16. Report on a conference sponsored by the NATO Science Committee, Rome, Italy, 27–31 October 1969. http://homepages.cs.ncl.ac.uk/brian.randell/NATO/nato1969.PDF

[4] Myers, Glenford J., 1979, The Art of Software Testing, New York: Wiley. ISBN: 0471043281

[5] Software Testing–Goals, Principles, and Limitations http://www.ijcaonline.org/volume6/number9/pxc3871448.pdf

[6] Pfleeger, S. L., 2001, Software Engineering: Theory and Practice, Pearson Education.

[7] Whittaker, J., 2011, The 10 Minute Test Plan, Google Testing Blog. http://googletesting.blogspot.com.au/2011/09/10-minute-test-plan.html

[8] Test Plan, Wikipedia http://en.wikipedia.org/wiki/Test_plan

[9] Pratiksha, BR, Hinni, How to Write a Test Plan http://www.wikihow.com/Write-a-Test-Plan

[10] Denver Airport Baggage System Case Study http://calleam.com/WTPF/?page_id=2086

[11] Gilhooly, R., 2012, Fukushima nuclear accident down to human factors, New Scientist. http://www.newscientist.com/article/dn22031-fukushima-nuclear-accident-down-to-human-factors.html#.U7LxZbTNiQs

[12] Meyer, B., 2008, Seven Principles of Software Testing, Computer, Vol. 41(8), pp. 99-101.

gardners commented 10 years ago

Hello,

I am putting in a recommendation to revise your grade to include this assignment, for a final result of 78 DN.

Paul.
