Open Ryanf55 opened 10 months ago
Feedback from the production working group
Next steps:
I would like to discuss this issue/proposal more generally under the title of declarative and executable test specifications. Gherkin syntax and tools like Cucumber/Behave/SpecFlow are niche examples of that, but they were developed mainly for web/mobile applications. Robotic applications require more complex specification languages.
Consider an abstract example in Gherkin syntax:
GIVEN g1
GIVEN g2
WHEN w1
THEN t1
This scenario essentially expresses the Boolean logic formula g1 && g2 && w1 -> t1, where g1, g2, w1, and t1 are predicates (Boolean-valued functions). This is nicer and more readable, but hardly a significant technical advantage over standard unit testing. These predicates still need to be defined in code, as so-called step executors. The tool then executes these steps and checks the example. There are a few points to consider when bringing these concepts into robotics.
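To make the formula concrete, here is a minimal Python sketch of the scenario as an implication over predicates. The names `scenario_holds`, `State`, and the flag-reading lambdas are hypothetical and only for illustration; real BDD runners typically report an unmet GIVEN as a setup failure rather than a vacuous pass.

```python
from typing import Callable, Dict, Iterable

State = Dict[str, bool]
Predicate = Callable[[State], bool]


def scenario_holds(givens: Iterable[Predicate], when: Predicate,
                   then: Predicate, state: State) -> bool:
    """Evaluate (g1 && g2 && ... && w1) -> t1 on one observed state."""
    antecedent = all(g(state) for g in givens) and when(state)
    # An implication is vacuously true when its antecedent is false.
    return (not antecedent) or then(state)


# Hypothetical step executors: each one just reads a flag from the state.
g1 = lambda s: s["g1"]
g2 = lambda s: s["g2"]
w1 = lambda s: s["w1"]
t1 = lambda s: s["t1"]

passing = scenario_holds([g1, g2], w1, t1,
                         {"g1": True, "g2": True, "w1": True, "t1": True})
failing = scenario_holds([g1, g2], w1, t1,
                         {"g1": True, "g2": True, "w1": True, "t1": False})
vacuous = scenario_holds([g1, g2], w1, t1,
                         {"g1": True, "g2": False, "w1": True, "t1": False})
```

The third case shows the logical reading: when a GIVEN does not hold, the formula says nothing about t1.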
First, the concept of time. Originally and widely used in web applications, step execution assumes a very sparse and often event-based interaction with the environment. However, many robotic applications require much denser interaction, especially at lower levels. This means that all Gherkin predicates, such as GIVEN ServerNode is running, should ideally be checked at every single point on the timeline. Existing BDD executors are not designed for such temporal use cases; at least, I am not aware of any mature implementation doing that. So the checkers must be aware of the temporal nature and must be able to operate on data streams. For ROS, this may mean that the executor should be implemented as an observer/monitor/checker ROS node derived from the specification. We study these applications under the field of (specification-based) runtime verification. Gherkin-like specifications can be useful in principle but should at least be enriched with temporal/timing constraints. The example in the original post already contains a within <time-interval> keyword. When the interaction is dense, timers to check such constraints become too complicated and inefficient.
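As a rough sketch of what "checked at every single point" could mean, the following plain-Python monitor evaluates a predicate on each sample of a timestamped stream. `InvariantMonitor` and the status strings are hypothetical; in a ROS node, I would assume `observe` is called from a subscription callback, which is not shown here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class InvariantMonitor:
    """Checks a predicate on every sample of a timestamped stream."""
    predicate: Callable[[object], bool]
    violations: List[Tuple[float, object]] = field(default_factory=list)

    def observe(self, stamp: float, value: object) -> bool:
        """Record the sample as a violation if the predicate fails on it."""
        ok = self.predicate(value)
        if not ok:
            self.violations.append((stamp, value))
        return ok

    @property
    def satisfied(self) -> bool:
        """True as long as no observed sample violated the predicate."""
        return not self.violations


# Hypothetical stream of (time in seconds, node status) samples.
monitor = InvariantMonitor(lambda status: status == "RUNNING")
for stamp, status in [(0.0, "RUNNING"), (0.1, "RUNNING"), (0.2, "STOPPED")]:
    monitor.observe(stamp, status)
```

After the third sample, `monitor.satisfied` is false and `monitor.violations` records when the GIVEN stopped holding, which a one-shot step check would miss.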
Second, natural language may be too ambiguous to specify complex real-time properties. It works adequately for web/mobile apps, but it is harder to express robotic system requirements in unrestricted natural language. Therefore, safety standards like ISO 26262 usually suggest the use of structured English, semi-formal, and/or formal specifications (thus logical formulas). I like Gherkin's structured way, and we should ensure that any specification can be translated into an equivalent logical formula. These formulas must be defined over atomic predicates (for ROS, topic names are atomic functions). So I think GIVEN /turtle/server_node/ is "RUNNING" may be better and more automatable than proper English grammar.
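To illustrate why the restricted form is easier to automate, here is a small sketch that translates such a step into an atomic predicate over the latest topic values. The grammar (`KEYWORD <topic> is "<value>"`), the name `parse_step`, and the dictionary of latest values are my assumptions, not an existing tool's syntax.

```python
import re
from typing import Callable, Dict, Tuple

# Assumed restricted grammar: KEYWORD <topic> is "<value>"
STEP_PATTERN = re.compile(r'^(GIVEN|WHEN|THEN)\s+(\S+)\s+is\s+"([^"]*)"$')

LatestValues = Dict[str, str]


def parse_step(step: str) -> Tuple[str, str, Callable[[LatestValues], bool]]:
    """Translate one structured step into (keyword, topic, predicate)."""
    match = STEP_PATTERN.match(step.strip())
    if match is None:
        raise ValueError(f"unsupported step: {step!r}")
    keyword, topic, expected = match.groups()

    def predicate(latest: LatestValues) -> bool:
        # Atomic predicate: the latest value seen on the topic equals the literal.
        return latest.get(topic) == expected

    return keyword, topic, predicate


keyword, topic, is_running = parse_step('GIVEN /turtle/server_node/ is "RUNNING"')
```

A free-form sentence such as "GIVEN ServerNode is running" is rejected by this parser, which is the point: the structured form maps to exactly one predicate.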
Third, example-based tests (input-output test pairs) are never enough for complex real-time systems. BDD and Gherkin do not dictate example-based tests, but that's the most common case in practice. A more beneficial approach for complex systems is specification-based testing (defining rules between input and output). This approach is also called property-based testing and is implemented in tools like QuickCheck/RapidCheck/Hypothesis, but these tools also lack the temporal aspect, as explained previously.
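The difference between the two styles can be sketched in a few lines. This is a hand-rolled random checker using only the standard library, not the API of QuickCheck, RapidCheck, or Hypothesis (those tools add input shrinking and richer generators). `add_two_ints` is a hypothetical stand-in for the service under test.

```python
import random
from typing import Callable, Optional, Tuple


def add_two_ints(a: int, b: int) -> int:
    """Hypothetical stand-in for the service under test."""
    return a + b


def check_property(prop: Callable[[int, int], bool], trials: int = 200,
                   seed: int = 0) -> Optional[Tuple[int, int]]:
    """Return the first counterexample found, or None if all trials pass."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    for _ in range(trials):
        a = rng.randint(-2**31, 2**31 - 1)
        b = rng.randint(-2**31, 2**31 - 1)
        if not prop(a, b):
            return (a, b)
    return None


# Example-based test: one fixed input-output pair.
assert add_two_ints(2, 3) == 5

# Property-based tests: rules that must hold for every generated input.
commutative = lambda a, b: add_two_ints(a, b) == add_two_ints(b, a)
invertible = lambda a, b: add_two_ints(a, b) - b == a

commutative_counterexample = check_property(commutative)
invertible_counterexample = check_property(invertible)
```

Both counterexample variables stay `None` here. A wrong rule, such as "the sum is always greater than a", would yield a counterexample as soon as a non-positive b is generated.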
For the next steps, I first would like to collect behavioral specifications from real industrial studies. These specifications can be in plain English. Companies are sometimes reluctant to share their system requirements for various reasons. Still, building a repository of example robotic requirements would be great, especially at the system level.
For the next steps, I first would like to collect behavioral specifications from real industrial studies. These specifications can be in plain English.
Great ideas here on an alternative approach. Were you willing and able to collect some of these industrial studies and share them here?
For reference, we are currently using Python behave for a couple of hundred test steps and it's working ok.
I have always wished to collect more actual system requirements from industry, as collecting diverse requirements is helpful for tool developers. At a basic level, these requirements should look like the following sentences:
If /namespace/topic_name is greater than 0, then /namespace/another_topic_name will be greater than 5 within 10 milliseconds.
If /namespace/topic_name equals ENUM_VALUE and /namespace/another_topic_name is greater than 20, then /namespace/topic3 is always true.
Also, I wonder how you would compare a property-based testing tool like Hypothesis with your Behave use cases?
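As a rough illustration of the first requirement sentence above (a bounded-response rule), here is an offline check over a recorded trace. The trace layout, the function name, and the choice to count triggers near the end of the trace as violations are all assumptions of this sketch; it is also quadratic in the trace length, so it is not meant as an efficient online monitor.

```python
from typing import List, Tuple

# (time in ms, value of /namespace/topic_name, value of /namespace/another_topic_name)
Sample = Tuple[float, float, float]


def bounded_response_violations(trace: List[Sample],
                                deadline_ms: float = 10.0) -> List[float]:
    """Return the trigger times whose response did not arrive in time.

    Assumes the trace is sorted by time. A trigger is a sample where the
    first value is > 0; a response is a sample where the second value is > 5
    at most deadline_ms after the trigger.
    """
    violations = []
    for i, (t_trigger, a, _) in enumerate(trace):
        if a <= 0:
            continue  # not a trigger, nothing to check
        responded = any(
            b > 5
            for (t, _, b) in trace[i:]
            if t - t_trigger <= deadline_ms
        )
        if not responded:
            violations.append(t_trigger)
    return violations


# Response arrives 8 ms and 4 ms after the two triggers: no violation.
good_trace = [(0, 1, 0), (4, 1, 0), (8, 0, 6)]
# Response arrives 12 ms after the trigger: too late.
late_trace = [(0, 1, 0), (6, 0, 0), (12, 0, 6)]

good_result = bounded_response_violations(good_trace)
late_result = bounded_response_violations(late_trace)
```

Here `good_result` is empty and `late_result` contains the trigger time 0.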
ROS 2 Production Task Proposal
Proposal Description:
This proposal is to support testing ROS nodes with behavior-driven testing following behavior-driven development practices.
Essentially, starting from system and integration requirements, you can write Gherkin-syntax tests for how a ROS node is supposed to behave. These can be significantly easier to write and develop than current methods, and are much easier to understand for systems engineers who don't want to get lost in the syntax of pytest or gtest.
A good background on this is here.
The scope of the proposal includes:
colcon test to drive the BDD runner
Creating or adding to an example repository:
Example Gherkin Syntax
For the add_two_ints_server.cpp that adds two integers with the AddTwoInts.srv interface.
{a: 2, b: 3} on the add_two_ints topic is requested
{sum: 5} within timeout 1 second(s)
Example colcon syntax
colcon test --packages-select demo_nodes_cpp
Estimated Effort:
Area of Impact:
Testing, Docs, CI/CD
Related Works