Closed by plocket 2 years ago
[The pain of having to repeat the same values for multiple tests might be] dealt with using a cucumber 'Background'.
[Edit: "Background" might be a good feature to add in general, but the internal feedback is that we want to be able to run randomized tests anyway.]
Notes from a standup discussion (08/06/21):
Just FYI, some folks have big questions about the use of randomized tests and whether they're really an appropriate tool.
The main non-technical challenge I see here is giving enough useful output to the user. Will need a lot of feedback on that.
Current MVP ideas (not much research so far):
Generate a random value based on the type of the field.

Getting around tricky situations: We already have interviews where there are some tricky inputs that these kinds of tests may not be able to overcome, especially at MVP. To make this usable, and therefore user-testable, from the start, we need ways to get through these situations. Ideas:
Questions:
As @rpigneri-vol named them, these random tests are our "spellcheck" option - they aren't a proper testing suite and shouldn't be treated as such, but they may be better than nothing.
New home of "faker": https://www.npmjs.com/package/community-faker
Question was raised:
[Because the system is randomized, not deterministic (?)], can we somehow create a seed for a test so that it's reproducible?
I've never looked into how to do that. Is it possible? Do we need it? [Maybe an answer: I don't think we need this. From what I'm understanding, this refers to the interview, not to the test, and the interview is deterministic. At least, all interviews that I've ever worked with have been deterministic.]
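If we did decide we want reproducible runs, one rough sketch of how it could work: derive every random choice from one seed, print that seed in the report, and let a user re-run with the same seed. This is hypothetical, not existing ALKiln code; `mulberry32` is just a well-known tiny seedable PRNG.

```javascript
// Sketch: a tiny seedable PRNG (mulberry32). A test run would pick a seed,
// report it, and derive every random choice from it, so the run can be
// replayed exactly. Hypothetical, not existing ALKiln code.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

// Same seed, same sequence of "random" values
const seed = 12345;
const rand1 = mulberry32(seed);
const rand2 = mulberry32(seed);
console.log(rand1() === rand2()); // true
```

The seed could come from an env var when replaying, or be generated fresh (and logged) on a normal run.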
Output thoughts:
In the report, only list the "name" of the test (maybe just sequential numbers) and the order of the screens (page id and/or title?). Each test then has a ~file or~ folder (with a matching name) in the downloadable artifacts containing:
A file with:
#trigger element, that may not be possible. Unless... could we change proxy var handling so that each individual var row has a limit for how many times it can be used? E.g. the trigger variable could contain a number instead of a variable name. We could think of using that as a technique in general.

Should this be in its own .feature file? Should it be in both places? Should this be included in the report as well? That would make it easily accessible. It would be a lot of info, though. Should the report only include the reproducible test text? That sounds confusing to read as the only output. [If it's a story table, should it be in alphabetical order?]

Other files in the folder:
Maybe the name of the folder would also contain "failed" if it failed. Or maybe there'd be a "failed" folder for all failed tests? I don't love nested folders, though.
So, first brainstorm for what random input output might look like in the downloaded artifacts folder, including folders (folder 1 is open):
report.txt
"""
Some ALKiln title with date, time, and ALKiln version
====================
Failed tests
====================
failed_random_input_tests_1 question ids and titles
accept-terms: "Do you accept the terms?"
name-question: "What is your name?"
contact-info: "What is your address?"
lawyer-name: "Do you have a lawyer?" (infinite loop)
final target variable: We couldn't find this info in the page. See our-docs.com.
--------------------
failed_random_input_tests_2 question ids and titles
<etc>
"""
> failed_random_input_tests_1
<some doc name>.pdf
failure_screenshot.png
failed_random_input_tests_1_report.txt
"""
Some ALKiln title with date, time, and ALKiln version
final target variable: We couldn't find the target variable of the page. See our-docs.com.
--- Test (copy into your own file in "Sources" folder. More instructions?) ---
Feature: Replace with description
Scenario: Replace with description
Given I start the interview at "a-legal-form.yml"
And I get to the question id "replace with your target question id" with this data:
| var | value | trigger |
| accept_terms | True | |
| user.name.first | Reina | |
| user.name.last | Gonzalez | |
| user.address.address | 342 Main St. | |
| user.address.city | Boise | |
| user.address.state | Idaho | |
| user.phone_number | 555-555-5555 | |
| has_lawyer | False | |
--- End of test ---
failed_random_input_tests_1 question ids and titles
accept-terms: "Do you accept the terms?"
name-question: "What is your name?"
contact-info: "What is your address?"
lawyer-name: "Do you have a lawyer?" (infinite loop)
failed_random_input_tests_1 question ids and titles
accept-terms: "Do you accept the terms?"
check the accept_terms checkbox
Continued
name-question: "What is your name?"
user.name.first was set to "Reina"
user.name.last was set to "Gonzalez"
Continued
contact-info: "What is your address?"
user.address.address was set to "342 Main St."
user.address.city was set to "Boise"
user.address.state was set to "Idaho"
user.phone_number was set to "Lorem ipsum"
Tried to continue
invalid answer for user.phone_number: "This answer needs to be a phone number"
user.phone_number was set to "555-555-5555"
Continued
lawyer-name: "Do you have a lawyer?" (infinite loop)
checked the has_lawyer checkbox
Continued, but saw the same page
target variable: We couldn't find the target variable of the page. See our-docs.com.
JSON variable values on the final page:
{
...
}
"""
> failed_random_input_tests_2
> passed_random_input_tests_1
I'm using indentation to denote contents of the folder or file.
Not sure what to call an "infinite loop" question. I don't think anyone else uses the name "infinite loop" to describe questions where you press continue and you just keep getting the same question over and over again.
Infinite loops: we may only catch single-page infinite loops and probably not all of those either. Will try to write a whole comment about that later.
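For the single-page case, a sketch of what detection might look like, assuming we can read some stable page identifier (question id, or a hash of the page HTML) after each Continue. All names here are hypothetical, not existing ALKiln code; multi-page cycles would need a history of past pages instead of just the last one, which is part of why we probably won't catch them all.

```javascript
// Sketch of single-page "infinite loop" detection: if the same page id
// comes back too many times in a row after Continue, flag a probable loop.
// Hypothetical names, not existing ALKiln code.
const MAX_SAME_PAGE = 3;

function makeLoopDetector() {
  let lastPageId = null;
  let sameCount = 0;
  return function sawPage(pageId) {
    if (pageId === lastPageId) {
      sameCount += 1;
    } else {
      lastPageId = pageId;
      sameCount = 1;
    }
    // True once the same page has repeated MAX_SAME_PAGE times in a row
    return sameCount >= MAX_SAME_PAGE;
  };
}

const sawPage = makeLoopDetector();
sawPage('lawyer-name'); // false
sawPage('lawyer-name'); // false
console.log(sawPage('lawyer-name')); // true: probable single-page loop
```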
Edit: Maybe this:
| has_lawyer | False | |
...needs to be replaced with interactive Steps:
And I set the variable "has_lawyer" to "False"
And I tap to continue
Then I got to the next page
(or whatever that last one is). That way we'd be able to put the id in for the has_lawyer page and replicate the test completely.
Can we add to the report a list of the possibly hidden fields? Or just the fields that were on the screen and the values that they had or did not have?
Use small values for integers (0-10) so you don't test with 99 children. Maybe a similar approach to answer "no" after a couple of screens where you are asked "is there another".
General: if I've seen this screen 5 times, pick a different button this time.
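A sketch of how those heuristics could fit together. Everything here is hypothetical (names, thresholds, and the shape of the API), just to make the ideas concrete:

```javascript
// Sketch of the heuristics above: small integers, "no" to "is there
// another?" after a couple of rounds, and a different button once a screen
// keeps repeating. All names and thresholds are hypothetical.
function makeAnswerPicker(rand = Math.random) {
  const screenVisits = new Map();
  return {
    // Small integers (0-10) so we never generate e.g. 99 children
    smallInt() {
      return Math.floor(rand() * 11);
    },
    // Answer "yes" to "is there another?" at most twice, then "no"
    anotherAnswer(roundsSoFar) {
      return roundsSoFar < 2;
    },
    // If we've seen this screen several times, rotate to a different button
    pickButton(screenId, buttons) {
      const visits = (screenVisits.get(screenId) || 0) + 1;
      screenVisits.set(screenId, visits);
      if (visits >= 5) return buttons[visits % buttons.length];
      return buttons[Math.floor(rand() * buttons.length)];
    },
  };
}
```

Passing in `rand` would also let this plug into a seeded generator if we ever want reproducible runs.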
Deep dive discussion: creating the feature file as a separate file might be useful! A failed random test is a good candidate for a test you always run the same way.
Where to put error screenshots for easy access? And can we give more useful info in filenames now? From the #429 discussion about artifacts structure.
Maybe the error screenshots should be in both places. This is the idea proposed here, though the names are missing a bunch of pieces, to avoid a mess and to show the arrangement more clearly.
- report.txt (whole report)
- error-3pm-scenario 1 description-endingPgID.png
- error-4pm-scenario 2 description-endingPgID.png
- 3pm-endingPgID-scenario 1 description (folder)
- report-endingPgID.txt (scenario only)
- download-3pm-file-1-pgID.pdf
- error-3pm-pgID.png (Same pic as "error-3pm-scenario 1 description.png". Same name seems dumb because it would have the scenario name in it too)
- json-3pm-pgID.json
- scenario 1 description.feature
- screenshot-3pm-pgID.png
- 4pm-endingPgID-scenario 2 description (folder)
- report.txt (scenario only)
- error-4pm-pgID.png (Same as scenario 1 error pic name rationale)
- scenario 2 description.feature
"3pm" and "4pm" just indicate timestamps. "PgID" could be the question id or the screen's trigger var, whichever is available, or neither if none is available. Not sure which is preferred.
Everything would be in one artifact folder.
A challenge we'll have with creating the story table output for users:
Every field name representing a variable needs to be base64-decoded. Currently that means we have multiple guesses for what a field name might be, which would add a lot of nonsense rows to the table that are duplicates of each other.
Proposals to reduce this problem:
Detect invalid variable-name characters (which are often present when text has been decoded one time too many) and remove those rows from the table. We can probably also remove those from the field name guesses as a bonus.
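A sketch of what that filtering might look like. The helper names and the exact regex are hypothetical; the real rules for what can appear in a variable name would need more thought (attribute access, index brackets, dict keys, etc.):

```javascript
// Sketch: drop base64-decode "guesses" that contain characters which can't
// appear in a variable name, since those were probably decoded one time too
// many. Helper names and regex are hypothetical.
const VALID_VAR_NAME = /^[A-Za-z_][A-Za-z0-9_.\[\]'"-]*$/;

function decodeGuess(text) {
  try {
    return Buffer.from(text, 'base64').toString('utf8');
  } catch (err) {
    return null;
  }
}

function filterGuesses(guesses) {
  return guesses.filter((g) => g !== null && VALID_VAR_NAME.test(g));
}

// "dXNlci5uYW1lLmZpcnN0" is base64 for "user.name.first"; decoding the
// result a second time produces garbage that the filter removes.
const once = decodeGuess('dXNlci5uYW1lLmZpcnN0');
const twice = decodeGuess(once);
console.log(filterGuesses([once, twice])); // [ 'user.name.first' ]
```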
This is an alternative to having the developer write out every scenario to cover all their code. It's not ideal, but since you can't abstract in cucumber, writing every single scenario can be a huge task. It is possible, in cucumber, to allow the user to pass in data structures like lists. We would just have to handle randomly selecting them.
Note: This is not a fault in cucumber - it's not meant to be used the way we're using it.
Also need to think about whether the developer will need to copy/paste this 'scenario' however many times they want the random tests to be run, or if we can run them repeatedly somehow. This might be better in its own issue. [Edit: This is probably doable now that we know how to set, and reset, our own custom timeouts.]
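One rough shape for the "run them repeatedly" idea: a single step that loops over N random walks, giving each walk its own timeout instead of relying on one scenario-level timeout. `runOneRandomWalk` and everything else here is hypothetical, just to show the loop-with-per-walk-deadline structure:

```javascript
// Sketch: run N random walks inside one step instead of copy/pasting the
// Scenario N times, with a per-walk timeout we manage ourselves.
// `runOneRandomWalk` is a hypothetical async function that performs one
// randomized pass through the interview.
async function runRandomWalks(runOneRandomWalk, count, timeoutMsPerWalk) {
  const results = [];
  for (let i = 0; i < count; i++) {
    let timer;
    const deadline = new Promise((unused, reject) => {
      timer = setTimeout(() => reject(new Error(`walk ${i} timed out`)), timeoutMsPerWalk);
    });
    try {
      // Whichever settles first wins: the walk finishing, or the deadline
      const value = await Promise.race([runOneRandomWalk(i), deadline]);
      results.push({ walk: i, ok: true, value });
    } catch (err) {
      // One timed-out or failed walk shouldn't abort the remaining walks
      results.push({ walk: i, ok: false, error: err.message });
    } finally {
      clearTimeout(timer);
    }
  }
  return results;
}
```

One failed walk gets recorded and the loop keeps going, so a single report could cover all N walks, passed and failed.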