TBD54566975 / tbdex

56 stars 25 forks source link

[Proposal] Host test vectors on GH #214

Closed diehuxx closed 8 months ago

diehuxx commented 9 months ago

Overview

TBDex will host test vectors on github and can be fetched and cached by implementation repos in a .gitignore'd place. Implementation repos will NOT need to git commit the vectors themselves; implementation repos will only commit a git ref of the tbdex repo containing the desired version of the test vectors.

Hosting on Github

The github tbdex repo will host all test vectors for the protocol in the directory test-vectors/protocol/ . An implementation repo can fetch the test vectors at a given git ref like so:

curl -L \                                                                                                                                                                                                                                                                                                    
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
"https://api.github.com/repos/TBD54566975/tbdex/contents/specs/?ref=$MY_GIT_REF"

Each vector file will follow this JSON schema:

  1. description : A human readable description of the scenario being tested.
  2. input : A stringified JSON TBDex object
  3. output : If the input contains a valid TBDex object: A JSON object of the the expected parsed input.
  4. error : boolean. True if an error is expected, false if not. No error message/detail is provided in the test vectors because different languages/implementations may handle errors quite differently. So, it's better not to be too prescriptive of implementation.

NOTE: This is nearly identical to the current test vector structure. The only difference is standardizing the type of error.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "description": {
      "type": "string",
    },
    "input": {
      "type": "string",
    },
    "output": {
      "type": "object",
      "additionalProperties": true,
    },
    "error": {
      "type": "boolean"
    }
  },
  "required": ["description", "input"],
  "oneOf": [
    {
      "required": ["output"],
      "not": {"required": ["error"]}
    },
    {
      "required": ["error"],
      "not": {"required": ["output"]}
    }
  ]
}

Benefits of Hosting on Github

Some reviewers will prefer the approach to hosting test vectors taken by Web5 — that is, copy-pasting each test vector file into an implementation repo. I really really don’t want to do that. Here’s my hot takes:

  1. Hosting on Github is convenient. If an implementer wants to test against a specific version of the test vectors, they can link to a specific git ref, using the github API.
  2. TBDex already hosts testing-related things online. tbdex-kt is fetching JSON schemas of message kinds from tbdex.dev. There is already a precedent for making network calls while running tests. (IMO we should host these on GH as well, but that’s a separate issue).
  3. If an implementer needs to run tests offline, an implementation repo can cache the test vectors locally.

Recommendations for implementation repos

An implementation repo should NOT always fetch the test vectors in tbdex ’s main branch. Each repo should specify a specific version of tbdex to get a deterministic set of vectors. This will both 1) clarify what version of the TBDex protocol an implementation supports and 2) prevent confusion and CI flakiness if tbdex main gets ahead of the implementation.

It is also useful to add a flag for running a set of test vectors locally. I'm going to write a reference implementation in tbdex-js soon.

Generating Test Vectors

  1. Create a new test vector file in test-vectors/protocol
  2. Use an implementation of TBDex that supports the desired scenario to create the input and output if applicable.
  3. Run your test vectors against an implementation repo locally or push your branch of tbdex and get the URL of your branch on GH to use the new test vector in your implementation repo.
  4. Once you are confident that your test vector works as desired, open a PR on tbdex .

Types of TBDex test vectors

Protocol

In contrast to Web5, which deals with many types of data structures and actions, TBDex protocol implementations have one simple interoperability question to answer: do we accept valid TBDex messages and reject invalid TBDex messages?

Http server and client

TODO(diehuxx): More detail or a separate issue is needed. This is not as high priority right now.

Broadly, the client should create requests and accept responses whereas the server should accept requests and create responses in accordance with the spec.

Reference Implementations

tbdex -- hosting: https://github.com/TBD54566975/tbdex/pull/215 tbdex-js -- fetching from Github and caching efficiently: https://github.com/TBD54566975/tbdex-js/pull/125

nitro-neal commented 9 months ago

Thanks for taking the time to go over the current approach and presenting this.

I do go back and fourth on hosting vectors on sdk-development only and then having the unit tests do network calls.

Some thoughts I have on the proposed items:

One of the most important aspects for me is that we keep it as simple as possible so developers actually use this without slowdown and creates lots and lots of test vectors.

With the current approach developers can produce a feature, add a test vector locally, create a basically normal unit test locally and be on the way. (we can even make another process to update sdk-development automatically so they dont have to update this later)

Also, There are still a lot of cases where you will probably still be copy pasting and having a local test vector on the file system when manually creating and using these vectors when developing. So you will be doing things locally, then deleting and putting it on github.

To implement a new feature and new test vector you will have to repeditly update 2 repos, and then things can still get out of sync, you can have a situation where mainline tbdex pointing to a dev branch in sdk-development.

We have checks on if sdk-development test-vectors folder vs the other repos gets out of sync, and it will open PRs automatically if you have any concern about things getting out of sync.

I do think that web5- and tbdex- repos should use as the exact same approach as possible for test vectors to keep things consistent and uniform across repos if possible.

We did decide on having local with scripts to make sure things are in sync in meetings a while back, but super happy to revise if needed. If we truly want to have test vectors not be local and only in the sdk-development repo we can change a few lines in our current test vector impls:

Current local test vector impl:

  val testVectors = mapper.readValue(
    File("../test-vectors/presentation_exchange/create_presentation_from_credentials.json"),
    typeRef
  )

Remote test vector impl:

  val jsonUrl = "https://raw.githubusercontent.com/TBD54566975/sdk-development/main/web5-test-vectors/presentation_exchange/create_presentation_from_credentials.json"
  val request = Request.Builder().url(jsonUrl).build()
  val response = httpClient.newCall(request).execute()

  val testVectors = ObjectMapper().readValue(response.body?.string(), typeRef)

If the tbdex- repo is truly that different then the web5- then I guess we can have a split in the way we do test vectors, but I really think we should keep all test vector impls the same across all repos

nitro-neal commented 9 months ago

Just as a clean example on how the current process works:

new test vector: (2 files changed) https://github.com/TBD54566975/web5-js/pull/357/files#diff-dca2114c947eccff38e8509d221b1eeae0de4510f4081ab01588d528d34a0989R421

After merge we will get a new checkmark here (no action needed from dev): https://tbd54566975.github.io/sdk-development/

diehuxx commented 8 months ago

Made a few changes to my proposal based on feedback:

  1. Made error a boolean instead of an object.
  2. Don't compile test vectors into one file. Initially I thought that would be helpful so we only need to pull one file, but it sounds like it introduces more complication than it helps.
  3. Add recommendation for implementation repos to have a flag for using test vectors locally. I'm going to add a reference implementation of such a flag in tbdex-js.

~I'll be updating~ I have updated #215 accordingly.

diehuxx commented 8 months ago

Adding a vector branching strategy adds complexity

Agreed it adds complexity, but I think it's necessary. If the hosted vectors get out of sync with the implementation, implementation repos will want to pin which version of vectors the implementation supports. The GH action y'all mentioned sounds like it checks that test vectors in the impl repo are the same as in sdk-development/master. But if sdk-development/master gets ahead, we still want to ensure that the vectors in the impl repo are consistent with themselves.

diehuxx commented 8 months ago

Here are some reference implementations for how tbdex and an implementation could support this: tbdex -- hosting: https://github.com/TBD54566975/tbdex/pull/215 tbdex-js -- fetching from Github and caching efficiently: https://github.com/TBD54566975/tbdex-js/pull/125

The caching done in the tbdex-js branch makes a big difference. Fetching from github can take more than 500ms. Once it's cached, loading the test vector is a millisecond or less.

diehuxx commented 8 months ago

@mistermoe Pointed out to me that https://github.com/TBD54566975/tbdex-js/pull/125 is essentially reimplementing git submodules. So, I decided to try using git submodules and it's honestly way simpler than the stuff I was cooking up.

Instead of fetching test vectors dynamically from github, I'm now convinced that git submodules best allow implementation repos to have fast local access to specific versions of implementation repos. tbdex should still be the host of JSON schemas and test vectors. The change from my original proposal is in how they are consumed.

diehuxx commented 8 months ago

~Introducing tbdex-interop-suite. A repo which will contain everything that is currently in the hosted directory of tbdex. This dedicated repo is lighter weight for implementation repos to include as a submodule. I've updated https://github.com/TBD54566975/tbdex-js/pull/125 and https://github.com/TBD54566975/tbdex/pull/215 accordingly.~

Disregard. We decided in this Discord thread not to extract hosted stuff into a new repo

diehuxx commented 8 months ago

Implemented here and in tbdex-js and tbdex-kt with these PRs https://github.com/TBD54566975/tbdex/pull/215 https://github.com/TBD54566975/tbdex-js/pull/129 https://github.com/TBD54566975/tbdex-kt/pull/124