ga4gh / workflow-execution-service-schemas

The WES API is a standard way to run and manage portable workflows.
Apache License 2.0
82 stars 38 forks source link

Demonstrate implementations before merging schema changes #72

Closed david4096 closed 6 years ago

david4096 commented 6 years ago

When trying to work on schemas in this manner it is helpful to have a working implementation before making merging a schema change. For this we have the https://github.com/common-workflow-language/workflow-service , which is currently lagging behind the state of the schemas making testing and reviewing further changes a challenge.

Although we always hope to put off breaking changes, there have been a number in the last few merges. To help curtail this, I suggest we require a demonstrated implementation to go with schema changes. That does not mean the workflow-service decides which WES endpoints are developed. Just that schematic changes are in a way proven before being accepted here. Driver projects are still empowered to suggest changes and push issues forward.

This might allow us to come up with a release pattern so that semantic versions of WES can be easily associated with workflow-service versions.

This would be a change to the contributor guidelines and would require documenting a process of reviewing features in development branches of the workflow-service when reviewing schematic changes here.

@tetron @geoffjentry @dglazer

geoffjentry commented 6 years ago

Why is CWL being described as a reference implementation?

Also, I disagree. As someone who is actively updating an implementation it's easier for me to get these merged in quickly. As the list of stakeholders (and thus official voters) is quite small I'd rather err on the side of agility here. I could see a world in the future where WES is more stable, at which point I'd agree more - but at the moment we're really only talking about 2 implementations (Arvados & Cromwell) and a single client (the orchestrator written by @jaeddy ) whereas the changes are coming rapidly.

tetron commented 6 years ago

@geoffjentry where is workflow-service being described as reference implementation? (And to be fair, the original proposal / proof of concept work on WES started in that repo, so ...)

@david4096 I don't know if we're working at cross purposes here, but since WES is still in a pre-1.0 proposal state it is unrealistic to expect there won't be any breaking changes. There is also a basic dependency problem, that one can't have a branch ready with the updated spec until the spec has actually been updated. So I agree with @geoffjentry that we should be trying to iterate on the spec rapidly to try to get to a stable state quicker, and then have implementations catch up.

jaeddy commented 6 years ago

Rather than put the onus on contributors to demonstrate that changes work in a new branch of workflow-service (in a separate repo), I'd be more in favor of including some basic tests for internal consistency as part of the build/CI process. In other words, we can still remove the server/client from the codebase (as @david4096 proposed in #60), but generate each from the spec in Travis using either bravado or OpenAPI codegen. We'd need to write the unit/integration tests to check that the various calls work as expected, but I'd feel better about the treating the auto-generated implementations as "reference" than workflow-service (and it seems like that's an advantage of working with swagger/OpenAPI to being with).

In either case, I agree that this is more of a post-1.0 priority — but I'd also be OK with getting some of the infrastructure/process set up before then.

david4096 commented 6 years ago

@tetron not trying to avoid breaking changes, just making sure we have a working testbed for evaluating changes. Having a working branch of schema changes and a server implementation seems like a fair expectation to make, I'm not choosy about where or how its implemented, just that it can be demonstrated before asking others to implement it.

@geoffjentry I don't mean to assume CWL, it just happens to be where the most accessible implementation to build on is located. I think cromwell would also work to prove changes before their merged into master.

geoffjentry commented 6 years ago

@tetron It seemed like @david4096 was suggesting that that implementation not being up to date was a worse state of affairs than any other particular implementation, which to me is only true of a reference impl. Either way, I likely misinterpreted.

I do like what @jaeddy proposes about having autogen'd code testing sanity of wiring. Beyond that, I have a distinct loathing of reference implementations in general, and even beyond that wouldn't want to trust relying on a single "blessed" implementation for tests. Ultimately that is the sort of thing the testbed can provide as that will have multiple implementations represented.

@david4096 Again, I'll say that I don't think it's a bad idea in the general case, however for the here and now it seems like a distraction (not to mention, 3 of the 4 people involved in the here & now work have said it's not needed)

david4096 commented 6 years ago

I don't want to hold up any active development, and eventually arriving at consistent implementations via a single client seems like a good way to focus development. @geoffjentry is this it: https://github.com/DataBiosphere/job-manager?

We recently added https://github.com/ga4gh/workflow-execution-service-schemas/blob/develop/python/test/test_package.py which tries to at least test swagger validation (which was previously absent). Instead of a reference implementation, some demonstration of a feature or endpoint before merging would certainly be instructive. If there were at least a client implementation, here, or elsewhere, before a merge... that would be an improvement!

Thanks @DailyDreaming and @achave11 we are on our way to getting caught up with the current state of the schemas in the workflow service. First, we are demonstrating how the multipart upload feature works by providing tests in the workflow-service. Next, we want to demonstrate the workflow_engine_params additions, which should make interoperating with different execution engines easier. https://github.com/common-workflow-language/workflow-service/issues/38

@geoffjentry you're welcome to close this, glad to hear there is a lot of agreement in our small team. Instead of closing you might change to "provide a client demonstration before merging schema changes". If there are other synthetic tests of the openAPI description that would be instructive, that might help as well, but without programming interaction patterns its a bit hand-wavy.

jaeddy commented 6 years ago

Hey @david4096 — for what it's worth, I'm hoping to change things up a bit with the testbed orchestrator a bit and use autogen'd clients from the swagger specs, rather than my own home-cooked versions. In the process, I would like to set up some tests for client/server interactions for both the TRS and WES APIs, as well as the interaction between them (and some of the other business logic in my code).

I haven't sat down and started to code any of this up yet, but I imagine it could eventually accomplish some of the checks you're suggesting for specific versions of the spec. If/when I get it working, I'd obviously be happy to merge some of that in with what's currently in test_package. Likewise, I'll definitely be keeping an eye on what @DailyDreaming and @achave11 are doing with workflow-service testing to get ideas. :)

david4096 commented 6 years ago

I began work on a mock server using connexion in these schemas https://github.com/ga4gh/workflow-execution-service-schemas/blob/develop/python/ga4gh/wes/server.py . The bravado client here should be able to be used to demonstrate API interactions like the ones you want without having to implement a backend (using the pattern you implemented in workflow-service).

We could do it here, or in the workflow-service, but in the absence of any tests/features I proposed https://github.com/ga4gh/workflow-execution-service-schemas/issues/60 to remove them from here.

geoffjentry commented 6 years ago

@david4096 re client, IMO the definitive client is the orchestrator being developed by @jaeddy

jaeddy commented 6 years ago

Based on discussions in the Cloud WS, this functionality will be handled by a separate repo that runs a suite of conformance tests for API implementations as well as the cloud interoperability testbed. Until we have a comprehensive reference implementation, we won't include this as a build criterium.