cloudstateio / cloudstate

Distributed State Management for Serverless
https://cloudstate.io
Apache License 2.0
763 stars 97 forks source link

[proto] having gRPC service names PascalCase'd and others. #511

Closed marcellanz closed 2 years ago

marcellanz commented 3 years ago

This PR addresses the gRPC servicenames under item a) from issue #518

marcellanz commented 3 years ago

We can merge this PR and left https://github.com/cloudstateio/cloudstate/issues/518 open with other issues parts tracking there.

marcellanz commented 3 years ago

@pvlugter I rebased the PR so that the recent CI changes are in effect for build runs here.

pvlugter commented 3 years ago

@pvlugter I rebased the PR so that the recent CI changes are in effect for build runs here.

Thanks, was just about to look at doing that. The native-image for cassandra smoke test seems to be flaky a lot — may need to restart for that. I'll need to follow up and figure out if we can make that more stable, or could be an issue with the native image itself.

pvlugter commented 3 years ago

Hmm, it's still timing out after the docker pull. And the native image TCK is failing, seems to be because that's pulling the image rather than using the local one. I'll need to fix this up some more still.

pvlugter commented 3 years ago

So just checking: this will be a breaking change for the protocol? The various generated language-specific APIs will be the same, but assuming the method names used will be case sensitive. It was failing with not implemented errors when inadvertently pulling an older image. If so, we probably want to consider bumping the protocol to 0.3 (and could be good to make any other breaking changes and renames to the protocol too).

pvlugter commented 3 years ago

And would be good to have a new node-support release without these changes.

marcellanz commented 3 years ago

this will be a breaking change for the protocol?

It seems so. Not for the part of Go as far as I can see, as there method name have to be upper-case. We might to let this PR as a Draft-PR and work on it from different language supports to make it compatible with the change.

We also might discuss other changes in the related issue #518. I listed a few points we could work on. I'll also add the use of a linter there.

sleipnir commented 3 years ago

If I'm not mistaken, each specific language code generator will follow your language conventions.

marcellanz commented 3 years ago

If I'm not mistaken, each specific language code generator will follow your language conventions.

If it can, yes I think. In case of Go lowercase methodnames are private, so generated service handler names are uppercase.

marcellanz commented 3 years ago

We should be able to lint that also. Buf is able to detect breaking changes https://docs.buf.build/breaking-overview https://docs.buf.build/tour-5

pvlugter commented 3 years ago

Yes, the code generators should still create compatible language-specific APIs. But the protocol itself will still be case-sensitive, and so this will be breaking change. Will be very useful if we can have buf check for this.

So the TCK integration tests are failing for this PR. JS/Node support and Java support pass, as they use the updated proto files, but Go support TCK is failing as it uses the old method names. It's failing during the initial discovery, which is just checking that the TCK implementation has started, so there's no error reported (because it's potentially spinning for a while with it being unavailable). I can update so that it reports the last error on timeout. It's failing with: io.grpc.StatusRuntimeException: UNIMPLEMENTED: unknown method Discover for service cloudstate.EntityDiscovery.

marcellanz commented 3 years ago

Thanks. From the TCKs run, the missing Go image for this PR lets it fail, because it uses the :latest image. The PR build of the Go support should build an image that then the main repository starts on a PR intergration run. Also, the image should get deleted afterwards. I'm not sure how the main repository knows about which support language image that would be for a certain PR if the depending supprt language has not yet merged the changes.

That complicated procedure could be unnecessary if the support library ensures that its :latest image has the changes a PR run triggered on the main repo checks then with the TCK. This limits the concurrent PRs open for a support library to one.

WDYT?

marcellanz commented 3 years ago

@pvlugter I pushed the cloudstateio/cloudstate-go-tck:latest image with the changes of this PR and re-ran the Travis Build, this went well: https://travis-ci.com/github/cloudstateio/cloudstate/jobs/479463683#L1317

I can't restart the CircleCI workflow.

pvlugter commented 3 years ago

Yes, having an approach for changes that affect the language supports would be good. I assume updating the cloudstateio/cloudstate-go-tck:latest with these changes would start failing all the other PRs though. I think it would also be fine if we disabled the Go support TCK in a PR like this, and then re-enable when there's a new image.

pvlugter commented 3 years ago

I've restarted the native image TCK in CircleCI.

We should probably still bump the protocol version for this, although with the discover method rename it doesn't actually get to the protocol compatibility check anyway. It's incompatible in both directions (older proxy with newer language support, or newer proxy with older language support). Not sure if the proxy will keep retrying — I changed things recently so that some exceptions are fatal for trying to connect to the user function. Would make sense to have this not stay in a retry loop.

pvlugter commented 3 years ago

I assume updating the cloudstateio/cloudstate-go-tck:latest with these changes would start failing all the other PRs though.

Case in point, the TCK integration tests for master are failing now that the image is updated:

[info] io.cloudstate.tck.TCK *** ABORTED ***
[info]   java.lang.AssertionError: No discovery after 60 seconds, last error: UNIMPLEMENTED: unknown method discover for service cloudstate.EntityDiscovery
pvlugter commented 3 years ago

Switching the TCK integration tests to a stable version of cloudstate-go-tck image in #530, which I think makes sense to do anyway. The latest tag can be used more experimentally. For a PR like this, I think it would be fine to specifically tag a docker image that's been updated with the relevant changes — could even be tagged with cloudstate-pr-511 or similar, and then removed later.

marcellanz commented 3 years ago

Its a hack. But for any TCK of a support library that the core project depends on, a change introduced like in this PR, results in this situation. Actually, all other released language support libs are broken with this branch, and if merged broken with the main branch of the core project, probably for a long time. Also, as you laid out, if this PR got merged and a PR-dependent tck image of the Go support would have been provided, the TCK CI run will start to fail until the same change got merged in the Go TCK image.

I assume its normal and expected that any TCK image of a support library that is not part of the core project, can and will fail on master or a PR.

The inclusion of the Go TCK into the core project was probably motivated to validate changes coming from the core project and we could remove the Go TCK again. Further, we can break up this entanglement by running the TCKs as not part of the core project separately to get an overview, like a matrix, of what is compatible and what is not. The core project if it envolves the protocol, never can catch up with failing support libs.

So I think lets remove the Go TCK.

pvlugter commented 3 years ago

Yes, we can certainly remove the Go TCK from the integration tests here. We really only need one reference implementation to use, and can run the TCK in the language support CI builds. I'm also fine with using a stable version, and then disabling whenever there are changes.

marcellanz commented 3 years ago

Switching the TCK integration tests to a stable version of cloudstate-go-tck image in #530, which I think makes sense to do anyway. The latest tag can be used more experimentally. For a PR like this, I think it would be fine to specifically tag a docker image that's been updated with the relevant changes — could even be tagged with cloudstate-pr-511 or similar, and then removed later.

Yeah, thought about that too. Perhaps we can mark a PRs text with the tag that can be used and let the CI run read that tag like: /go-tck:cloudstate-pr-511 or as one has to guess PR numbers something else. Although I don't know how this works with Travis or CircleCI, github actions probably has access to that. This way we could track compatibility of a change in the core project at least for the time a PR is open. Could be even just a dedicated github action just for this, as PRs are managed here. Once merged, the depending TCK will catch up as promised by the provided tag.

I can try to do that, separately to this PR.

pvlugter commented 3 years ago

Would be useful if a different tag can be given. Could also support skip comments then, if it's expected to fail and there's no image available yet.

marcellanz commented 3 years ago

@pvlugter could you/we change the TCK to start and configure the TCK with env variables like?:

TCK_SKIP = "java-tck,kotlin-tck" TCK_RUN = "go-tck:some_tck_tag"

sourced from a PRs body like:

It seems... and then:
"even if it's not directly related to any commercial opportunity".
...
But also, this fixes... with this PR.

/tck-run:go-tck:some_tck_tag
/tck-skip:java-tck
/tck-skip:kotlin-tck
pvlugter commented 3 years ago

Yes, should be straightforward to configure the TCK from env vars.