Closed hseeberger closed 3 years ago
Following the good practices of OSS development and common sense, there should be an open description here. Otherwise, this should be opened in some private LB repository.
@sleipnir sorry for that!
Actually, everything should be covered in the open here: https://github.com/akka/akka-persistence-spanner/issues/99
TL;DR: a bug in akka-persistence-spanner, fixed in the RC5 release.
@hseeberger Thank you very much for the clarifications
@sleipnir, you're welcome. BTW, the PR does not pass CI and I cannot see how that could be related to my tiny change. If you have some time to take a look, that would be highly appreciated.
It looks like some layers of the 'cloudstateio/cloudstate-go-tck:latest' docker image were not downloaded in time.
@hseeberger To prevent this type of error (`[info] java.lang.AssertionError: timeout 60 seconds expired`), I suggest doubling the current test timeout.
Thanks, @sleipnir. I am unsure if I should increase the timeout here in this innocent little PR.
Obviously, this change is orthogonal to whatever has been happening with the TCK for some time now. Checking the last couple of PRs shows the same issue over and over again, with changes unrelated to the failure in the CI run. As @sleipnir explained, the likely cause of the CI breakage has nothing to do with this PR.
The suggestion by @sleipnir to increase a timeout is a general improvement suggestion; it should certainly not be done in this PR but in a separate one, where it can be tried out.
I took some time, like @sleipnir did, to investigate this a bit:
Looking at the PR CI runs of the last few days, many failed like https://travis-ci.com/github/cloudstateio/cloudstate/jobs/473092960#L1211, while one in between succeeded: https://travis-ci.com/github/cloudstateio/cloudstate/jobs/471982238#L1316
After a short test in a devcontainer, where the TCK runs with no issues, I assume there is no functional issue here. That makes the timeout suggestion by @sleipnir a good one to try, though I'd prefer the option that does not race a timeout: pulling the images before running the TCK.
What is not clear to me is that, despite a few re-runs of the Travis job, it is always the fs layers of the same image (cloudstate-go-tck) that get retried, which I don't think is a coincidence.
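The pre-pull option could look like a small wrapper script run before the TCK job. This is only a sketch: the image name is the one from this thread, while the script name, retry count, and sleep interval are assumptions.

```shell
#!/usr/bin/env sh
# Hypothetical pre-pull script -- not part of the cloudstate repo.
# Fetch the TCK images before the test run so the 60-second TCK
# timeout never races a slow registry download.

pull_with_retry() {
  image="$1"
  attempts="${2:-3}"
  i=1
  while :; do
    # docker pull is idempotent: already-present layers are skipped.
    if docker pull "$image"; then
      return 0
    fi
    if [ "$i" -ge "$attempts" ]; then
      echo "giving up on $image after $attempts attempts" >&2
      return 1
    fi
    echo "pull of $image failed (attempt $i/$attempts), retrying..." >&2
    i=$((i + 1))
    sleep 5
  done
}

# usage: prepull.sh cloudstateio/cloudstate-go-tck:latest [more images...]
for image in "$@"; do
  pull_with_retry "$image" || exit 1
done
```

Run as a CI step before `sbt` starts the TCK, the script fails fast if an image cannot be fetched, instead of letting the test timeout expire mid-run.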
Thank you @marcellanz for your time analyzing this innocent TCK problem.
I opened an issue to address this, where I suggested, as you also did here, downloading the images locally before the tests run. Unfortunately I don't have time to work on a PR right now, but I think we've left all the necessary information here on how to solve this problem, right!? I believe this should resolve the issue without having to change the test timeouts, which should make other Cloudstate contributors more comfortable with their PRs.
Thank you @marcellanz once again for your detailed analysis.
@marcellanz The layers that always undergo retries look to me like a problem in the Docker Hub registry; these layers are probably common to many repositories.
@sleipnir Thanks, see my comment in your issue: https://github.com/cloudstateio/cloudstate/issues/514#issuecomment-763605809. We can track this there. The CI run for this issue went green after a few tries.
@sleipnir Additionally, I have seen the TCK time out in a similar way when a language TCK did not properly respond to the initial discovery call by the proxy (with all expected entities). This happened while implementing one of those language TCK images and might be a bug; I'll check whether this is still the case. That said, the main Cloudstate branch can always have failing TCK runs in CI, as the `:latest` tags potentially bring incompatibilities when the main project and the language supports diverge. Currently the nightly TCK runs are no longer running, as Travis stopped running them due to exhausted free credits. We might run them elsewhere.
Great job @marcellanz
I did nothing 🤷‍♂️ I only pushed CSS buttons 😆
I'll close this issue now, since #513 is merged and there's #514 for the docker pull timeout.
Thanks, @pvlugter, I forgot to close it 😬
See https://github.com/lightbend/cloudstatemachine/issues/2534 for details.