cdecker / lightning-integration

Lightning Integration Testing Framework

Travis: Improve on #74 #66

Closed: jtimon closed this 5 years ago

jtimon commented 5 years ago

Try to improve on top of https://github.com/cdecker/lightning-integration/pull/74

jtimon commented 5 years ago

After some more "research" (i.e. after finding this link: https://blog.humphd.org/building-large-code-on-travis/ ), I launched my latest try with https://github.com/jtimon/lightning-integration/pull/12 . Hopefully this will get around the 4 MB log limit like in the blog, simply by redirecting the output into files.
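
Roughly, the idea is something like this (the file names and the tail length are just placeholders for illustration, not necessarily what my branch ends up using):

mkdir -p output
# keep the noisy build output out of the travis log
make update clients > output/build.log 2>&1
# dump the full pytest output into a file too, and don't abort the script on test failures
pytest -v test.py > output/pytest.log 2>&1 || true
# only echo a short tail back to travis so we stay under the 4 MB limit
tail -n 100 output/pytest.log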

But running the thing on my own laptop (an expensive one with 6 cores or 12 hyperthreads) I get:

============ 96 failed, 72 passed, 2 warnings in 7166.64 seconds ==============
python cli.py postprocess

Unfortunately 7166.64 seconds / 60 = 119.444 minutes, and travis has another limit of 50 minutes per job according to the article, which I'm about to confirm myself, I guess (waiting for https://travis-ci.com/jtimon/lightning-integration/builds/120987322 ).

I don't know if we can pay travis to lift that 50 min limit, but perhaps we can just include the travis file and leave it failing until the runtime gets under 50 mins, which is a good goal independently of travis, I think.

Selfishly, I would like it to leverage multi-core functionality more as a priority, since I have even more cores in my desktop PC, but I don't know how much that can help with travis.

jtimon commented 5 years ago

As expected, I got the error:

The job exceeded the maximum time limit for jobs, and has been terminated.

...which means we exceeded the above-mentioned 50 min limit. This is still the best I've got, so I squashed and pushed it here.

Alternatives are welcome, but I think having travis fail after 50 mins (or earlier, if there's some earlier error) is better than not having travis at all. We can always do an "optimization: fix travis for the first time" commit in a later PR, I think.

Or we can just leave this PR open until some later commit makes it pass travis.

Thoughts?

jtimon commented 5 years ago

Rebased on top of #74 and added a couple of commits which will hopefully help clarify what I explained before. There are a few newly introduced, arbitrarily chosen values that are arbitrarily changed in https://github.com/jtimon/lightning-integration/pull/2

Bikeshedding over the arbitrary values is welcome; please don't leave it to me to decide on that. Removing arbitrary values by simplifying things is also fine.

jtimon commented 5 years ago

Rebased, but it doesn't seem to be triggering travis. I think you need to activate it for this repo.

NicolasDorier commented 5 years ago

I don't understand. Normally docker run should not return until the command executes, and it should give you the exit code of the command if it fails. Why loop through the logs like that?
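
That is, I would expect something as simple as this to be enough (using lnintegration as the image name just for the sake of the example):

docker run --rm lnintegration bash -c "make update clients && pytest test.py"
# docker run exits with the exit code of the command it ran,
# so travis would mark the build red on any build or test failure
echo $?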

jtimon commented 5 years ago

The reason is that travis times out after 50 minutes of running. I should probably close it because it won't pass, but my initial approach is in https://github.com/cdecker/lightning-integration/pull/75

So this one isn't great, but, well, if it doesn't pass running for, say, 10, 30 or 40 minutes, then it won't pass a full run either. So I consider this a strict improvement, even if it is possible that travis passes and you have still broken the build.

I'm honestly not sure what we should do; that's why I have both open for now. Any suggestions welcomed.

NicolasDorier commented 5 years ago

Why don't you just use the timeout command to start docker run?
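
Something like this, say (45m is just an example value under the travis limit):

# kill the run after 45 minutes so the job itself never hits the travis limit
timeout 45m docker run --rm lnintegration bash -c "make update clients; pytest test.py"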

jtimon commented 5 years ago

Because then the build will always show as failed in travis, won't it?

NicolasDorier commented 5 years ago

If the command finishes before the timeout, no.

jtimon commented 5 years ago

Right, but we know it takes more than 50 minutes; that's why it always fails on travis even without any timeout. What my hack here does is like what timeout does, except that it exits with 0 when it stops due to the timeout. Otherwise we could just use the other PR.
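
Roughly, the hack amounts to something like this (simplified for illustration; the exact commands and the 45-minute value here are not necessarily what the branch does):

# start the container detached and poll it instead of blocking on docker run
CONTAINER=$(docker run -d lnintegration bash -c "make update clients; pytest test.py")
DEADLINE=$(( $(date +%s) + 45 * 60 ))
while [ "$(docker inspect -f '{{.State.Running}}' "$CONTAINER")" = "true" ]; do
    # keep echoing recent logs so travis sees progress
    docker logs --since 35s "$CONTAINER"
    if [ "$(date +%s)" -ge "$DEADLINE" ]; then
        echo "deadline reached, stopping without failing the build"
        docker stop "$CONTAINER" > /dev/null
        exit 0   # this is the part plain timeout doesn't give us
    fi
    sleep 30
done
docker logs "$CONTAINER"
# if it finished before the deadline, propagate the real exit code
exit "$(docker wait "$CONTAINER")"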

Also, I wanted to optimize the build so that we can some day get it under 50 mins. But there are open PRs for adding electrumX, and I'm working to add the rust implementation too, so the build will only take longer and longer as more implementations are added.

I reiterate: I don't really know what the right thing to do about this 50 min build limit in travis is. Perhaps appveyor or some other service would be better?

NicolasDorier commented 5 years ago

except exiting with 0 when it exits by timeout.

If it exits by timeout, the return code is not 0.

jtimon commented 5 years ago

Right, unlike my hack, and exiting with 0 is what we want. If travis always returns failure, that's pointless. We want a travis that can sometimes come back green, at the very least when you only change documentation and there's no way you have broken the current build. Using timeout, even a change in the readme will always fail travis.
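
(To be clear, we could also get that behaviour out of timeout by special-casing its exit code ourselves; a minimal sketch, with 124 being the code GNU timeout returns when it kills the command:)

timeout 45m docker run --rm lnintegration bash -c "make update clients; pytest test.py"
rc=$?
# 124 means timeout killed it: treat "ran out of time" as no verdict rather than failure
if [ "$rc" -eq 124 ]; then
    exit 0
fi
exit "$rc"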

cdecker commented 5 years ago

Are you planning to run all tests on travis? In that case I'd suggest using the pytest-test-groups plugin to split the single run into multiple ones (test matrix) and running them somewhat in parallel (pytest-xdist plugin) to make better use of the time we have.

c-lightning has a couple of configurations, but they are grouped in groups of 3 that have the same params but each run a different subset of the tests.

jtimon commented 5 years ago

Wouldn't this require modifying the Dockerfile in a way you perhaps wouldn't like? I mean, otherwise I'm not sure how this could help the run of the CMD line in the Dockerfile take less than 50 minutes. Perhaps we could have a different Dockerfile only for travis, even though then travis wouldn't really be testing the current Dockerfile, which is what I really wanted.

I guess not running all tests here isn't the end of the world, but at least getting to actually start running them on travis (currently the 50 min travis timeout is triggered during the "make clients" phase of CMD) would be really cool IMO. If that's not possible, as said, I think this is at least an improvement over what we currently have.

cdecker commented 5 years ago

Not necessarily: you can run arbitrary commands through the docker image without having to modify the Dockerfile. You could, for example, have something like the following:

docker run lnintegration bash -c "make update clients; pytest test.py --test-group-count=10 --test-group=x"

and then simply vary the test group in the test matrix given to Travis CI. Though Travis not even managing to build the clients sort of precludes any further steps in that direction.
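
Concretely, each entry in the build matrix would just set an environment variable (TEST_GROUP is an arbitrary name here) and every job would run the same line:

# one travis job per group, with TEST_GROUP=1..10 coming from the build matrix
docker run --rm lnintegration bash -c "make update clients; pytest test.py --test-group-count=10 --test-group=$TEST_GROUP"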

jtimon commented 5 years ago

Oh, cool, I didn't know about the test groups, but it makes total sense for this.

Though Travis not even managing to build the clients sort of precludes any further steps into that direction.

Well, I think I could try building the clients in pairs and passing the subset of clients that it must test as an argument to test.py, no?

jtimon commented 5 years ago

Closing in favor of https://github.com/cdecker/lightning-integration/pull/77 which seems better as a next step.