Fix CI - Githubissues

pzmarzly commented 5 years ago

Closes #206

kcov is allowed to work silently for 30 minutes now (it worked fine on 20, too)
kcov output is being timestamped, to see how long each action takes
Travis badge links to <Travis>/branches

realcr commented 5 years ago

I think that the two tests:

[57:28.708558] test tests::tests::test_funder_basic ... test tests::tests::test_funder_basic has been running for over 60 seconds
[57:28.726961] test tests::tests::test_funder_forward_payment ... test tests::tests::test_funder_forward_payment has been running for over 60 seconds

are running for a very long time for some reason. Maybe the problem isn't with kcov, but with those two tests (Running under kcov)? I conjecture that maybe in some cases a deadlock happens? I remember the tests passed on my machine, maybe I need to check it again.

EDIT: I will be back to try stuff on Thursday. If you manage to work on this before I get to it, maybe you can try adding some prints to the two tests (Running the tests with --nocapture) just to check if those tests are stuck somewhere when they are being run using the CI. (Maybe you can even manage to reproduce this issue on your machine if you run those tests enough times?)

It is not unlikely that there is a bug in those tests, as I ported them very recently to the new atomic payments design. I could have made some kind of synchronization mistake there.

pzmarzly commented 5 years ago

They do run fast, but not under kcov. I love when same code passes CI in PR on fork, but not in PR on upstream, probably cache issue. I'm going to do more testing, then.

Speaking of these 2 functions, they did not take long on this CI build, but did take >5 minutes on this one (ignore pstree printed every 10 seconds). Thus, I think that maybe kcov or cargo-test is caching some results, so the reason why it's failing now may be that too much code changed at once, and testing all code is too much (even for 50 minutes upper limit on Travis). That's troublesome if true.

realcr commented 5 years ago

You could be right that this is related to some kind of caching, but I have to admit I'm not sure. What I find strange that only those two tests always seem to be slow or stuck. Therefore I suspect that maybe I made a mistake in how those tests work. By the way, those are the "most async" tests in the whole offst-funder module.

If those two tests turn out to do something strange it will probably be up to me to fix it, but you might be able to detect it by adding a few prints in a few places in the tests code and finding out where those tests are stuck. If they are always stuck in the same place, maybe this could give us some clue about how to fix them. (Be sure to add -- --nocapture to the cargo test command if you want to try this out, otherwise you will not see the prints going to stdout).

To be sure, I just deleted all off Travis caches for this respository (through this page: https://travis-ci.com/freedomlayer/offst/caches).

pzmarzly commented 5 years ago

I'm aware of the option to clear cache, I did it before. If there is any problem with it, it'd be that there's too little of it.

I now noticed I was working on old branch (did not have #204 merged). While #204 looks innocent, it may change something.

I'll try adding -- --nocapture. I'm going to play with kcov settings a bit. I will let you know if I need any help at some point.

BTW if kcov turns to be slow and nothing could be done about it, there is https://github.com/mozilla/grcov , with Rust guide

pzmarzly commented 5 years ago

Huh, it works now. I don't know what exactly helped, but IIABDFI. Dots are printed every 10 seconds, only 2 tests are suspiciously slow: https://travis-ci.com/freedomlayer/offst/jobs/205790130

realcr commented 5 years ago

@pzmarzly : Thanks for working on this! I just attempted to fix this issue by finding out what is the reason for the two slow tests. I made the following two commits:

On PR #205 in attempt to solve those issues. The first commit changes the async funder tests to use a TestExecutor instead of the nondeterministic ThreadPool executor. The second commit removes the "polling" loop waiting for a Receipt:

    // 0: Expect a receipt:
    let (receipt, ack_uid) = loop {
        await!(
            node_controls[0].send(FunderControl::RequestClosePayment(PaymentId::from(
                &[2u8; PAYMENT_ID_LEN]
            )))
        );
        await!(test_executor.wait());
        let response_close_payment =
            await!(node_controls[0].recv_until_response_close_payment()).unwrap();
        match response_close_payment.status {
            PaymentStatus::Success((receipt, ack_uid)) => break (receipt, ack_uid),
            _ => {}
        }
    };

With a loopless version, using the test_executor.wait() feature:

    // Wait until no more progress can be made (We should get a receipt)
    await!(test_executor.wait());

    // 0: Expect a receipt:
    await!(
        node_controls[0].send(FunderControl::RequestClosePayment(PaymentId::from(
            &[2u8; PAYMENT_ID_LEN]
        )))
    );

My guess is that this kind of loop might cause contention, and in some cases might take a very long time to finish. All is left to do now is wait for the CI to tell us his thing (:

pzmarzly commented 5 years ago

I see you restarted Travis build, and it failed this time. I guess no amount of kcov tweaks can help when the tests sometimes hang indefinitely for some reason.

Though I think that this PR could be merged, additional debug info could be useful. Plus, the kcov exclude were updated here to match ones from https://sunjay.dev/2016/07/25/rust-code-coverage

realcr commented 5 years ago

Though I think that this PR could be merged, additional debug info could be useful. Plus, the kcov exclude were updated here to match ones from https://sunjay.dev/2016/07/25/rust-code-coverage

I am willing to merge this if we can have the modification you are talking about in a new PR, and if you can explain what this modification does:

--exclude-path=/usr/include \ -> --exclude-pattern=/.cargo,/usr/lib,/usr/include \

All this codecov thing looks like black magic to me (:

freedomlayer / offset

Fix CI #207