Open amshinde opened 6 years ago
@chavafg Can you take a look at this?
I'm guessing this should really have been raised on https://github.com/clearcontainers/jenkins.
/cc @grahamwhaley as this might have implications for the metrics system storage requirements.
We should probably discuss and define which logs, and how much debug they have in them.
If we take all the system logs and have all the CC debug enabled in the toml for instance then the logs come out pretty big (100's of Kb iirc), which we may not want to gather and store for every run.
If we know what info we want in advance, then we could run some commands at startup such as `cc-runtime cc-env`, `docker info` and @jodh-intel's magic system info collection script. We could even run all of those to gather into a file and add the file to the stored 'results archive' in Jenkins, which would help reduce pollution in the console output screen/log.
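A minimal sketch of that startup gathering step might look like the following. The function name, output filename, and the guards around optional tools are assumptions for illustration; the idea is simply to funnel everything into one file that can be attached to the Jenkins results archive:

```shell
#!/bin/sh
# Hypothetical sketch: gather the environment info named above into a single
# file per CI run. Optional tools are only invoked when present on the host.
gather_env_info() {
    out="$1"
    {
        echo "=== date ==="; date -u
        echo "=== uname ==="; uname -a
        if command -v cc-runtime >/dev/null 2>&1; then
            echo "=== cc-env ==="; cc-runtime cc-env
        fi
        if command -v docker >/dev/null 2>&1; then
            echo "=== docker info ==="; docker info
        fi
    } > "$out" 2>&1
}

gather_env_info "env-info.txt"
```

The file could then be added to the build's artifact list rather than echoed to the console.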
@chavafg I think it was recently pointed out that the metrics CI logs were already pretty big, and I should check that, as that is not intentional.
For reference, that magic script is https://github.com/clearcontainers/runtime/blob/master/data/collect-data.sh.in.
@amshinde - can you give a concrete example where retaining logs would have helped? I'm not disagreeing that it's a good idea, but it would be good to explore if there are other ways to give you what you want.
How long do we think we'll need to store logs? "Forever" probably won't cut it so would a month (4 releases) be sufficient do you think?
But as @grahamwhaley's suggesting, I'm not sure we need to keep the logs so long as we know the environment the tests ran in, to allow a test run to be recreated (namely the installed package set, e.g. `rpm -qa` / `dpkg -l`). As denoted by the checkboxes, the `collect-data.sh` script captures almost all we need here. The package set is the only missing item (although the script does capture the versions of any CC packages installed on the system already).
For reference, the output of the collect script when `gzip -9`'d is ~6k (for a system without any CC errors in the journal).
If we decide to store full logs for all PRs, we'll need something in place to warn about the `ENOSPC` that is almost guaranteed to happen one day... :smile:
Oh - we might also want to include `procenv` output (see https://github.com/clearcontainers/jenkins/issues/5) for things like system limits, etc.
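Capturing that could follow the same guarded pattern as the other collectors; this is a hypothetical helper (function and output names are illustrative) that records `procenv` output where the tool is installed and leaves a stub otherwise:

```shell
#!/bin/sh
# Hypothetical addition: capture procenv output (system limits, capabilities,
# environment) alongside the other collected data, skipping hosts where
# procenv isn't installed so the collection step never fails.
capture_procenv() {
    out="$1"
    if command -v procenv >/dev/null 2>&1; then
        procenv > "$out" 2>&1
    else
        echo "procenv not installed" > "$out"
    fi
}

capture_procenv "procenv.txt"
```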
Agree on logs and longevity - I'm going to presume Jenkins has some plugin or setting that can manage and expire the gathered results files - and we should look at that indeed (we do collect up the .csv results files for the metrics for instance at present, but do not expire them)
procenv was the magic I was thinking of :-)
Ah - soz - so much magic about! ;)
I think @amshinde's concern is knowing the agent version: at some point last week we had a wrong version testing the latest PRs. As for keeping the logs, I can add a rule to gather them in the Azure Jenkins configuration, so the metrics Jenkins will not be impacted. But the Azure Jenkins server may also run into storage issues in the future if we keep growing the logs we store on every run. As @jodh-intel and @grahamwhaley said, it would be better to gather the information we require instead of keeping all the logs from the execution.
@chavafg - we could just run `cc-collect-data.sh` in the teardown script couldn't we? That way we get what info we want but also ensure that script is being run regularly. If we need the complete list of packages, it would be easy to add an extra `--all-packages` option or similar.
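A teardown hook along those lines could be as small as this sketch. The artifact directory, the fallback stub, and the compression step are all assumptions (note `--all-packages` is only a proposed option, so it isn't used here):

```shell
#!/bin/sh
# Hypothetical teardown snippet: run the collect script (when installed) and
# gzip its output into the build's artifact directory for Jenkins to archive.
archive_collect_data() {
    artifacts="$1"
    mkdir -p "$artifacts"
    if command -v cc-collect-data.sh >/dev/null 2>&1; then
        cc-collect-data.sh > "$artifacts/collect-data.log" 2>&1
    else
        # Stub keeps the archive step from failing on hosts without CC.
        echo "cc-collect-data.sh not installed" > "$artifacts/collect-data.log"
    fi
    # gzip -9 keeps per-build storage small (~6k per the measurement above).
    gzip -9 -f "$artifacts/collect-data.log"
}

archive_collect_data "${WORKSPACE:-.}/artifacts"
```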
@jodh-intel yes, I think that would be the best. Does `cc-collect-data.sh` collect the agent version? Because I have seen that it appears as unknown:

```toml
[Agent]
Type = "hyperstart"
Version = "<<unknown>>"
```
@chavafg - good point! No, it doesn't.
I've had a think about this and I can think of two ways we could do this:
1. We could capture the agent version by adding something like a `--full` option to the `cc-collect-data.sh` script. That option would run as normal, but would then also have the script run:

   `sudo docker run --runtime cc-runtime busybox true`

   But it's a hack ;)
2. Change the runtime so that it loop-mounts the currently configured container image read-only (with `mount -oro,noatime,noload` (thanks @grahamwhaley)) and then runs `cc-agent --version` and grabs the output.

That seems like the best option but wdyt @grahamwhaley, @sboeuf, @sameo?
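For illustration, option 2 could be prototyped outside the runtime first. In this sketch the default image path and the agent binary location inside the image are assumptions, and the function bails out cleanly when the image isn't present:

```shell
#!/bin/sh
# Hypothetical sketch of the loop-mount idea: mount the configured container
# image read-only and extract the agent version from inside it.
agent_version_from_image() {
    img="$1"
    [ -f "$img" ] || { echo "image not found: $img"; return 0; }
    mnt=$(mktemp -d)
    # noload skips ext journal replay, so the image stays strictly read-only.
    sudo mount -o ro,noatime,noload "$img" "$mnt"
    # /usr/bin/cc-agent is an assumed location inside the guest image.
    sudo chroot "$mnt" /usr/bin/cc-agent --version
    sudo umount "$mnt"
    rmdir "$mnt"
}

agent_version_from_image "${1:-/usr/share/clear-containers/clear-containers.img}"
```

If this proved reliable, the same logic could live in the collect script or feed `cc-env`, per the discussion below.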
I had very recently also considered that we could loop-mount the `.img` file and run the agent on the host with `--version` to extract that info. Either we can do that in the collect script or have the runtime do it. In the runtime it feels a little skanky, but I guess then we could in theory add the info into `cc-env`.
I was having similar feelings about having that sort of code in the runtime too. That said, we do sort of have precedent if you look at `cc-check.go`, which calls `modinfo(8)`.

I'm happy for us to have this purely in the collect script but, yes, if it doesn't go in the runtime, we need to remove the `Agent.Version` field that @chavafg highlighted, as currently it's static.
@chavafg @jodh-intel @grahamwhaley Gathering the agent version was one of the requirements I had in mind, as we were running with a wrong agent last week. What I really wanted to look at were the CRI-O logs, to inspect the lifecycle events and verify that the container storage driver we pass is actually the one being used by CRI-O. I would say for successful builds one is typically interested in the logs just after the build, so I am ok with keeping them around for a week or even just a couple of days.
It looks like in the Jenkins 'discard old builds' option we may also have the ability to specify how long to keep artifacts for btw.
Currently we can only retrieve the logs from Jenkins when a build has failed. We should be able to retrieve them for successful builds as well, to be able to inspect whether we are running with the correct environment.