dustinmoris / CI-BuildStats

Little widget to display AppVeyor, TravisCI, CircleCI, GitHub Actions or Azure Pipelines build history charts and other SVG badges.
https://buildstats.info
GNU General Public License v2.0

Travis-CI build times confusing #16

Open tomato42 opened 6 years ago

tomato42 commented 6 years ago

I've started using the widget on tlslite-ng, but the graph of Travis-CI build times is quite confusing:

[Build history chart]

From what I can tell, the time used for the graph is the wall-clock time it took Travis to execute the build. The problem is that I have multiple repositories, and if builds run in parallel, not all 5 runners are available to this particular project, causing the real-time execution to take longer.

This causes builds like #766 to show 31m28s, while the total time was 1 h 24 min 27 sec and the average time for individual jobs was about 3m.

Compare that to #767, which supposedly took 7m36s, while the total time was 1 h 27 min 7 sec and the average job time didn't change much, staying around 3m.

Or #772, which took 19m19s, but 1 h 26 min 52 sec total, with the average still around 3m.

From a development point of view, it's good to know whether build times are increasing too fast (as that could indicate a performance regression). For that, the average build time would be most useful, since it wouldn't be affected much by the addition or removal of environments. But even a switch to the "total time" would be better, as it is far less variable than the wall-clock time of the jobs.

Would it be possible to add a switch that uses either the average job time or the total build time for the graph?
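
(For illustration, here is roughly how the three candidate metrics differ, sketched in Python from made-up started_at/finished_at pairs – not the widget's actual code, just to pin down the definitions:)

```python
from datetime import datetime, timedelta

# Made-up started_at/finished_at pairs for one build's jobs.
jobs = [
    (datetime(2018, 1, 1, 12, 0), datetime(2018, 1, 1, 12, 3)),
    (datetime(2018, 1, 1, 12, 5), datetime(2018, 1, 1, 12, 8)),
    (datetime(2018, 1, 1, 12, 40), datetime(2018, 1, 1, 12, 43)),
]

# "Ran for" / wall clock: first start to last finish – inflated whenever
# the runners are busy with other repositories.
wall_clock = max(f for _, f in jobs) - min(s for s, _ in jobs)

# Total time: sum of the individual job runtimes, independent of scheduling.
total = sum((f - s for s, f in jobs), timedelta())

# Average job time: stable even when environments are added or removed.
average = total / len(jobs)

print(wall_clock, total, average)  # 0:43:00  0:09:00  0:03:00
```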

dustinmoris commented 6 years ago

Hi, that is a great suggestion. Would you like to add a 4th line for the average job time, or would you prefer to swap the existing numbers with the job-based numbers?

tomato42 commented 6 years ago

the 3 lines should stay as they are, but the numbers they use should come from either the "average runtime of the Travis jobs" or the "total time of the Travis jobs" - the "Ran for" value in Travis is what's problematic/useless

(what I found later is that if you have a failure in a single job caused by the environment, like a failed network connection or Docker provisioning, and you reschedule that job, then the time of the Travis build changes to the runtime of that single job – thus the dips in the graph above)

dustinmoris commented 6 years ago

Hi,

FYI - I was looking into fixing this issue, but the latest TravisCI API currently doesn't expose any specifics about the individual jobs it ran for a build. It just exposes the clock time, as you have mentioned.

I would have to additionally query the /jobs endpoint in order to retrieve the info for each job and then calculate the average myself.
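
Sketched in Python for illustration (the service itself isn't Python), the extra lookup would be something along these lines; the /build/{id}/jobs endpoint and the started_at/finished_at fields are my reading of the TravisCI API v3, so treat the details as assumptions:

```python
import requests
from datetime import datetime

API = "https://api.travis-ci.org"      # api.travis-ci.com for private repos
HEADERS = {"Travis-API-Version": "3"}

def average_job_seconds(build_id: int):
    """Fetch a build's jobs and average their runtimes."""
    resp = requests.get(f"{API}/build/{build_id}/jobs", headers=HEADERS, timeout=10)
    resp.raise_for_status()
    durations = []
    for job in resp.json().get("jobs", []):
        started, finished = job.get("started_at"), job.get("finished_at")
        if started and finished:  # skip queued/cancelled jobs
            t0 = datetime.fromisoformat(started.replace("Z", "+00:00"))
            t1 = datetime.fromisoformat(finished.replace("Z", "+00:00"))
            durations.append((t1 - t0).total_seconds())
    return sum(durations) / len(durations) if durations else None
```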

While this is not very complicated, it would considerably increase the load which BuildStats.info already puts on TravisCI. As a matter of fact, TravisCI's current rate limit already throttles BuildStats.info graphs for TravisCI projects during peak US working hours, so I fear that this bug fix turns out to be a bit more difficult than originally anticipated.

I will see what else I can think of, but I thought I'd just let you know where I am with this at the moment and explain that there is a chance this might not get fixed that easily.

tomato42 commented 6 years ago

and are the values in duration consistent with started_at and finished_at?
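
(a quick way to check could be something like the snippet below – the field names are assumptions based on the API payloads:)

```python
from datetime import datetime

def duration_matches(job: dict) -> bool:
    """Compare a job's reported duration (seconds) against its timestamps."""
    t0 = datetime.fromisoformat(job["started_at"].replace("Z", "+00:00"))
    t1 = datetime.fromisoformat(job["finished_at"].replace("Z", "+00:00"))
    return abs((t1 - t0).total_seconds() - job["duration"]) <= 1  # 1s rounding slack
```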

maybe ask the developers at Travis to expose that total time value (in contrast to, or in addition to, the wall clock value)? when I'm looking at a specific build, the total time shows up earlier than the individual jobs, so I'm guessing it is not calculated from individual jobs client-side

> TravisCI's current rate limit already throttles BuildStats.info graphs for TravisCI projects

I'm quite sure you have thought of it already, but as a user I'd say that having an outdated graph is better than no graph. But then I'm just one user, and all my projects don't average more than 1 or 2 merges to master a week
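
(the "outdated graph is better than no graph" idea would map to a simple TTL cache in front of the TravisCI calls – a minimal sketch, with fetch_build_stats as a hypothetical fetcher, not the project's actual code:)

```python
import time

_cache = {}              # repo slug -> (fetched_at, stats)
TTL_SECONDS = 15 * 60    # hit the TravisCI API at most once per window

def cached_build_stats(repo_slug, fetch_build_stats):
    now = time.monotonic()
    hit = _cache.get(repo_slug)
    if hit and now - hit[0] < TTL_SECONDS:
        return hit[1]                      # possibly stale, but instant
    stats = fetch_build_stats(repo_slug)   # single upstream call per window
    _cache[repo_slug] = (now, stats)
    return stats
```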

> but I thought I'd just let you know where I am with this at the moment and explain that there is a chance this might not get fixed that easily.

very much appreciated!