accel-sim / accel-sim-framework

This is the top-level repository for the Accel-Sim framework.
https://accel-sim.github.io

Jenkins workflow problem: Weird path of CI tests in the Jenkins server #273

Closed FJShen closed 7 months ago

FJShen commented 8 months ago

The CI test page for Accel-Sim PR #270 ends up under the path "Dashboard > accel-sim-tracer > accel-sim-framework > Pull Requests > PR-270" on our Jenkins server. This issue was first reported in PR #270.

tgrogers commented 8 months ago

I am a bit confused by the issue here.

We have 2 tests:

One that tests rodinia + ubench on our existing cached traces: https://tgrogers-pc01.ecn.purdue.edu/job/Accel-Sim/job/accel-sim-framework/job/dev/

One that tests just rodinia, both on old traces and by generating new ones and simulating them: https://tgrogers-pc01.ecn.purdue.edu/job/accel-sim-tracer/job/accel-sim-framework/job/dev/

What specifically is the "weird path"?

tgrogers commented 8 months ago

ok i found it:

https://tgrogers-pc01.ecn.purdue.edu/jenkins/job/accel-sim-tracer/job/accel-sim-framework/job/PR-270/1/display/redirect

The problem is not the tracer, it's the Jenkins redirect. When I brought pc01 back up, I mapped https://tgrogers-pc01.ecn.purdue.edu to Jenkins, when it used to be https://tgrogers-pc01.ecn.purdue.edu/jenkins.

I will work on fixing this in GitHub and/or getting Apache to redirect traffic to "/jenkins".
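To make the mismatch concrete, here is a minimal Python sketch of the rewrite an old link would need now that Jenkins is served from the server root instead of "/jenkins". The helper name `strip_path_prefix` is hypothetical and is not part of Jenkins, Apache, or the Accel-Sim scripts. It only manipulates the URL string and does not check whether the resulting page exists on the server.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_path_prefix(url: str, prefix: str = "/jenkins") -> str:
    """Return url with a leading path prefix removed, if it is present.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    path = parts.path
    # Only strip a whole path segment, so "/jenkinsfoo" is left untouched.
    if path == prefix or path.startswith(prefix + "/"):
        path = path[len(prefix):] or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# The link posted earlier in this thread, which still carries the old prefix.
old_link = (
    "https://tgrogers-pc01.ecn.purdue.edu/jenkins/job/accel-sim-tracer"
    "/job/accel-sim-framework/job/PR-270/1/display/redirect"
)
new_link = strip_path_prefix(old_link)
print(new_link)
```

The alternative mentioned above, having Apache redirect "/jenkins" traffic, would apply the same idea on the server side rather than in the links.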

FJShen commented 8 months ago

I was confused about why the CI test for PR #270 ended up under the "Dashboard > accel-sim-tracer" path while many other PRs show up under the "Dashboard > accel-sim" path. I do not understand the difference between "Dashboard > accel-sim-tracer" and "Dashboard > accel-sim".

By the way, the link you said you found (https://tgrogers-pc01.ecn.purdue.edu/jenkins/job/accel-sim-tracer/job/accel-sim-framework/job/PR-270/1/display/redirect) leads to a "Not found" page. I have reported a similar issue in Issue #272.

FJShen commented 8 months ago

This is what I am talking about: image

tgrogers commented 8 months ago

OK - I think I fixed the not-found business for future builds by giving the server the right URL in the Jenkins configuration scripts. We won't be able to test this until we look at another PR. I am also not sure how exactly GitHub is picking between these two builds for checkmarks... I cannot recall where I set up the Jenkins stuff in GitHub, and the GitHub interface has changed 1000 times since I did this.

tgrogers commented 8 months ago

This is what I am talking about: image

Yes, this is a valid build. We have 2 - please see my comment:

I am a bit confused by the issue here.

We have 2 tests:

One that tests rodinia + ubench on our existing cached traces: https://tgrogers-pc01.ecn.purdue.edu/job/Accel-Sim/job/accel-sim-framework/job/dev/

One that tests just rodinia, both on old traces and by generating new ones and simulating them: https://tgrogers-pc01.ecn.purdue.edu/job/accel-sim-tracer/job/accel-sim-framework/job/dev/

What about the above confuses you?

FJShen commented 8 months ago

OK, I am relieved since you say accel-sim-tracer is a valid build. But now I am confused by another thing... All of PR-106, PR-112, PR-113, PR-138, PR-162, PR-167, PR-189, PR-231, PR-233 and PR-242 show up under both accel-sim-tracer and accel-sim, but PR-270 shows up only under accel-sim-tracer and not under accel-sim.

Please compare these two webpages:

accel-sim: https://tgrogers-pc01.ecn.purdue.edu/job/Accel-Sim/job/accel-sim-framework/view/change-requests/

accel-sim-tracer: https://tgrogers-pc01.ecn.purdue.edu/job/accel-sim-tracer/job/accel-sim-framework/view/change-requests/
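The comparison between the two pages can be expressed as a set difference. The sketch below is only an illustration: the PR names are copied by hand from this comment rather than fetched from the Jenkins server, and the helper names `pr_number` and `only_in_first` are hypothetical.

```python
def pr_number(name: str) -> int:
    """Extract the numeric part of a name such as "PR-270"."""
    return int(name.split("-", 1)[1])


def only_in_first(first, second):
    """Return the PR names listed in first but not in second, sorted by number."""
    return sorted(set(first) - set(second), key=pr_number)


# PRs reported in this comment as visible under both jobs (hand-copied, not fetched).
shared = [
    "PR-106", "PR-112", "PR-113", "PR-138", "PR-162",
    "PR-167", "PR-189", "PR-231", "PR-233", "PR-242",
]
accel_sim = list(shared)
accel_sim_tracer = shared + ["PR-270"]

missing = only_in_first(accel_sim_tracer, accel_sim)
print(missing)  # PRs that ran under accel-sim-tracer but not under accel-sim
```

Running the same check in the other direction, `only_in_first(accel_sim, accel_sim_tracer)`, would flag PRs that were tested only by the accel-sim job.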

tgrogers commented 8 months ago

Yup - this is weird. Honestly, I have lost track of how Jenkins and GitHub are integrated. At one point, I think I had to explicitly point GitHub to Jenkins somehow, but that does not seem to be the case anymore. They seem to be talking to each other, with Jenkins initiating the contact. This might mean that if Jenkins doesn't pick up a PR for one of the tests fast enough, we only get one test... Asynchronous distributed systems like this are prone to such timing irregularities.

tgrogers commented 8 months ago

Jenkins complained about reverse-proxy problems until I fixed the busted URL in the config. It would be weird if this caused inconsistent behavior, but it might have. Do all our other PRs have 2 distinct Jenkins tests under the "checkmark" drop down?

FJShen commented 8 months ago

Before Junrui recently added the GitHub CI flow, I remember seeing only one, and it gave no indication of whether the test was associated with accel-sim, accel-sim-tracer, or both.

FJShen commented 8 months ago

I hope it is not outrageous to say "accel-sim-tracer" is a misleading name, for I honestly thought it was a CI flow for the NVIDIA SASS tracer.

tgrogers commented 8 months ago

It is. The pipeline does both: generate the trace using the updated tracer and test it.

tgrogers commented 8 months ago

We (Mahmoud) wrote the tracer; it uses NVIDIA's NVBit to generate the traces.

FJShen commented 8 months ago

Oh... so here's my understanding: on every PR of Accel-Sim, both the accel-sim CI flow and the accel-sim-tracer CI flow are supposed to run. The former tests the trace-driven simulator front-end and the latter tests the tracer, which also belongs to the accel-sim-framework repo. It looks like the problem is that sometimes only one of the CI flows is run for a PR (e.g., PR #270), and we cannot tell whether both were run just by looking at GitHub's PR page.

Previously, when I clicked on the check mark for Jenkins, it only directed me to the accel-sim CI flow's webpage, so I didn't know accel-sim-tracer was also needed.

JRPan commented 7 months ago

Jenkins is deprecated. It has been replaced by GitHub Actions.