go-delve / delve

Delve is a debugger for the Go programming language.
MIT License
22.94k stars 2.14k forks source link

linux/riscv64 test failures #3832

Open aarzilli opened 2 weeks ago

aarzilli commented 2 weeks ago

The linux/riscv64 builder is not picking up linux/riscv64 builds. Looking at teamcity the problem seems to be this:

Unmet requirements: docker.server.version exists 
aarzilli commented 2 weeks ago

cc @lrzlin

lrzlin commented 2 weeks ago

According to https://github.com/lawrenceching/gitbook/blob/master/teamcity-is-reporting-unmet-requirements-docker.server.version-exists.md, I start the docker deamon, now it's up and running on the linux/riscv64 builder, and I could find the Linux-openEuler24.09-riscv64 in compatible agents. Maybe we could trigger the builds manually and see what happens?

aarzilli commented 2 weeks ago

Alright. The agent is picking up builds however there are some failing tests: https://delve.teamcity.com/viewLog.html?buildId=63555&tab=buildResultsDiv&buildTypeId=Delve_linux_riscv64_1_23

lrzlin commented 1 week ago
  • TestTraceDirRecursion: this does not concern me much, we can either disable it on linux/riscv64 or figure out where the CGO_CFLAGS are coming from, or just unset them somewhere on the test script

This seems to be related with async preempt mechanism of go, I tried disable it in delve but doesn't help, loong64 used to have similar issue, they finally fixed this by making CLs to the go compiler itself, so maybe we should do the same.

  • All the other ones TestIssue332_Part2, TestSkipPrologue and TestSkipPrologue2 have the same root cause, the prologue skipping code isn't working, i.e. probably the prologues set in riscv64_arch.go are wrong. This also reminds me that we should make a CL for the go compiler, so that it sets the prologue end marker correctly for riscv64.

Yes, I found this should also related with the go compiler, I tried loong64's delve pr on official go1.23.2 and their custom go1.22.6, the custom 1.22.6 compiler could pass these prologue skipping tests but the 1.23.2 couldn't.

aarzilli commented 1 week ago
  • TestTraceDirRecursion: this does not concern me much, we can either disable it on linux/riscv64 or figure out where the CGO_CFLAGS are coming from, or just unset them somewhere on the test script

This seems to be related with async preempt mechanism of go, I tried disable it in delve but doesn't help, loong64 used to have similar issue, they finally fixed this by making CLs to the go compiler itself, so maybe we should do the same.

We could also simply adjust the test to ignore that message.

  • All the other ones TestIssue332_Part2, TestSkipPrologue and TestSkipPrologue2 have the same root cause, the prologue skipping code isn't working, i.e. probably the prologues set in riscv64_arch.go are wrong. This also reminds me that we should make a CL for the go compiler, so that it sets the prologue end marker correctly for riscv64.

Yes, I found this should also related with the go compiler, I tried loong64's delve pr on official go1.23.2 and their custom go1.22.6, the custom 1.22.6 compiler could pass these prologue skipping tests but the 1.23.2 couldn't.

It should be possible to make those tests pass without changing the compiler. I'd check the disassembler output for the functions where the detection is failing. As it is, if this test fails several other tests will fail sporadically.

lrzlin commented 1 week ago

@aarzilli Hi, I found there are more tests failed after deteled the prologue related code, is that means at least it's partially correct?

Besides, by adding the prologue_end dwarf in src/cmd/internal/obj/riscv/obj.go like ppc64 and loong64, and TestIssue332_Part2, TestSkipPrologue and TestSkipPrologue2 could pass. So if we could use this workaround instead of using the prologue detection code, I'll make a CL to golang as soon as possible.

aarzilli commented 1 week ago

@aarzilli Hi, I found there are more tests failed after deteled the prologue related code, is that means at least it's partially correct?

It could be. I compared it to the code in stacksplit and to what its generated by the compiler and I didn't see any resemblance but maybe I missed something. But a bad prologue detection code is going to make many of the tests in our suite fail non-deterministically so I would have to take samples before and after to be confident just by looking at test failures.

Besides, by adding the prologue_end dwarf in src/cmd/internal/obj/riscv/obj.go like ppc64 and loong64, and TestIssue332_Part2, TestSkipPrologue and TestSkipPrologue2 could pass. So if we could use this workaround instead of using the prologue detection code, I'll make a CL to golang as soon as possible.

The prologue end marker is not a workaround, it is the proper way to do it.

lrzlin commented 1 week ago

Besides, by adding the prologue_end dwarf in src/cmd/internal/obj/riscv/obj.go like ppc64 and loong64, and TestIssue332_Part2, TestSkipPrologue and TestSkipPrologue2 could pass. So if we could use this workaround instead of using the prologue detection code, I'll make a CL to golang as soon as possible.

The prologue end marker is not a workaround, it is the proper way to do it.

Add prologue_end DWARF stmt for riscv64: https://go-review.googlesource.com/c/go/+/620295

aarzilli commented 4 days ago

It seems like the agent is now offline?

lrzlin commented 4 days ago

It seems like the agent is now offline?

Sorry, the server was down due to maintainance, I've restart the teamcity agent.

lrzlin commented 1 day ago

Besides, by adding the prologue_end dwarf in src/cmd/internal/obj/riscv/obj.go like ppc64 and loong64, and TestIssue332_Part2, TestSkipPrologue and TestSkipPrologue2 could pass. So if we could use this workaround instead of using the prologue detection code, I'll make a CL to golang as soon as possible.

The prologue end marker is not a workaround, it is the proper way to do it.

Add prologue_end DWARF stmt for riscv64: https://go-review.googlesource.com/c/go/+/620295

Merged.