Open josharian opened 5 years ago
Before genssa, we have:
v8 (+21) = InlMark <void> [1] v1
v19 (+17) = InlMark <void> [2] v1
v14 (22) = MOVSDconst <float64> [+Inf] : X0
After genssa, we have:
v8 00006 (?) NOP
v19 00007 (+17) XCHGL AX, AX
v14 00008 (+21) MOVSD $(+Inf.0), X0
Yes, there's something odd about the inline mark removal which might need fixing. Somehow instead of removing both marks, it merged one into a real instruction (v8->v14) and maybe that forced it to keep the other one (v19)?
The main bug I'm not surprised about. There's no other place to get position information other than the first instruction. Except maybe the return instruction (if there is one)?
Another place to look for position information, at least on amd64, is any trailing (padding) INT $3 instructions.
But changing the heuristic to look first for a RET and then at the first instruction if no RET seems like a good place to start.
Hello, I'm Joshua and still very new to go.
I was able to get this working in about 8 lines of code. Here is the process I took.
disasm.go Inside of the Print method, which starts on line 186:
I used an additional instance of tabWriter due to the calls to Fprintf inside of the callback that is passed into Decode. The enclosing Print method depends on the io.Writer interface, so the changes are more involved than I wanted to get into without first reaching out. Please let me know if the above makes sense or the direction I should be taking for the fix.
Thanks for looking into this!
What is the performance hit if you simply call Decode twice, first just to find the RET (if any), then a second time to do all the printing? (By the way, any RET anywhere in the function should work. It doesn't have to be the final instruction, and it often won't be.)
This is a better suggestion. It drops the amount of changed lines and preserves the order to the output to match the order of the logic in the code. Thanks for pointing out the RET code could be anywhere in the instructions, I was dialed too much into the example and didn’t think about that.
The one caveat is that I’m not sure how to short-circuit the call to Decode because the function is executing a callback which prevents adding a goto label due to scope?
The execution time is linear, so I don’t think decoding the symbols twice will add that much overhead unless there’s something happening at a lower level that I am not aware of. I’m going to add some profiling code and run a few performance tests on the algorithm as it will be a good learning experience.
Sounds good. Let me know if I can be of help.
As to short circuiting, I think you’ll have to add explicit support for it, e.g. by having the function arg to Decode return a bool indicating whether to stop decoding. You could start without that and add it in as a second step, since it is an optimization.
I was as able to familiarize myself with pprof this morning.
There is a 2x slow down when invoking the function twice to get the ret code. Adding the short circuit logic and re-running the profile on runtime.a cuts the additional overhead in half, which is probably close to average case. I have a fast laptop, but the results were measurable in seconds, which is enough justification to include the short circuit logic since it’s already written down. I’ll submit these changes and leave it up to code review going forward.
Change https://golang.org/cl/182758 mentions this issue: cmd: Correctly report the source file of an assembly
Thanks for all the help Josh
Note that
cmplx.Inf
is listed as being in$GOROOT/src/math/bits.go
. It looks like cmd/objdump may assume that the first instruction in the function is located within the function. With inlmarks in a prologue-less function, at least, this assumption is false.Also, why is that inlining mark there in the first place?
cc @randall77 because this is an unintended consequence of inlmarks cc @dr2chase for all things position