dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.98k stars 4.66k forks source link

Implement stack unwinding and exceptions for Linux #3942

Closed kangaroo closed 4 years ago

kangaroo commented 9 years ago

@janvorli I know you're working on this, but I'd like to request that this work be done on a feature branch in the public if possible. Its the next major feature needed for me to bring OS X up to snuff, and I'd rather not duplicate effort.

Thoughts?

sergiy-k commented 9 years ago

@kangaroo That’s actually our ultimate goal - to do as much work in the public as possible  especially if this work is related to the cross-platform stuff. We do want everyone to be able to monitor the progress and provide comments, suggestions and contributions. So in the next few weeks you should see more and more work happing on GitHub. In addition, in the nearest future we will be opening new issues (work items) that will describe what we are currently working on and what we plan do next. These work items will vary in size and the ones that we are ready to accept PRs for will be marked ‘up-for-grabs’.

@janvorli can provide you with more detailed information about stack unwinding work if needed but here is a brief description:

  1. Stack walker for managed code. JIT will generate regular Windows style unwinding info. We will reuse Windows unwinder code that we currently have checked in for debugger components for unwinding calls in managed code on Linux/Mac. Unfortunately, this work requires changes in the runtime that currently cannot be tested in the CoreCLR repo so it is hard to do this in the public right now. But we are working on fixing that because, as I mentioned at the beginning, our goal is do most work in the public.
  2. Stack walker for native code. Here, in addition to everything else, we need to allow GC to unwind native stack of any thread in the current process until it finds a managed frame. Currently we are considering using libunwind (http://www.nongnu.org/libunwind) for unwinding native call stacks. @janvorli did some prototyping/experiments and it seems to do what we need. If you have any experience with this library or have any comments/suggestions please let us know.

Thank you.

kangaroo commented 9 years ago

@sergiy-k If this is your goal, why is this issue closed? I actually want to work on stuff, but you're basically saying:

  1. We'll be better in the future
  2. We've looked at it
  3. Please internal people document

If prototyping of dotnet/coreclr#2 is done, where is the feature branch?

Thats a lame reason to close what is a valid issue against the current codebase. The issue isn't fixed. Does stack unwinding exist on linux? No. The fact that its 'hard to do in the public repo' doesn't mean it doesn't suck. I wan't to contribute and you're throwing up barriers. :(

sergiy-k commented 9 years ago

@kangaroo I’m really sorry about that. I have not realized that I closed the issue. I certainly did not want to do that. It was an operator error. :) I was just about to assign the issue to @janvorli when I saw your reply. I’m reopening the issues. Once again, I’m sorry about confusion.

kangaroo commented 9 years ago

@sergiy-k Thank you for the clarification, but this is case and point why to be effective this needs to be developed in the open. Take a small snippet from libunwind.h in OS X:

/* 
 * traditional libuwind "remote" API 
 *   NOT IMPLEMENTED on Mac OS X
 *
 * extern int               unw_init_remote(unw_cursor_t*, unw_addr_space_t, thread_t*);
 * extern unw_accessors_t   unw_get_accessors(unw_addr_space_t);
 * extern unw_addr_space_t  unw_create_addr_space(unw_accessors_t, int);
 * extern void              unw_flush_cache(unw_addr_space_t, unw_word_t, unw_word_t);
 * extern int               unw_set_caching_policy(unw_addr_space_t, unw_caching_policy_t);
 * extern void              _U_dyn_register(unw_dyn_info_t*);
 * extern void              _U_dyn_cancel(unw_dyn_info_t*);
 */

Perhaps your plans need this. Perhaps they don't. How can I know when you develop in the shadows? If the plan is to be open, and come to OS X (as you've documented), let people like me get involved early to avoid poor decisions. This issue is not the only example of this problem.

Microsoft is very close to going down a linux centric path with their porting efforts, which will dig a hole that we all need to get out of in the future. If OS X isn't a priority for MS yet, at least be open so the community has an opportunity to help you prevent future pitfalls.

sergiy-k commented 9 years ago

@kangaroo :) Thank you! :) The issue you have just pointed out is exactly the reason why I asked you about your opinion about libunwind.

kangaroo commented 9 years ago

@sergiy-k Great! Where's the feature branch so I can port it, debug it, and support it on OS X before linux support is merged by dotnet-bot? :)

sergiy-k commented 9 years ago

@kangaroo ... and I almost hit the "Close and comment" button instead of "Comment" again!

kangaroo commented 9 years ago

@sergiy-k Here's my point, if it wasn't abundantly clear. People want to contribute. Even branches which are 'hard'. You guys need to figure them out, otherwise you're going to lose contributors. No one wants to work on something already in progress, and no one wants to chase features that get dropped from a black hole (ref; android). Let us help.

richlander commented 9 years ago

Hi @kangaroo. I'll provide a bit of context, which might help.

We made the decision to focus almost entirely on getting the code out. We definitely wanted it to build and have some initial tests, but that was the only bit of icing on top of just getting the code open. It meant that we hadn't really developed a plan on how to collaborate with the community. You'd think that we would have learned from our experiences with corefx (we're just one team) and do better. We honestly didn't expect this degree of participation/contribution on coreclr. We're very thankful and impressed, but scrambling a bit to keep up. We'll get to steady state relatively soon, but we're not there yet.

Some of us are also new to GitHub and developing a new muscle memory for using it and also learning the culture. You'll have to excuse @sergiy-k for not being able to distinguish green and grey buttons. Joke.

There is a very clear expectation from "management" that we over-index on community collaboration. They want us to publish as many of our work items as possible. "Working in the shadows" is very much the anti-pattern. They want us working in visible feature branches and all that good stuff.

We do have a bit of a bias towards Linux (it's a server platform), but we'd love to see OSX come up at the same time. I very much hope collaboration and coordination can enable that to happen.

sergiy-k commented 9 years ago

@janvorli, may I please ask you to create a feature branch and check in your code in there. Thank you!

kangaroo commented 9 years ago

@richlander Appreciate the comment.

To start things off, I'm not new to the CLR or contributing. I get the problems, I've been there @novell. I really want to help out here. That said:

I appreciate the focus on getting the code out. It's awesome. I'm also sympathetic to new teams coming up to speed. I also don't blame @sergiy-k at all.

All that said, given your statement, why is such a fundamental problem as unwinding and exception support still 'in the shadows'?

You guys are learning, great. Please don't take this issue as anything more than frustration. Your team member told me:

https://twitter.com/geoffnorton/status/564849521314631680

So I tried. I know its new guys, but defer to open.

richlander commented 9 years ago

Thanks @kangaroo. Yes, we will defer to open.

See @sergiy-k request for another dev, @janvorli to move to a feature branch. That's goodness. We'll get more code in feature branches.

I think I need to order a t-shirt for @sergiy-k!

kangaroo commented 9 years ago

@richlander Oprah moment. Shirts for everyone!

richlander commented 9 years ago

@kangaroo - this could happen!

kangaroo commented 9 years ago

@richlander Let it rain!

janvorli commented 9 years ago

@kangaroo I am sorry for not being much responsive these days, I am fighting some kind of nasty virus or bacteria. I'll create the feature branch for this work so that you can work on the Mac stuff in parallel with me. I'll migrate the work I have done till now from TFS to GIT in this branch and let you know once it is ready. My current work has two parts, as @sergiy-k has already mentioned. The windows style unwinder that will be used for the jitted code and Unix unwinder for native code that uses the libunwind's low level unw_xxxx functions like unw_step etc. With these changes, the stack walking that is used by the GC and the stack walking used by the System.Diagnostics.StackTrace should work. It is likely that we will be able to use the Apple's libunwind on OSX instead of the one from http://www.nongnu.org/libunwind. We were originally considering using the libunwind that's part of the clang's compiler runtime and that I believe is essentially the same as the one in OSX (guessing from the fact that it was donated to the LLVM project by Apple). But the issue with this one is that in LLVM, it is currently not a standalone library (even though there is a discussions currently going on to change that). So we would have to pull out its sources and build them ourselves and that would take quite some time to get approved by our legal and corporate affairs department. So we have opted for the other libunwind that we can use as a binary. The actual exception unwinding work will be the next step. I'll prepare a description on how that's supposed to work in the world where native stack frames are interleaved with the stack frames of jitted code on the stack. That should serve as a foundation for our future cooperation.

kangaroo commented 9 years ago

@janvorli Sounds great. Hope you feel better soon! lib unwind from apple should be able to work for the native unwinding -- I agree.

janvorli commented 9 years ago

@kangaroo I have created a feature branch off the dotnet/coreclr, named unix_issue177 to continue working on the stack unwinding in public. I will soon move a set of changes there so that you can help me to make changes necessary to use the existing libunwind on Mac. FYI, the changes that I have together with the dotnet/coreclr#259 allowed me to dump a correct managed code stacktrace using the System.Environment.StackTrace

richlander commented 9 years ago

Yeahh! @kangaroo, this should make you happy. More of these will be coming.

What do you think of our branch naming pattern?

kangaroo commented 9 years ago

Very happy @richlander :)

As for branch naming, tying things to issues is always ok, but descriptive names help so people don't have to refer back to issues. Something like:

'linux-stack-unwinding' is pretty obvious :)

richlander commented 9 years ago

@kangaroo - we can do that, too. We talked about that. Maybe we'll try the next one that way.

We've got the "linux" part, so we're part way there.

janvorli commented 9 years ago

@richlander Actually, I've named it unix_ because it will be for Mac too.

richlander commented 9 years ago

@janvorli Good point! It's all good.

khellang commented 9 years ago

Just an FYI: Wit git in general, it's pretty common to use "group tokens" in branch names and separate them with /, like feature/unix-stack-unwinding (this is clearly a feature branch), bug/177, issue/177, release/1.0 etc. By having some conventions like these, it makes it much easier to keep track of branches :smile:

See this SO answer for some details.

landonf commented 9 years ago

@janvorli FWIW, I'm happy to take on any tasks or answer any questions I can about native Mac stack unwinding -- I've previously implemented an async-safe DWARF expression interpreter, FDE/CFI lookup, compact unwind table support, etc for PLCrashReporter's (MIT-licensed) Mac/iOS unwinder -- just about everything except LSDA handling (although that will likely be coming soon).

janvorli commented 9 years ago

@landonf Thank you! Actually, the missing piece in the Mac libunwind is an API to get context pointers, which are pointers to stack locations where registers are stored in the frame (provided they are stored there). The GC uses that when it moves objects pointed to by registers. But it seems we will have to live with the fact we don't have an API for that on Mac and rather implement an workaround in the coreclr.

landonf commented 9 years ago

@janvorli I have some MIT-licensed code I could repurpose (and clean up to match local requirements) that would be suitable for computing frame offsets based on local decoding of the compact unwind and DWARF data:

However, I couldn't find licensing requirements in the contributor guidelines, so I don't know what's kosher in terms of contribution licensing.

landonf commented 9 years ago

@janvorli also, if you need this from the GC, do you also need async-safety? (re: https://github.com/dotnet/coreclr/issues/193#issuecomment-74016281)

janvorli commented 9 years ago

@landonf Could your code be used somehow as an addition to the standard OSX libunwind used only for getting the register offsets or would it have to replace the libunwind completely? Regarding the async safety, I'm not sure. @jkotas, how does walking stack of a running thread for a GC work? I know we have a barrier between the managed / unmanaged code that prevents entering the managed code if the GC is scanning the managed part of the stack and native code is currently running. Does it mean that the GC pointers in registers would never be stored on the stack above (if you view the stack as growing upwards) the frame representing the function with the barrier?

Nassiel commented 9 years ago

Hi, I found that while using CentOS 7 and using the latest push at this moment on master my cmake building doesn't find libunwind.h

I already install it and it's located at /usr/include/libunwind.h (with x64_86 version), it's the typical include location and I'm pretty sure that cmake search there. Cmake told me about using libunwind8, to my this have no sense over CentOS 7 there are just one libunwind-devel package version 1.1-3.el7.x86_64.

Also I tried to download libunwind git project, and make installed it with the same result.

Any idea?

janvorli commented 9 years ago

The cmake's check_include_files basically just tries to compile the following:

#include <libunwind.h>
int main()
{
    return 0;
}

So it looks like include paths don't contain the /usr/include folder or the header for some reason refers to a symbol that is not defined. You can look into the CMakeError.log, there should be info on what has exactly failed. It should say something like "Determining if files libunwind.h exist failed with the following output:"

Nassiel commented 9 years ago

I did, here is the error and does not related with libunwind.h, is related with ucontext.h:

/usr/include/sys/ucontext.h:137:5: error: unknown type name 'stack_t' stack_t uc_stack;

http://stackoverflow.com/questions/20778735/is-the-type-stack-t-no-longer-defined-on-linux

janvorli commented 9 years ago

Ah, then it is the dotnet/coreclr#300 issue

Nassiel commented 9 years ago

Oh my! Thank you now is solved, you are planning to include it by default? Or incompatible with other linux distros?

janvorli commented 9 years ago

As you can see in dotnet/coreclr#300, the proper solution is still being discussed.

jkotas commented 9 years ago

Does it mean that the GC pointers in registers would never be stored on the stack above (if you view the stack as growing upwards) the frame representing the function with the barrier?

Correct. The managed->unmanaged transition has to ensure that there are no unprotected GC pointers in registers - e.g. for PInvokes, the JIT has to spill all registers containing GC references onto the stack.

I do not think that we need any special handling of async safety in the native unwinder. All async safety issues should be taken care of by the thread suspension that does not depend on the native unwinder.

landonf commented 9 years ago

@janvorli The parsing/interpretation API is factored out from the actual unwinding API, so it would be possible to use separately; the only issue would be paying twice for lookup and interpretation.

(I suppose I could also in theory expose a libunwind-compatible interface; either way, I've been wanting to refactor this code so it can be easily used for runtime implementation outside of PLCrashReporter).

@jkotas Please forgive my ignorance :-) -- does that mean the unwinder doesn't need to run while threads are suspended in userspace, including any syscall invocations or other calls into non-managed state (on OS X, even traditional pure 'syscalls' are trampolined through libSystem, and may acquire+drop locks in the process).

jkotas commented 9 years ago

the unwinder doesn't need to run while threads are suspended in userspace

Correct.

kangaroo commented 9 years ago

I think we're done with this now @jkotas ?

jkotas commented 9 years ago

Agree.