Hard to interpret fiber work when looking at list of threads

BrodyHiggerson commented 3 years ago

Since you previously mentioned workflow and usability wrt fibers, I figured I'd make this post, although I appreciate it's likely a bit more ambitious.

There are a few issues I see in this area.

The way I use FTL is to have every bit of work be performed by fibers - to the point that my 'main' thread of execution is just another fiber pinned to a specific worker. I have a lot of fibers and use them for everything.

When I look at this view, it doesn't really tell me what's going on, and I can't glean much from it wrt what work is executing when/where. Highlighting a fiber span does highlight the corresponding function stack in the lower parts of the window, but this is difficult to see.

In FTL, the default number of fibers is 400, which, as you can imagine, blows up the size of the "Fibers" area quite a bit (and, if I remember correctly, possibly triggers an error message about thread limits - can't check right now). It also makes the aforementioned highlighting not really useful - which of the 400 fibers is being highlighted? No idea!

Ideally, there would be a thread-first view of the worker threads that shows the work being performed on them as normal, i.e. how you would see work displayed on threads without using fibers. I think it matters less which fiber # is active when, and more which callstack is active when on which thread. So in this example, the list on the left wouldn't contain fibers but instead the worker threads executing any and all fibers, showing the work those threads are performing, with Palanteer doing the magic of tying fiber-based work to the threads that execute it.
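
To make the regrouping concrete, here is a rough Python sketch of what I mean. It assumes the recording can tell which worker thread ran a fiber during each interval; the `RunSegment` type and `group_by_thread` function are made-up names for illustration, not anything from Palanteer or FTL.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSegment:
    """Hypothetical record: one interval during which a fiber ran on a thread."""
    fiber_id: int
    thread_id: int
    start_ns: int
    end_ns: int
    label: str


def group_by_thread(segments):
    """Regroup fiber run segments into one time-ordered lane per worker thread."""
    lanes = defaultdict(list)
    for seg in segments:
        lanes[seg.thread_id].append(seg)
    for lane in lanes.values():
        lane.sort(key=lambda s: s.start_ns)
    return dict(lanes)


# Two fibers that migrate between two worker threads.
segments = [
    RunSegment(fiber_id=7, thread_id=0, start_ns=0, end_ns=10, label="update"),
    RunSegment(fiber_id=3, thread_id=1, start_ns=2, end_ns=8, label="render"),
    RunSegment(fiber_id=7, thread_id=1, start_ns=12, end_ns=20, label="update"),
    RunSegment(fiber_id=3, thread_id=0, start_ns=15, end_ns=25, label="render"),
]
lanes = group_by_thread(segments)
for thread_id in sorted(lanes):
    print(thread_id, [(s.label, s.start_ns, s.end_ns) for s in lanes[thread_id]])
```

Each lane then reads like an ordinary thread timeline, and the fiber id becomes a detail attached to a segment rather than the row it is drawn on.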

Hopefully that makes sense. Again, totally appreciate it's a bunch of work. Just wanted to provide my perspective; happy to test any ideas in this area, incremental changes, prototypes, etc.

dfeneyrou commented 3 years ago

For typical fibers, that makes sense.

On the other hand, my mental model was a Discrete Event Simulator, with virtual time, where OS threads are "emulated" thanks to the longjmp or ucontext API inside one main OS thread. Hence the display per "virtual thread", which is the right one for this use case.

I need to think more about how to handle these two different use cases of the "virtual threads"...

BrodyHiggerson commented 2 years ago

I've been using Palanteer a bunch lately, which got me thinking about this topic.

One "conceptually simple" feature for displaying fibers that I think would really, really help is the ability to look at the fibers "squashed" down to the threads that executed them. That is to say, not showing the spans where the work is halted because the fiber was swapped out, so that in a hypothetical situation with zero time spent waiting, every squashed thread would be full of spans/events for the markers belonging to those fibers, with no breaks in between. Then if someone wants to see exactly where a span was waiting/suspended, they can look at the existing view.
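
As a minimal sketch of the squashing step (in Python, with hypothetical names and no claim about how Palanteer stores its data): given a span recorded on a fiber and the intervals during which that fiber was actually running on some thread, keep only the parts of the span that overlap a run interval and hand each part to the thread that ran it.

```python
def clip_span_to_runs(span_start, span_end, runs):
    """Split a fiber span into the pieces where the fiber was really running.

    `runs` is a list of (thread_id, run_start, run_end) tuples for that fiber.
    Returns (thread_id, start, end) pieces; suspended time is dropped.
    """
    pieces = []
    for thread_id, run_start, run_end in sorted(runs, key=lambda r: r[1]):
        start = max(span_start, run_start)
        end = min(span_end, run_end)
        if start < end:  # keep only real overlap with the span
            pieces.append((thread_id, start, end))
    return pieces


# The fiber runs on thread 0, is suspended, resumes on thread 1, is suspended
# again, then finishes on thread 0.
runs = [(0, 0, 10), (1, 12, 20), (0, 30, 40)]
pieces = clip_span_to_runs(5, 35, runs)
print(pieces)
```

Here a span covering 5 to 35 is cut into three pieces, two on thread 0 and one on thread 1, and the gaps while the fiber was swapped out simply disappear from the thread lanes.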

Just a thought! Been enjoying the tool.

dfeneyrou / palanteer

Hard to interpret fiber work when looking at list of threads #27