Debugging LSP issue - Githubissues

mdales commented 5 days ago

I am using Ecode in Haiku, and I set up the OCaml LSP with the following configuration:

  "servers": [
    {
      "command": "/boot/home/dev/ocaml-lsp/_build/default/ocaml-lsp-server/bin/main.exe",
      "file_patterns": [
        "%.ml$",
        "%.mli$"
      ],
      "language": "ocaml",
      "name": "ocaml"
    }
  ]
}

And whilst I can see in the output of ps that the lsp server is running, it general has never worked, and so I assumed I'd just misconfigured it, and filed it as something I'd look into one day.

But today, for one brief moment, it worked:

screenshot4

But I don't know why, and it's not worked again since! I thought perhaps it was because I was running ecode from the command line (so as to pick up my custom built version with #82 added), but that didn't seem to change things. I thought perhaps it was because I ran it from the project directory, but that doesn't seem to do it.

I guess the issue is that I can't see how to debug the situation I'm in: how do I diagnose what is clearly some odd configuration issue. It could well be a fault on the LSP server side, but is there any way to get logs that'll tell me that? I had a look in the EEPP code but couldn't see such a thing.

SpartanJ commented 5 days ago

Hi!

I guess the issue is that I can't see how to debug the situation I'm in: how do I diagnose what is clearly some odd configuration issue.

I don't think so, I had exactly the same problems with clangd in Haiku. This is an specific Haiku problem. I've found how to make it work on my side: disabling the Git plugin. Which makes NO SENSE. But I noticed that clangd was supper buggy on Haiku and did not continue investigating. There seems to be something odd on Haiku when trying to write into the process stdin. I had some weird behaviour with Haiku in the past that I had to patch, so I guess this is another case of that, problem must be in the Process class (eepp/src/eepp/system/process.cpp) since this is the only point in common with the Git plugin, both plugins run and communicate with external processes. If you're familiar with Haiku maybe you have some ideas.

It could well be a fault on the LSP server side, but is there any way to get logs that'll tell me that? I had a look in the EEPP code but couldn't see such a thing.

The easiest way is to make it more verbose, you can run ecode with: ecode -ldebug -v, this will print all LSP protocol communication in your terminal, also you can see the logs in the internal console (opens with F12 key), also, the logs are saved in the settings folder, I think in haiku is $HOME/config/settings/ecode.

I'm currently traveling and only have arm64 hw available, so I cannot investigate further until I come back home. But feel free to take a look if you have the time, thanks.

SpartanJ commented 5 days ago

Since you already are building ecode maybe try to ensure SUBPROCESS_USE_POSIX_SPAWN is not defined in process.cpp, maybe the problem is with the posix spawn usage, if this is disabled it will use fork instead.

mdales commented 5 days ago

Okay, I made some progress, which documenting here as unfortunately other commitments forced me to stop today.

I can confirm, it's when you have a git directory that things break, just adding and removing the .git directory I can repro what I'm seeing. Yesterday I started a new project, which explains why it worked!

So I did some digging. The root cause is here:

https://github.com/SpartanJ/eepp/blob/2e1bcfe49e3d6adcd6761e312784234813f82c95/src/eepp/system/process.cpp#L417-L418

    while ( anyOpen && !mShuttingDown && errno != EINTR ) {
            int res = poll( pollfds.data(), static_cast<nfds_t>( pollfds.size() ), 100 );

With other processes running, poll returns EINTR, which reading up I think is expected behaviour - I assume the git processes being ran and exiting trigger signals that cause poll to exit. This loop in the async reader then exits because of the explicit check that EINTR didn't happen. I don't know why that check is there, but I assume there's a reason :)

In theory on should use ppoll with a signal mask to stop it being triggered by other child processes exiting, but I couldn't seem to find the correct signal mask for ppoll to stop it exiting EINTR. That's where I got to when I ran out of time.

I'll try and spend some more time on this when I can!

SpartanJ commented 4 days ago

This loop in the async reader then exits because of the explicit check that EINTR didn't happen. I don't know why that check is there, but I assume there's a reason :)

No real reason, this class was based/inspired on tiny-process-library although I used my own implementation, the async read is a carbon copy from theirs implementation, take a look here. I basically trusted this implementation given that it's being used in juCi++, an open-source IDE. I'm not very knowledgable of posix so I tried to trust someone else implementation. But as for the EINTR checks they make sense, since this interrupt is basically a non-error and execution should continue, for example this is triggered when gdb interrupts the process execution it will trigger EINTR in all poll/select/read/write/sleep/etc system calls.

With other processes running, poll returns EINTR, which reading up I think is expected behaviour - I assume the git processes being ran and exiting trigger signals that cause poll to exit. This loop in the async reader then exits because of the explicit check that EINTR didn't happen. I don't know why that check is there, but I assume there's a reason :)

I've been reading and I don't think this should be the case for any modern operating system. My theory would be that Haiku does no implement POSIX.1c and errno is not a thread-local variable, but I'm very much in doubt because POSIX support in Haiku is really great. Maybe we can request help from some of its core developers.

Take a look at this documentation extract:

In POSIX.1, errno is defined as an external global variable. But this definition is unacceptable in a multithreaded environment, because its use can result in nondeterministic results. The problem is that two or more threads can encounter errors, all causing the same errno to be set. Under these circumstances, a thread might end up checking errno after it has already been updated by another thread.

To circumvent the resulting nondeterminism, POSIX.1c redefines errno as a service that can access the per-thread error number as follows (ISO/IEC 9945:1-1996, §2.4):

Some functions may provide the error number in a variable accessed through the symbol errno. The symbol errno is defined by including the header , as specified by the C Standard ... For each thread of a process, the value of errno shall not be affected by function calls or assignments to errno by other threads.

Otherwise errno state of the git process should not affect the LSP process.

If this is the case we could just remove all the errno logic from the Haiku implementation, since it's not critical, something like this for Haiku:

while ( anyOpen && !mShuttingDown ) {
    int res = poll( pollfds.data(), static_cast<nfds_t>( pollfds.size() ), 100 );
    if ( res > 0 ) {
        anyOpen = false;
        for ( size_t i = 0; i < pollfds.size(); ++i ) {
            if ( pollfds[i].fd >= 0 ) {
                if ( pollfds[i].revents & POLLIN ) {
                    const ssize_t n = read( pollfds[i].fd, &buffer[0], mBufferSize );
                    if ( n > 0 ) {
                        if ( n < static_cast<long>( mBufferSize - 1 ) )
                            buffer[n] = '\0';
                        if ( fdIsStdOut[i] )
                            mReadStdOutFn( buffer.c_str(), static_cast<size_t>( n ) );
                        else
                            mReadStdErrFn( buffer.c_str(), static_cast<size_t>( n ) );
                    } else if ( n <= 0 ) {
                        pollfds[i].fd = -1;
                        continue;
                    }
                }
                if ( pollfds[i].revents & ( POLLERR | POLLHUP | POLLNVAL ) ) {
                    pollfds[i].fd = -1;
                    continue;
                }
                anyOpen = true;
            }
        }
    } else if ( res < 0 ) {
        break;
    }
}

If you can test it let me know if works.

mdales commented 4 days ago

Just a quick one, and an apology as I should have been more clear in my original post!

What I see is that errno is zero and poll is returning 0 or 1, then poll returns -1 and errno is set to EINTR, and that seems to coincide with a git process being run. As such, the proposed change there will fail in the same way. As such, I don't think errno in this instance is being obviously corrupted by another thread - we do see poll fail at the same time errno is set.

My current work around (and clearly a hack, which is why I've not made a PR) is I just check if the poll rails specifically with EINTR, and if so I then clear it. This then works again, and I can see that every time we have a git process kick off, it does the same little dance.

I don't disagree that there's likely some Haiku specific oddity here though - it's just which one it is :) EINTR is system call being interrupted by signal - I don't know what std::thread maps on to in haiku, but my gut tells me that we have a sys call on that thread being interrupted by signals related to the git process from the main thread. But that's just a gut threory right now, I have no evidence to back it up.

mdales commented 2 days ago

I did a little spelunking into the Haiku source, and I think in Haiku at least, EINTR just means we were waiting for FD events, but our thread got interrupted, so we didn't complete waiting.

The posix interface implementation of poll and ppoll are here:

https://github.com/haiku/haiku/blob/e4efb705e4047926b682d330b6f8883bdae6bf8c/src/system/libroot/posix/poll.cpp#L31

The call _kern_poll:

https://github.com/haiku/haiku/blob/master/src/system/kernel/events/wait_for_objects.cpp#L924

Which in turn calls common_poll:

https://github.com/haiku/haiku/blob/master/src/system/kernel/events/wait_for_objects.cpp#L569

This reads the list of FDs, sets the signal mask and then does the waiting by calling acquire_sem_etc:

https://github.com/haiku/haiku/blob/master/src/system/kernel/events/wait_for_objects.cpp#L616

It's worth noting here that it is called with the B_CAN_INTERRUPT flag, which we will come to later.

acquire_sem_etc is just a wrapper for switch_sem_etc:

https://github.com/haiku/haiku/blob/master/src/system/kernel/sem.cpp#L727

In the middle of that function, before it does the actual waiting AFIACT, it does a check to see if there have been any signals:

https://github.com/haiku/haiku/blob/master/src/system/kernel/sem.cpp#L794

This is done by calling thread_is_interrupted:

https://github.com/haiku/haiku/blob/master/headers/private/kernel/thread.h#L195

Here if there are any pending signals, and if B_CAN_INTERRUPT is set, we start the return flow back to the original, with the return code of B_INTERRUPT, which is mapped to EINTR.

I've no idea what the outstanding signal is - but I just wanted to capture the spelunking before I forget.

Again, I don't understand why the thread is being interrupted by the git process, when I'd assume events for that would go to the main thread that spawned them, but perhaps Haiku has different semantics. At some point I'll try dropping into the Haiku IRC and see if I can get some answers.

SpartanJ commented 2 days ago

Sorry, IDK if I'm too tired but I just read again the current implementation and it looks wrong to me, I think I didn't follow you correctly until now. Based on what we have been talking the loop is incorrectly implemented, the intention was to NOT leave the loop if errno was EINTR and the check is inverted, or I'm too sleepy or this is actually stupid and never read with attention, loop should do:

        while ( anyOpen && !mShuttingDown ) {
            int res = poll( pollfds.data(), static_cast<nfds_t>( pollfds.size() ), 100 );
            if ( res > 0 ) {
                anyOpen = false;
                for ( size_t i = 0; i < pollfds.size(); ++i ) {
                    if ( pollfds[i].fd >= 0 ) {
                        if ( pollfds[i].revents & POLLIN ) {
                            const ssize_t n = read( pollfds[i].fd, &buffer[0], mBufferSize );
                            if ( n > 0 ) {
                                if ( n < static_cast<long>( mBufferSize - 1 ) )
                                    buffer[n] = '\0';
                                if ( fdIsStdOut[i] )
                                    mReadStdOutFn( buffer.c_str(), static_cast<size_t>( n ) );
                                else
                                    mReadStdErrFn( buffer.c_str(), static_cast<size_t>( n ) );
                            } else if ( n == 0 || ( n < 0 && errno != EINTR && errno != EAGAIN &&
                                                    errno != EWOULDBLOCK ) ) {
                                pollfds[i].fd = -1;
                                continue;
                            }
                        }
                        if ( pollfds[i].revents & ( POLLERR | POLLHUP | POLLNVAL ) ) {
                            pollfds[i].fd = -1;
                            continue;
                        }
                        anyOpen = true;
                    }
                }
            } else if ( res < 0 && errno != EINTR )
                break;
        }

poll res > 1, read the data, poll res = 0 is a timeout, res < 0 is an error, if it's not EINTR we should leave the loop since it's an actual error. The original implementation is doing that, I just changed it by error.

mdales commented 2 days ago

Cool - that does make sense to me now. I've checked out the latest develop and started running that, but I can confirm that it does work for me with OCaml LSP on a project with git. Many thanks!

SpartanJ commented 1 day ago

I'm glad it worked! And sorry for the confusion!

SpartanJ / ecode

Debugging LSP issue #360