bugst / go-serial

A cross-platform serial library for go-lang.
BSD 3-Clause "New" or "Revised" License
619 stars 189 forks source link

Malformed data being sent on Windows #60

Closed martinzak-zaber closed 1 year ago

martinzak-zaber commented 5 years ago

Hello,

We have been using this library for some time and we have encountered an issue which is very difficult to reproduce and debug. The problem is caused by the Read function in the windows implementation:

func (port *windowsPort) Read(p []byte) (int, error) {
    var readed uint32
    params := &dcb{}
    ev, err := createOverlappedEvent()
    if err != nil {
        return 0, err
    }
    defer syscall.CloseHandle(ev.HEvent)
    for {
        err := syscall.ReadFile(port.handle, p, &readed, ev)
        switch err {
        case nil:
            // operation completed successfully
        case syscall.ERROR_IO_PENDING:
            // wait for overlapped I/O to complete
            if err := getOverlappedResult(port.handle, ev, &readed, true); err != nil {
                return int(readed), err
            }
        default:
            // error happened
            return int(readed), err
        }

        if readed > 0 {
            return int(readed), nil
        }
        if err := resetEvent(ev.HEvent); err != nil {
            return 0, err
        }

        // At the moment it seems that the only reliable way to check if
        // a serial port is alive in Windows is to check if the SetCommState
        // function fails.

        getCommState(port.handle, params)
        if err := setCommState(port.handle, params); err != nil {
            port.Close()
            return 0, err
        }
    }
}

If there is no traffic on the port this function calls setCommState every 1 second. If there is also an ongoing Write operation at the same time, the transmitted data gets malformed and the other side receives gibberish. The race condition requires very precise timing to be reproduced. We are very confident that the issue is not caused by our code.

As this library is used by your customers I thought this may be quite critical information for you. We have spent around two days hunting this down. Feel free to contact me for more details.

Thank you for implementing this awesome library.

cmaglie commented 4 years ago

Hi @martinzak-zaber, thanks for the bug report, and sorry for the long delay.

May you provide an example program to help to reproduce the issue? Writing in a tight loop may and reading from another goroutine may be sufficient to trigger the issue?

I guess you already tried to remove the getCommState/setCommState piece and this did solve the issue for you?

martinzak-zaber commented 4 years ago

Hello @cmaglie,

I, unfortunately, don't have the testing program anymore. It was also a part of a bigger project. But I do recall that we were running an intense stress test. Basically writing and reading from the port at the same time many times. It was slightly unusual because our protocol is request/response so most of the time there is no overlap between writing and reading. We had to be bashing it with multiple requests at the same time.

In our fork, We have removed the get/set commState entirely. I don't recall what it does to detecting whether the port is alive. But one use case we test for is ripping out the FTDI cable and for that, we get an error. So it works for us without the call.

Feel free to check out our merge requests: https://github.com/martinzak-zaber/go-serial/pulls These are all the fixes and changes we have made for our fork.

Thank you again for creating the library!

palazzol commented 1 year ago

Hi @cmaglie,

FWIW - the link cited in this comment seems to be gone, but the fixes described seem to be here - if you are looking for them: https://github.com/zabertech/go-serial

Hello @cmaglie,

I, unfortunately, don't have the testing program anymore. It was also a part of a bigger project. But I do recall that we were running an intense stress test. Basically writing and reading from the port at the same time many times. It was slightly unusual because our protocol is request/response so most of the time there is no overlap between writing and reading. We had to be bashing it with multiple requests at the same time.

In our fork, We have removed the get/set commState entirely. I don't recall what it does to detecting whether the port is alive. But one use case we test for is ripping out the FTDI cable and for that, we get an error. So it works for us without the call.

Feel free to check out our merge requests: https://github.com/martinzak-zaber/go-serial/pulls These are all the fixes and changes we have made for our fork.

Thank you again for creating the library!

martinzak-zaber commented 1 year ago

I am sorry, we have moved the library under company GitHub. https://github.com/zabertech/go-serial We have made dozen or so fixes to the library over the years (see commits). This library is an important dependency of our software.

Thank you again to the authors of the library.

cmaglie commented 1 year ago

Thank you everyone for the feedback and the fix, I've finally managed to reproduce the bug and confirm the solution. 👍🏼