PowerBroker2 / pySerialTransfer

Python package to transfer data in a fast, reliable, and packetized form
MIT License

Reading is slow #36

Closed · kyleb-imp closed this issue 3 years ago

kyleb-imp commented 3 years ago

Hey there, I first want to say that this library (and its Arduino companion) is awesome and makes communicating with Arduinos so much easier.

I've been running into some issues reading data from multiple Arduinos at the same time. If I just read from one Arduino, it only takes 1-2 ms for transfer.available() to complete (measured using time.perf_counter). But if I connect 2 devices that are both sending data to the computer at 10 Hz, the reads sometimes take 1-2 ms and sometimes 20-70 ms. Usually it starts at 1-2 ms for a few seconds, then takes 20-70 ms for the rest of the run.

The 2 transfer objects are on separate threads, so I imagine this is the root of the issue somehow... I am sending data on a separate thread, but I tried adding locks and that didn't fix the problem.

If I never send any data and just receive, then it always runs at 1-2ms. If I send data for a few seconds and then stop sending data, then it runs at 20-70ms, even well after I stopped sending data.

Each transfer receives in its own thread in this method:

    def run_serial_receiver(self, remote: transfer.SerialTransfer, lock):
        while True:
            time.sleep(0.001)
            with lock:
                t0 = time.perf_counter()
                if remote.available():
                    dt = time.perf_counter() - t0
                    print(dt)
                    if dt > 0.01:
                        print(f"EXCESSIVE TIME: {dt}")
                    sliced_buff = remote.rxBuff[0:remote.bytesRead]
                    rxbytes = bytes(sliced_buff)
                    self.process_RemoteMessage(rxbytes)

And data is sent using this method on the main thread:

    def send_output(self, output):
        packet_size = len(output)
        for i, remote in enumerate(self.remotes):
            with self.serial_locks[i]:
                remote.txBuff = output
                remote.send(packet_size)
        if self.send_callback:
            self.send_callback()
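
For context, the threads and locks are set up roughly like this (an illustrative sketch rather than my exact code; self.remotes and self.serial_locks are the SerialTransfer objects and their per-port locks):

    import threading

    def start_receivers(self):
        # one lock and one receiver thread per connected device
        self.serial_locks = [threading.Lock() for _ in self.remotes]
        for remote, lock in zip(self.remotes, self.serial_locks):
            threading.Thread(target=self.run_serial_receiver,
                             args=(remote, lock),
                             daemon=True).start()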

I'm running on Windows 10, with the SerialTransfer library running on a Teensy 4.0.

kyleb-imp commented 3 years ago

It seems like the reads are taking a long time even with just 1 Arduino connected. It doesn't always get into this state. As mentioned above, often it will only take 1-2ms per read. But when it gets into this state, every read takes 10-70ms.

PowerBroker2 commented 3 years ago

How much data are you transmitting in bytes per packet, how fast are you reading the data, and what is the USB COM port input buffer size?

kyleb-imp commented 3 years ago

Thanks for the follow up, I appreciate it.

The packets being sent from the Arduino to the computer are 52 bytes. The data is sent from the Arduino at 20 Hz, for a total data rate of 8320 bits per second, which is not very much. The Teensy always transmits at 12 Mbps. I've also tried sending data at 10 Hz, which seems to get into this state less often.
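
For reference, a quick sanity check of that figure (just restating the numbers above):

    # 52-byte packets sent at 20 Hz
    payload_bytes = 52
    rate_hz = 20
    print(payload_bytes * 8 * rate_hz)   # 8320 bits per second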

As shown in the run_serial_receiver method above, the Python code polls for new data at up to 1 kHz.

I've also tried adjusting the buffer sizes down to their minimums (that didn't fix it), but this is the default: [screenshot of the default USB COM port buffer settings]

kyleb-imp commented 3 years ago

I'm using Arduino and Teensy interchangeably btw.

kyleb-imp commented 3 years ago

I also just tried running the same Arduino code on an STM32G4, and it has the same issue.

PowerBroker2 commented 3 years ago

This is rather strange... I'm not exactly sure what could be wrong. Perhaps add some timestamped debug statements in available() and see if all the bytes are being read properly at the right times? Also, be sure to run in debug mode to see if you're popping any transfer errors (dropped packets can increase the perceived latency).

Lastly, are you sure you're using the latest lib versions?

kyleb-imp commented 3 years ago

Ok, I didn't realize there was a debug mode. I am not using tick(), so I just added the status checks into my own code. Sidenote: I think your logic in tick() might not be set up correctly. It checks "if not self.status", which means a status of -1 or -2 will never get printed; only a status of 0 will. https://github.com/PowerBroker2/pySerialTransfer/blob/84ff2048de8099605f9cd660f34fbfcb2b798c20/pySerialTransfer/pySerialTransfer.py#L603

So I implemented it this way (where remote is the SerialTransfer instance):

    elif remote.debug and remote.status <= 0:
        if remote.status == 0:
            err_str = 'CRC_ERROR'
        elif remote.status == -1:
            err_str = 'PAYLOAD_ERROR'
        elif remote.status == -2:
            err_str = 'STOP_BYTE_ERROR'
        else:
            err_str = str(remote.status)

        print('ERROR: {}'.format(err_str))

Anyways, I am not getting any CRC, Payload, or Stop Byte errors. So that's not the issue.

I am using the latest version of the library from pip: 2.1.2.

Any recommendations for where specifically to put the debug timestamps in available()? I did try that a bit, but I'm pretty lost trying to figure out what it's doing. I can dig into it more.

PowerBroker2 commented 3 years ago

Good catch! What about the manual debug statements I suggested?

kyleb-imp commented 3 years ago

Oh sorry that got lost in my last message. Did you have any recommendations for how to go about debugging available()? I tried debugging with timestamps, but I really didn't know exactly what to look for.

PowerBroker2 commented 3 years ago

Basically, timestamp any time the code checks for bytes available to be read from the USB buffer, log how many bytes there are, and call out when each byte is being read/parsed.
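
Something along these lines, for example (a minimal sketch of that kind of instrumentation, not the library's internals; connection is assumed to be the PySerial object that available() reads from):

    import time

    def timed_read(connection):
        """Read one byte and report when it happened and how long the OS call took."""
        t0 = time.perf_counter()
        waiting = connection.in_waiting                 # bytes already sitting in the OS buffer
        byte = connection.read()                        # the single-byte read used while parsing
        dt_ms = (time.perf_counter() - t0) * 1000
        print(f"t={t0:.6f}s  in_waiting={waiting}  read={byte!r}  took {dt_ms:.3f} ms")
        return byte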

kyleb-imp commented 3 years ago

Ok I dug deeper into this this afternoon and put some effort into understanding how the available() method works. I think I found the issue and fixed it.

Most of the time seems to be spent in the connection.read() method, which is part of the PySerial library. It can take as long as about 1 ms, even though it's only reading 1 byte at a time. I imagine there's some overhead in the library or in Windows that slows those calls down. So for my 52-byte payload, plus the extra bytes before and after, it can add up to 60-70 ms total. With a longer payload it would take even longer.

I fixed this by changing the behavior in the self.state == find_payload section. Instead of looping through to get 1 byte at a time, I just grab the whole payload all at once using connection.read. I left the rest of the code alone, so it still grabs one byte at a time for all the other bytes. I guess there are benefits to grabbing one byte at a time so that you can recover if the frame gets out of sync; I'm not really sure about best practices there. But for the payload it should be fine.

Here's the updated portion of the available() method. I can do a pull request if you want, or you can just fix it.

    elif self.state == find_payload:
        self.rxBuff[0:self.bytesToRec] = [recChar] + list(self.connection.read(self.bytesToRec - 1))
        self.state = find_crc

With this fix, my calls to available() now take between 0.3 ms and 5 ms. That's still a bit longer than it should be, but much better, and good enough for my application. Plus, it should now be mostly insensitive to payload length.
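
For anyone who wants to reproduce the difference, a rough standalone benchmark could look like this (a sketch only: the port name and baud rate are examples, and it assumes the device is streaming continuously so at least 104 bytes arrive and neither read hits the timeout):

    import time
    import serial  # PySerial

    ser = serial.Serial('COM4', 115200, timeout=1)

    # 52 single-byte reads: one OS call per byte, as the payload loop described above does
    t0 = time.perf_counter()
    for _ in range(52):
        ser.read(1)
    per_byte_ms = (time.perf_counter() - t0) * 1000

    # one bulk read of the whole 52-byte payload: a single OS call
    t0 = time.perf_counter()
    ser.read(52)
    bulk_ms = (time.perf_counter() - t0) * 1000

    print(f"52 x read(1): {per_byte_ms:.1f} ms   read(52): {bulk_ms:.1f} ms")
    ser.close()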

PowerBroker2 commented 3 years ago

Good find, I was unaware of this issue with PySerial. I have a different idea on how to solve this - should be solved soon

PowerBroker2 commented 3 years ago

Ok, try the master branch and see if it fixes things

kyleb-imp commented 3 years ago

I took a look at your fix and I think it introduces some new issues that are pretty serious.

The main issue is that if there are multiple messages in the serial buffer, this will only return the oldest message and throw away the rest. When you do self.connection.read(self.connection.in_waiting), it completely empties the buffer, but then the byte-in-bytes loop only looks at the first message in the buffer and then returns. This is going to break a lot of people's code, I imagine.

I also think it is less able to recover if the CRC byte or COBS byte randomly ends up being the same as the start byte (a 1 in 256 chance), since you're emptying the entire buffer and ignoring the rest of its bytes as soon as an error is encountered.
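
To illustrate the first point, here's a minimal self-contained sketch (a simplified start-byte/length framing, not the library's actual format) of the pattern I'd expect: drain the OS buffer into a local leftover buffer and parse every complete frame out of it, not just the first one:

    START_BYTE = 0x7E

    def parse_all_frames(chunk: bytes, leftover: bytearray):
        """Append chunk to leftover and return every complete frame found."""
        leftover.extend(chunk)
        frames = []
        i = 0
        while i < len(leftover):
            if leftover[i] != START_BYTE:
                i += 1                                   # resync on stray bytes
                continue
            if i + 1 >= len(leftover):
                break                                    # length byte not received yet
            length = leftover[i + 1]
            end = i + 2 + length
            if end > len(leftover):
                break                                    # frame still incomplete
            frames.append(bytes(leftover[i + 2:end]))
            i = end
        del leftover[:i]                                 # keep any partial frame for the next call
        return frames

    # two complete frames arriving in one drained chunk are both recovered
    buf = bytearray()
    chunk = bytes([0x7E, 2, 0xAA, 0xBB, 0x7E, 1, 0xCC])
    print(parse_all_frames(chunk, buf))                  # [b'\xaa\xbb', b'\xcc']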

Let me know if I'm thinking about this wrong.

PowerBroker2 commented 3 years ago

Fair points - if you want to do a PR with your approach, that'd be cool

PowerBroker2 commented 3 years ago

Fixed in 2.1.3

umaplehurst commented 3 years ago

@kyleb-imp This change unfortunately creates a bit of a problem: a call to link.tick() can now block in the OS, whereas before it should never have blocked (because, if I understand things right, the library would previously only ask pySerial for bytes if pySerial actually had bytes available to give out). This means existing code with a while True loop that does other work in addition to .tick() might now be blocked by the OS from doing that other work (consider, for example, an error condition on the link where a truncated packet is sent out).

However, I wholly agree that asking the OS for as many bytes as we expect to receive is the best approach; it also burns far fewer CPU cycles overall. So I think adding a timeout to pySerial's configuration might be a way forward here. Then link.tick() will not block forever if insufficient data is received on the link due to an error condition. Could you please try out this patch? pyserialtransfer-timeout-patch.txt

kyleb-imp commented 3 years ago

I think you are correct that tick() will now block. If you set the timeout without changing any other code, that should fix the problem, right? I think that will either give you a status of CRC_ERROR or CONTINUE, returning immediately after the read times out.

I think it makes sense for users to set the timeout themselves, based on what their specific application can tolerate. So if what I said above is true, we don't need to change any of the code, but it would be good to add a note to the readme and examples specifying that the user should set a timeout themselves.

These are the PySerial timeout options available:

timeout = None: wait forever / until the requested number of bytes are received

timeout = 0: non-blocking mode; return immediately in any case, returning zero or more bytes, up to the requested number

timeout = x: set timeout to x seconds (float allowed); returns immediately when the requested number of bytes are available, otherwise waits until the timeout expires and returns all bytes that were received until then

I don't think that non-blocking mode would work, since you could easily miss most of a packet that was part way through being transmitted.
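
For the readme and examples, something like this might be all that's needed (a sketch only; the port name and the 50 ms value are arbitrary examples, and it assumes the SerialTransfer object exposes its PySerial instance as .connection, as the library code referenced above does):

    from pySerialTransfer import pySerialTransfer as txfer

    link = txfer.SerialTransfer('COM3')
    link.connection.timeout = 0.05   # finite read timeout so available()/tick() can't block forever
    link.open()

    while True:
        if link.available():
            packet = bytes(link.rxBuff[:link.bytesRead])
            print(packet)            # ... or hand the packet off to the application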

@PowerBroker2 thoughts?

umaplehurst commented 3 years ago

@kyleb-imp Thanks for your thoughts. I didn't explain this in my previous comment, so I should mention it now! What I found during some testing is that there is still a bit of a problem with the new code, because the previous code did byte-by-byte indexing into self.rxBuff, which was always pre-allocated to a length of MAX_PACKET_SIZE and always stayed at that length. With the new code, I think this line here:

https://github.com/PowerBroker2/pySerialTransfer/blob/4cd4e772447d4e14c62a0bcddbc5199bcae15e9d/pySerialTransfer/pySerialTransfer.py#L536

... will now end up truncating self.rxBuff to a length smaller than self.bytesToRec whenever there is an incomplete read from the OS due to a timeout, e.g. if the library user sets a timeout on PySerial. Then this line was giving me an index-out-of-range exception, since crc.calculate expects there to be at least self.bytesToRec bytes in self.rxBuff:

https://github.com/PowerBroker2/pySerialTransfer/blob/4cd4e772447d4e14c62a0bcddbc5199bcae15e9d/pySerialTransfer/pySerialTransfer.py#L540

... hence the patch, which changes things around to keep self.rxBuff at its pre-allocated length like it used to be.
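
The core idea of the patch is roughly this (a minimal sketch of the approach, not the patch itself; the function name is illustrative):

    def read_payload_fixed(connection, rx_buff, bytes_to_rec):
        """Read up to bytes_to_rec payload bytes without ever shrinking rx_buff."""
        payload = connection.read(bytes_to_rec)   # may return fewer bytes once a timeout is set
        rx_buff[:len(payload)] = list(payload)    # copy in place; rx_buff keeps its pre-allocated length
        return len(payload) == bytes_to_rec       # False means an incomplete read, so skip the CRC check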

PowerBroker2 commented 3 years ago

Pardon me if I misunderstand all the working parts of this thread, but I can't quite see where the issue of blocking is coming from. tick() calls available() as a conditional in an if statement, so if available() is non-blocking, so should tick() be. Inside of available() I see:

https://github.com/PowerBroker2/pySerialTransfer/blob/4cd4e772447d4e14c62a0bcddbc5199bcae15e9d/pySerialTransfer/pySerialTransfer.py#L507-L510

I see we have a while loop that processes all available bytes. Assuming we can process the bytes faster than we receive them, this loop shouldn't really block; however, I can imagine certain edge cases where it might.

To prevent this potential blocking, how about taking a "snapshot" of how many bytes are available when first calling available(), and then reading only this many bytes, regardless of how many arrive after the snapshot until calling available() again? Something like this:

            if self.connection.in_waiting:
                for i in range(self.connection.in_waiting):
                    recChar = int.from_bytes(self.connection.read(),
                                             byteorder='big')

Oh wait, now I see the line y'all are concerned about (line 536). Yes, this can easily cause a blocking issue. @umaplehurst can you submit a PR? It would be easier for me to figure out how I want to fix things from a PR than from a text doc. Thanks!

umaplehurst commented 3 years ago

@PowerBroker2 Sure thing, I'll push my patch through as a PR shortly.

PowerBroker2 commented 3 years ago

Fixed in 2.1.4 - should be live on PyPI soon