JnyJny / busylight

Control USB connected presence lights from multiple vendors via the command-line or web API.
Apache License 2.0
227 stars 25 forks source link

[BUG] Packet length incorrect for BusyLights #216

Closed tinwhisker closed 1 year ago

tinwhisker commented 1 year ago

Software Versions:

General Type of Problem

Describe the Problem Incorrect packet length at device.

Expected Behavior Correct packet length.

Error Output Non-display.

Hi, I found what seems to be a bug which presents differently on both Windows and Mac.

Both: The packet sent to the HID device is shifted left by one byte (i.e. the leading 0x10 is gone).

Mac: This results in a 63 byte packet received by the Busylight. The Busylight seems to handle this. (Fills its buffer backwards?) Debug in your app correctly shows whole packet, data:64 = b'' wrote 64 [bytes].

Windows: The packet received is 64 bytes, but the packet sent is 65 bytes(!). The Busylight does not handle this. Debug in your app states: data:64 = b'' wrote 65.

(Note: I don't know how faithful the logging is, but often the data is sprinkled with spurious characters - random '\r' '?' or bytes being longer 'x002')

I've not toyed with the Python HID libs to know where best to start tinkering.

tinwhisker commented 1 year ago

Plot thickens. Wireshark packets screenshot. The highlight sections show the difference between the SDK and BLfH. The packets sent are 64 bytes, still off-by-one on the BLfH packet though.

frameIssue

JnyJny commented 1 year ago

Thank you very much for the bug report. I'm traveling right now, but I will pick this up as soon as I get back in front of a keyboard.

tinwhisker commented 1 year ago

I'd have assumed as much as this time of year 👍. I'm throwing what I can when possible and triple checking I've not made mistakes.

JnyJny commented 1 year ago

Can you confirm for me what versions of python you are using on MacOS and Windows?

tinwhisker commented 1 year ago

Windows 3.9.13 (x64) macOS 3.10.6 (ARM64)

Both using busylight 0.24.2 lib via pip

JnyJny commented 1 year ago

That's the version of Busylight for Humans :) I'm looking for the version of python: python --version so I can attempt to recreate the problem.

tinwhisker commented 1 year ago

Nope, the versions are 100% Python versions. If you're not seeing them, i can screenshot and circle them 🤣

JnyJny commented 1 year ago

So now I see the disconnect, I was presuming the MacOS and Windows values were operating system versions and I admit to wondering briefly how such an old version of MacOS would run on an ARM based Mac. Chalk it up to still being on island time.

tinwhisker commented 1 year ago

Aha, i was wondering if GitHub was parsing/filtering numbers and stripping them. (Windows 10 and 11 tried now, and macOS 13.2, if that clears those up )

I'll try to get usb traces on macOS, but it requires fiddling with the security.

JnyJny commented 1 year ago

Unfortunately, I don't have a Windows machine to test on. I added some extra debugging to the Kuando Busylight write strategy functions but haven't been able to see a short (63-byte) write to the device yet. I also don't have an ARM-based Mac to to test on, further complicating things.

The leading "0x10" is the busylight JUMP opcode and isn't always present (usually it is 0x8? which is the KEEPALIVE opcode). An opcode of 0x00 is invalid, so I agree that the buffer appears shifted in the Wireshark screenshot. ~That's likely the JUMP target.~ ~Nope, that is a zero pad.~ Nope, it's the repeat field. Two weeks away from a computer and it's like starting over.

If this was a case of incorrect word-endianness handling, I would expect the Mac ARM packet to be more scrambled. I haven't inspected the checksum in the Mac packet, but it looks like it's in the ballpark.

Looking at it more, the final bytes of the Mac ARM buffer are definitely shifted left. The checksum looks good and has the three 0xff bytes preceding it, but there is a trailing null byte which is not part of the packet. My guess right now is there is an off-by-one error somewhere in the code that ARM is tickling.

I computed the checksum assuming a leading 0x10 byte and it matches the checksum for the ARM buffer:

>>> s = 0x10
>>> s += 0x5c
>>> s += 0x29
>>> s += 0x08
>>> s += 0xff
>>> s += 0xff
>>> s += 0xff
>>> hex(s)
'0x39a'

So the packet is "good" but shifted at some point.

The method busylight.lights.light.Light.update is the entry point to all writes to devices:

    def update(self) -> None:
        """Write the software state to the device.                                                                                                                          

        Raises:                                                                                                                                                             
        - LightUnavailable                                                                                                                                                  
        """

        data = bytes(self)

        with self.exclusive_access():
            try:
                nbytes = self.write_strategy(data)
                logger.info(f"data:{len(data)} = {data!r} wrote {nbytes}")
            except Exception as error:
                logger.error(f"write_strategy raised {error} for {data!r}")
                raise LightUnavailable(f"{self} {self.path}") from None

In the case of the Busylight Alpha and Omega lights, the call to bytes(self) will generate the checksum and return a bytes object constructed from the software state of the light. The write_strategy method in this case is an alias for hid.device.write where the actual I/O occurs. I am starting to hope that the HIDAPI library is the source of the off-by-one, but it's more likely I made a mistake somewhere in all the bit-twiddling.

tinwhisker commented 1 year ago

Fyi, the image above is from Wireshark on Windows x64. But the packet layout seen is the same received by my tool. When I get back to the office, I will try on an Intel Mac.

Quick context for discovering the bug: I recently needed to test that our product uses various lamp SDK's properly, so built a HID tool to fake a few. (Enumerates as a device, parses the data and returns via a CDC channel).

For speed/laziness, I found your tool quicker to spit Busylight/Blynclight data at me!

Initially I used the real devices with your tool and found the Busylight didn't respond on Windows, but did on Mac, so thus down the rabbit hole I began. My tool was showing shifted data which I initially assumed was a bug my side. But using Wireshark I could see the same packet misalignment, hence the bug report.

I guess the main thing really is - if you only support Mac and it works on Mac, then it's a non-issue. In which case, I'll try find time to learn some more python.

As for 2 weeks away is like starting again - I completely understand! I reversed the Busylight in 2013 to write a driver in Obj-C and it's always alien if I have to revisit USB/HID code.

JnyJny commented 1 year ago

I have not had any luck reproducing this error. I'll be able to work on an M1 Mac later this week so hopeful I can reproduce the problem and start working on a fix for it.

JnyJny commented 1 year ago

I have not been able to reproduce this bug, please re-open if you have more information or have further ideas about what might be happening here. Sorry for not being able to provide a good answer.