cedrus-opensource / pyxid

Pure python library for communicating with Cedrus XID devices
BSD 3-Clause "New" or "Revised" License
21 stars 19 forks source link

Problem with `query_base_timer` #7

Open larsoner opened 9 years ago

larsoner commented 9 years ago

A simple modification of the demo script leads to an error:

import pyxid

# get a list of all attached XID devices
devices = pyxid.get_xid_devices()
dev = devices[0]  # get the first device to use
if dev.is_response_device():
    dev.reset_base_timer()
    dev.reset_rt_timer()

    while True:
        dev.query_base_timer()
        dev.poll_for_response()
        if dev.response_queue_size() > 0:
            response = dev.get_next_response()
            print(response)

Pressing some buttons results in:

{'pressed': True, 'port': 3, 'key': 0, 'time': 2160}
{'pressed': False, 'port': 0, 'key': 0, 'time': 2270}
{'pressed': True, 'port': 0, 'key': 1, 'time': 2405}
{'pressed': True, 'port': 0, 'key': 0, 'time': 2480}
Pyxid found unparseable bytes in the buffer. Flushing buffer.
{'pressed': False, 'port': 0, 'key': 0, 'time': 2534}
{'pressed': True, 'port': 3, 'key': 0, 'time': 2555}
Pyxid found unparseable bytes in the buffer. Flushing buffer.
Pyxid found unparseable bytes in the buffer. Flushing buffer.

In other words, the query_base_timer() is causing some sort of problem, which can then break. I think there is some hint with the 'port' being incorrect, since it shouldn't change ports at all. Without the query_base_timer call, there is no error, and the port is always zero.

This is on Linux with a 740 box, if it matters. I haven't tested to see if it also happens on other platforms, but I suspect it will. If it does, it means that we cannot really use the box -- our software needs to periodically check the timer for drift, and doing so seems to prevent proper button press recording currently. FWIW it looks like the errant message has key "e" instead of "k" (from the (k, params, time) = unpack line in xid_input_found from internal.py).

kkheller commented 9 years ago

Thank you. This is a very clear description. The part about the "e" and the "3" might be particularly relevant. That could be a hint that the timer response is sitting in the same queue where the responses sit. Timer replies contain "e3" in them.

We (Cedrus) will test this at some point over the next few days. I do expect that this will be reproducible on our end.

After testing, I expect that we can fix this, either by patching pyxid or by suggesting some restructuring/reordering of function calls in the script you are writing (i.e. the modified demo script).

I will get back to you within a few days.

Meanwhile, since you clearly have "the chops" for debugging python, you might benefit from knowing what the actual expected bytes emanating from the RB device are supposed to look like.

http://cedrus.com/xid/protocols.htm http://cedrus.com/xid/timing.htm http://cedrus.com/xid/a_summary.htm http://cedrus.com/xid/miscellaneous.htm

larsoner commented 9 years ago

Great, I probably won't have time to dig into the issue in the next few days (maybe in a couple of weeks), so let me know if you make some progress.

I actually have a branch going to make this package usable in Py3k that I can open shortly. Good to know this is actively monitored :)

Do you have any information regarding the changes made to the x40 series related to the timer drift issue mentioned in the README? I'd be happy to update the README.txt as part of my PR. Our custom presentation software currently relies on each device having a reliable clock, so I'm hoping that the x40 series does indeed have the drift issue resolved.

larsoner commented 9 years ago

Also note that the minimal example/change above is not an exact replica of the set of calls we make, so reordering of calls might not be an option in our current structure. Fixing the underlying code so that calls to reset_base_timer, reset_rt_timer, query_base_timer, poll_for_response, plus button presses can be done in arbitrary orders would be the best solution. (get_next_response is also used here, but I noticed it's a simple list .pop operation so shouldn't really matter here.)

larsoner commented 9 years ago

Ahh, looks like it might happen when there is an event in the queue waiting to be read... sending the e3 command works, but the response message is behind a keypress event. Adding dev.con.read_nonblocking(65536) before and after the dev.query_base_timer() helps alleviate the problem, but doesn't eliminate it (e.g., if a button is pressed in the <1 ms interval between when the queue is cleared and when the query_base_timer executes). Changing query_base_timer to consume all bytes and use only the last 6 would work, but it would drop button presses.

I'm not sure if there is an easy solution in the current framework. It is set up to allow some functions (BaseDevice.query_base_timer -> XidConnection.send_xid_command) to send commands and read the n_bytes of the current response assuming the serial buffer only consists of the response from the command (i.e., not from a recent button press), while others (XidConnection.check_for_keypress) assume that the only items in the buffer will be keypresses. It might be worth restructuring internal.py to make it so that all read commands are appended to a single buffer in XidConnection, and then consumed as necessary by functions that need responses. For example, send_xid_command would read all relevant bytes in (not just the next 6), and know just to return the last 6 in the list. It could thus append all but the last 6 bytes to the global buffer. The button-press function would then be restructured to 1) read in all pending bytes, and 2) traverse the list global bytes list, converting everything it finds into presses (since that's all that should exist). My 2c, anyway.

kkheller commented 9 years ago

I can reproduce this (which is not surprising by now to anyone reading the ongoing analysis). I also confirmed that this same structural problem (assumptions about which inquiry owns which bytes in the buffer) also exists in our c++ project: https://github.com/cedrus-opensource/xid_device_library . The c++ project mainly exists to implement the Response Device Extension used in the NBS Presentation software. In that context, these function calls are never made in this order.

I agree that restructuring the code so that the various function calls can be done in arbitrary orders is ideal.

This has been logged as an issue in our internal Cedrus bugtracker.

larsoner commented 9 years ago

Thanks for the info @kkheller, looking forward to eventual progress.

This isn't a show-stopping bug for us, because there is a workaround. However, this one is: https://github.com/cedrus-opensource/pyxid/issues/2#issuecomment-91077270. I could measure the drift many times to establish if it's consistent and, if it is, compensate for it, but that is an ugly solution so I'd rather not do it if there is a better fix or if I'm being dense.