Potential typo causing data download failure

micahjohnson150 commented 4 years ago

radicl version: 0.4.0
Firmware version (if applicable): 1.45.3.2
Python version: 3.6
Operating System: linux

Description

I was having issues getting the calibrated data to download. When we resolved the accelerometer issues, it exposed some fixes I had made prior that then created problems.

So I need you to look at the following lines of code from probe .py :

https://github.com/AdventureData/radicl/blob/a35d71b92b6a010ea68b8fdc92d3724e9b90da04/radicl/probe.py#L188-L189

https://github.com/AdventureData/radicl/blob/a35d71b92b6a010ea68b8fdc92d3724e9b90da04/radicl/probe.py#L208-L209

I believe somehow this combo is preventing the data from the calibrated sensor to download.

Before I recieved your fixes for the accelerometer, I added the None checking logic on line 208. Yesterday I went out and I was unable to get data from the probe until I made an adjustment to line 188 to refer to ret3 not ret2 and I removed my None checking on 208. Could you confirm that

Line 188 is supposed to refer to ret3 not ret2?
And where I should/shouldn't be None checking on line 208

When I made the changes above I was able to get data but I also get a lot of errors.

arothenbuhler commented 4 years ago

Correct, line 188 should refer to ret3 only. This check is there to make sure that 1) the data read returned without an error (check for status == 1) and 2) that there is a data payload (data != None). Line 208, however, is where we catch all failed retries and I'm not sure why the if statement is even there. When we get to line 208 we should just return and not check for things. I guess the idea was to make sure that the status was indeed not set to 1 OR that there was indeed no data payload.

You mentioned that you were able to get the data, but with lots of errors. Does that mean you were successful in reading the full data buffer, but there were lots of retries/read errors? My thinking is that this is a timing issue where the probe hasn't responded yet (probably because it's a lot slower than a PC and is still processing the request or fetching the data). I have been working on an improved serial read method, but it will only help with true serial transmission, not with USB.

What kind of data were you trying to read?

micahjohnson150 commented 4 years ago

This was with calibratedsensor data. I definitely get lots of misses and errors.

micahjohnson150 commented 4 years ago

@arothenbuhler I have another question for you. Same area.

Looks like for the data segments you wanted to index with a non-zero based index. In the first data request (ret2) you revert back to a zero based index but then for ret3 you use the non-zero based index. Is this a typo or some EE magic?

https://github.com/AdventureData/radicl/blob/a35d71b92b6a010ea68b8fdc92d3724e9b90da04/radicl/probe.py#L168

https://github.com/AdventureData/radicl/blob/a35d71b92b6a010ea68b8fdc92d3724e9b90da04/radicl/probe.py#L169

https://github.com/AdventureData/radicl/blob/a35d71b92b6a010ea68b8fdc92d3724e9b90da04/radicl/probe.py#L186

arothenbuhler commented 4 years ago

Yes it is EE magic. Let me introduce you to the Flux Capacitor... Haha, no this does look like a bug! Nice catch. Not sure how this went unnoticed for so long. Why non-zero based index? No particular reason. I wrote this a while ago when I was a Python noob (well, maybe I still am...) and I found it confusing how array indexing worked differently from C. We can either change line 186 to ' ii + 1', but I think it would be most intuitive if line 169 is just 'ii' and 168 is change to go from 0 to num_segments. Good catch, this needs to be fixed, but I don't think this will fix the communication issues we're seeing....

arothenbuhler commented 4 years ago

Also, can you be more specific on the the "misses and errors"? Is the download failing completely (i.e. you don't get the fill data set), or does the download complete but you get errors and retries? These are two completely different things...

Since these transmission errors are only happening when retrieving measurement data (not when doing some simple probe interaction), I think the issue is that the Python code is not waiting long enough for the response to come in. Then it will send the request again, which will just queue another read request. So you could easily get out of sync... In the MeasReadDataSegment function of the API on line 517 you could try increasing the read timeout. By default it is set to 50ms. This should technically be high enough, but we are doing a lot of post-processing when returning data, so it could take a bit longer. Below is the line in question. Change it to the following:

response = self.__send_receive(message, 0.2)

This sets it to 200ms, which should be plenty. If you see an improvement, but still some dropouts, increase it further, but at some point we're just accepting things taking too long and would have to then figure out why it's taking the probe too long. We might be very marginal right now, which is why it works for the most part, but get lots of dropouts.

micahjohnson150 commented 4 years ago

Most of the time the download is successful. But I see a ton of retries. I also see quite a few probe state errors.

Ah interesting. I have messed with some of the time frames before but I don't think I have with this one.

AdventureData / radicl

Potential typo causing data download failure #9

Description