ccwienk / temper

Simple python for accessing TEMPer USB thermometers
MIT License

"stuck" temperature reading #13

Open corsac-s opened 2 years ago

corsac-s commented 2 years ago

Hi, I noticed I sometimes get the same temperature reading for multiple hours from the TEMPer2 external sensor:

pcsensordual_TEMPer2-day

I'm unsure if it's something in the firmware or in the way it's queried with Python. It looks a bit like that “old” issue in a different project: https://github.com/padelt/temper-python/issues/61

Not sure there's anything which can be done at the software level but in case some other people experience that and/or have a workaround, I'd be interested.

eode commented 1 year ago

I've sent a message to the manufacturer, but am doubtful of getting a response.

corsac-s commented 1 year ago

I've sent a message to the manufacturer, but am doubtful of getting a response.

Thanks! I'm not holding my breath indeed.

Considering the following quote from #9 applies to this issue:

I haven't run into this issue -- how frequent is it? Do you have a repeatable case we can use for testing?

Honestly I don't know. I run temper.py on various boxes during munin runs, in order to graph temperature over time. So I don't really have a repeatable case besides just running it every 5 minutes and checking whether the graphs look correct.
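Checking the graphs by eye could be supplemented with an automated check for flat runs. The following is only a minimal sketch: `find_flat_runs` and the sample data are hypothetical, and the threshold of 12 samples (one hour at one reading every 5 minutes) is an assumption, since a genuinely stable temperature would also trigger it.

```python
from itertools import groupby


def find_flat_runs(readings, min_len=12):
    """Return (start_index, length, value) for each run of identical readings.

    min_len=12 is an assumed threshold: one hour at one sample per 5 minutes.
    """
    runs = []
    index = 0
    for value, group in groupby(readings):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((index, length, value))
        index += length
    return runs


# Made-up sample data, for illustration only.
samples = [10.5, 10.8] + [11.0] * 15 + [11.4, 11.9]
print(find_flat_runs(samples))
```

A report like this only says that a run looks suspicious; it cannot tell a stuck sensor from a room that really stayed at the same temperature.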

Over the last week there really was an example of the external sensor being stuck around 11°C here: image

In the mean time, I've created the branch stuck_temp_fix that you can check out.

Please let me know if it helps. If it does, I'll optimize it and include it (at least as a configurable option). It will probably be disabled by default, because it does incur a +50% overhead in run time for each device -- or, if optimized, would still incur a lot of irregularity in the call time.

Yes I'll report back. I don't think I'm too bothered by the overhead but I can understand it's an issue.

eode commented 1 year ago

@corsac-s Thanks, it getting stuck like that is pretty weird, and ~has to be~ is probably a hardware issue (considering it shows up in other projects as well). Looks like it's happening often enough that it should be confirmable in a week or two.

That branch automatically uses the reset, so you don't need any particular cli or function arguments.

Oh, BTW, just confirming relative to the PR and README -- the temper-hum 3.9 doesn't support temperature, just humidity, right?

corsac-s commented 1 year ago

@corsac-s Thanks, it getting stuck like that is pretty weird, and ~has to be~ is probably a hardware issue (considering it shows up in other projects as well). Looks like it's happening often enough that it should be confirmable in a week or two.

Yes, I'll report back

Oh, BTW, just confirming relative to the PR and README -- the temper-hum 3.9 doesn't support temperature, just humidity, right?

I don't know, I only have Temper2 and TemperGold so no humidity sensor.

eode commented 1 year ago

Ah. Was adding another one, I think it's my typo then.

corsac-s commented 1 year ago

@corsac-s Thanks, it getting stuck like that is pretty weird, and ~has to be~ is probably a hardware issue (considering it shows up in other projects as well). Looks like it's happening often enough that it should be confirmable in a week or two.

Yes, I'll report back

I'll let it run a bit more but unfortunately it doesn't seem fixed: image

eode commented 1 year ago

I'd have to agree - that means a USB reset doesn't have an effect on the issue. I'd be happy to see a few more days of data to satisfy the completionist in me, but realistically that's probably enough data to call that potential resolution a no-go.

That's too bad, and it makes me concerned about my own device - I'm going to need to start graphing that data for my own system to check for this issue.

eode commented 1 year ago

Still, since it's consistently the same incorrect temperature, we might be able to at least increase the accuracy by making it really work hard if it wants to return that value.

This would be a significant lag spike, so it would need to be optional, but if we get that specific value, we could poll it N times and see if it only returns that value or if it occasionally returns another one. Then, if it occasionally returns another one, use that value instead.

Of course, this depends on the behavior of the device, and I don't really know if it would vary, or if it's solidly out of commission during those times.
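The polling idea above could be sketched roughly as follows. This is not code from the library: `read_with_recheck`, `read_once`, and `suspect_values` are hypothetical names, and `read_once` stands in for whatever call actually queries the device.

```python
def read_with_recheck(read_once, suspect_values, attempts=5, tolerance=0.01):
    """Re-poll when the first reading matches a known 'stuck' value.

    read_once:      hypothetical zero-argument callable returning a temperature
    suspect_values: temperatures the device is known to get stuck at
    Returns the first non-suspect reading, or the original one if every
    re-poll also returns a suspect value.
    """
    def is_suspect(value):
        return any(abs(value - s) <= tolerance for s in suspect_values)

    first = read_once()
    if not is_suspect(first):
        return first  # normal case: no extra polling, no extra delay
    for _ in range(attempts):
        value = read_once()
        if not is_suspect(value):
            return value
    return first


# Usage with a fake device that returns a stuck value twice, then a new one.
fake_device = iter([11.0, 11.0, 12.3])
print(read_with_recheck(lambda: next(fake_device), suspect_values=[11.0]))
```

The extra polls only happen when a suspect value shows up, which is where the lag spike would come from, so the behavior would still need to be optional.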

corsac-s commented 1 year ago

So after a few days: image

If you have ideas on how to debug or poke the device and you can't really reproduce on yours don't hesitate to ask, I can definitely run experimental code here.

eode commented 1 year ago

Still on my radar, but I haven't had time recently to work on it.

I haven't heard back from PCSensor.

The only thing I can think of that might work (and I doubt it) is we can poll it multiple times if the result we get is 11c, and see if it gets anything else. But by the graphs, I'm not too hopeful about that.

corsac-s commented 1 year ago

The only thing I can think of that might work (and I doubt it) is we can poll it multiple times if the result we get is 11c, and see if it gets anything else. But by the graphs, I'm not too hopeful about that.

Yeah, I don't think it'll work; it gets stuck at multiple temperatures: 11C but also around 19C (not the same device):

image

I'm not sure what makes the thing “unstuck”.

eusoubrasileiro commented 3 weeks ago

I have the same issue, @eode and @corsac-s, with a TEMPer2 V4.1. I have been using it for a year already, and I got used to "fixing" it with a rolling window in pandas.
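The exact rolling-window fix is not shown in this thread. One guess at what it could look like is a centered rolling median, sketched here in plain Python so it runs without dependencies; `rolling_median` and the sample values are made up for illustration. In pandas the closest equivalent would be something like `series.rolling(5, center=True, min_periods=1).median()`.

```python
from statistics import median


def rolling_median(values, window=5):
    """Centered rolling median; the window shrinks at both ends of the list."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(median(values[lo:hi]))
    return smoothed


# Made-up readings with two isolated drops to a "preferred" value.
raw = [24.1, 24.2, 22.56, 24.3, 24.4, 22.56, 24.5]
smoothed = rolling_median(raw)
print(smoothed)
```

A median window only suppresses drops that are shorter than about half the window, so it would not help when the sensor stays stuck for hours.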

I have a sqlite database with thousands of hours of readings with these kinds of errors. Just updating: I made this new picture with the 150k readings I have in my database. The issue is with both sensors, in and out.

df[['temp_in', 'temp_out']].plot(figsize=(8,5), marker='.', markersize=0.8, alpha=0.6, linestyle='none', ylim=[22, 30])

image

Look at those lines of preferred readings for both sensors, in (internal) and out (external). Exactly like what happened in this old issue. That's crazy....

Also, here is an example from last night, 21:00 to 7:00 this morning. temp_in and temp_out are from the TEMPer. temp_zb is from a Zigbee thermometer.

image

ChatGPT 4.0 suggested it could be an electrical or manufacturing defect that makes it 'fall back' to something close to a default value...

This site points out some issues on low-powered systems; not sure if that's related. I use mine on an orangepi5.

Any updates from the manufacturer?

Taomyn commented 3 weeks ago

Having similar issues with my sensors. I had an old TEMPer1V1.2 that I thought had gone defective when it got stuck at the same temperature (I think it was 23.5c) unless the temperature went up. It was on a standard AMD PC, so I tried switching to another system, an RPI-4, and it did the same at the same temperature. So I ordered another, this time a TEMPer2_V4.1 with an external sensor, and it does the same for the external sensor, stuck at 22.56c. The strange thing is that I have another sensor identical to the TEMPer1V1.2, and I now notice it also gets stuck, but for shorter periods, at 23.44c.

image

eusoubrasileiro commented 3 weeks ago

I suspect it's related to this part of the code and this issue. I have changed the timeout values to 1 second and 2 seconds for temperature readings. In fact this will only delay the readings between the firmware and the temperature, but I suspect the problem is related to the sensor's response time being bad... I've also changed my readings to every 2 minutes... I'm just taking shots in the dark... it didn't work.

eode commented 3 weeks ago

This issue is one I've just about given up on, and I'm probably going to move to using a different sensor than pcSensor. But, I'm still not absolutely certain it's a hardware issue.

Thanks for exploring the issue @eusoubrasileiro . Your graphs are really enlightening.

What we see here looks like binary truncation. That is, the number is rounded down to a lower value, because some amount of the data is lopped off. You can see this in the graphs by both the solid bands and the clear spaces above the bands. The problem is that it doesn't seem to be consistently truncated, which smacks of a read error. But it occurs in situations where both the internal and the external temperature are present, which strongly suggests it is not a read error.

I'm starting to suspect that the second byte, which holds the fractional Celsius degrees, is not read correctly. Either this was an error in the original C code, or it was introduced in this library when it was translated out of the original C. But another possibility is that the fractional degree byte isn't populated correctly by the pcSensor hardware.
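To make the two-byte hypothesis concrete, here is a minimal sketch of how an integer byte and a fractional byte would combine, and what a lost fractional byte would do. The layout is an assumption (a big-endian signed 16-bit value in units of 1/256 °C, as on LM75-style sensor chips); nothing in this thread establishes that the TEMPer USB reply uses it, and `decode_fm75_style` is a hypothetical helper, not a function from this library.

```python
import struct


def decode_fm75_style(raw: bytes) -> float:
    """Decode two bytes as a big-endian signed value in 1/256 degree units.

    Assumed layout: first byte = whole degrees, second byte = fraction.
    """
    (value,) = struct.unpack(">h", raw)
    return value / 256.0


intact = bytes([0x16, 0x90])     # 22 whole degrees, fraction 0x90/256
truncated = bytes([0x16, 0x00])  # same reading with the fractional byte lost

print(decode_fm75_style(intact))     # 22.5625
print(decode_fm75_style(truncated))  # 22.0
```

Under this layout a dropped fractional byte always rounds the reading down to a whole degree, which gives one concrete pattern to look for in logged data.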

eusoubrasileiro commented 3 weeks ago

Thanks a lot for your answer @eode and for the honor to receive it. And congratulations for the awesome project! I will reread your answer many many times I am sure...

I still think like you do. The Windows software seems to work (although I have never used it).

On this approach, yesterday I downloaded this old C code (https://github.com/shakemid/pcsensor-temper). Unfortunately, even after I modified it to include my TEMPER_TYPE entry

{ 0x3553, 0xa001, "TEMPer2_V4.1", 1, 2, 0, decode_answer_fm75 }, // TEMPer2* eg. TEMPer2V4.1

and managed to compile and run it, I can't read the temperature...

I got errors related to USB reading: "Couldn't find the USB device, Exiting: 0". Maybe the device descriptor is wrong? Maybe the HID protocol? I understand almost nothing of this... even with ChatGPT4 I haven't yet managed to understand this subject, which seems quite complex when using libusb.

Well, if someone has time, maybe that old piece of C, or even better the original code you translated, could help us with this.

I'll certainly explore your idea of binary truncation using some LLM like ChatGPT-4 or Claude. Thanks a thousand!

eusoubrasileiro commented 3 weeks ago

For the 150k readings I have, I filtered for the densest region, 22 to 30 degrees (I'm from Brazil, a tropical country).

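import numpy as np  # df is the pandas DataFrame of readings loaded from the database

# keep only the dense region between 22 and 30 degrees
dfiltered = df[(df['temp_out'] < 30) & (df['temp_out'] > 22)]
unique = dfiltered.temp_out.unique()
unique.sort()
np.diff(unique)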
array([0.06, 0.06, 0.07, 0.06, 0.06, 0.06, 0.07, 0.06, 0.06, 0.06, 0.07,
       0.06, 0.06, 0.06, 0.07, 0.06, 0.06, 0.06, 0.07, 0.06, 0.06, 0.06,
       0.07, 0.06, 0.06, 0.06, 0.07, 0.06, 0.06, 0.06, 0.07, 0.06, 0.06,
       0.06, 0.07, 0.06, 0.06, 0.06, 0.07, 0.06, 0.06, 0.06, 0.07, 0.06,
       0.06, 0.06, 0.07, 0.06, 0.06, 0.06, 0.07, 0.06, 0.06, 0.06, 0.07,
       0.06, 0.06, 0.06, 0.07, 0.06, 0.06, 0.06, 0.07, 0.06, 0.06, 0.06,
       0.07, 0.06, 0.06, 0.06, 0.07, 0.06, 0.06, 0.06, 0.07, 0.06, 0.06,
       0.06, 0.07, 0.06, 0.06, 0.06, 0.07, 0.06, 0.06, 0.06, 0.07, 0.06,
       0.06, 0.06, 0.07, 0.06, 0.06, 0.06, 0.07, 0.06, 0.06, 0.06, 0.07,
       0.06, 0.06, 0.06, 0.07, 0.06, 0.06, 0.06, 0.07, 0.06, 0.06, 0.06,
       0.07, 0.06, 0.06, 0.06, 0.07, 0.06, 0.06, 0.06, 0.19, 0.06, 0.07,
       0.06, 0.06, 0.06])

The max is 29.93 and the min is 22.06. If we calculate the step with 125+3 unique counts (added 3 because of the 0.19), we get ~0.0620. My sensor is a TEMPer2_V4.1; from what I found on the internet, it uses an FM75. That source says 0.0625 is one of the supported resolutions, which matches exactly the unique values from 22.06 to 29.93 in the samples above.

The exercise above explains the fixed interval between the sample values...

But the mystery of the preferred or "stuck" values is still open....
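The 0.06 / 0.06 / 0.07 / 0.06 rhythm in the differences can be reproduced with a small sketch. It assumes that readings are multiples of 1/16 °C (0.0625) and that the value is truncated, not rounded, to two decimals before it is logged; both are assumptions made for illustration. `truncated_grid_hundredths` is a hypothetical helper that works in integer hundredths to avoid floating-point noise.

```python
def truncated_grid_hundredths(first_k, last_k):
    """Multiples k/16 of a degree, truncated to hundredths, as integers.

    k * 625 is the value in units of 1/10000 degree; // 100 truncates
    it to units of 1/100 degree.
    """
    return [(k * 625) // 100 for k in range(first_k, last_k + 1)]


# 353/16 = 22.0625 and 479/16 = 29.9375 bracket the filtered range above.
grid = truncated_grid_hundredths(353, 479)
diffs = [b - a for a, b in zip(grid, grid[1:])]

print(grid[0] / 100, grid[-1] / 100)  # 22.06 29.93
print(diffs[:8])                      # [6, 6, 7, 6, 6, 6, 7, 6]
print(len(grid), len(diffs))          # 127 126
```

Under those assumptions the range holds 127 grid values and 126 steps. The data shows 125 unique values, and the single 0.19 gap accounts for the two grid values that never appear, so the step works out to 7.875 / 126 = 0.0625.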