dl9rdz / rdz_ttgo_sonde

Humidity value anomaly #152

Closed eben80 closed 2 years ago

eben80 commented 3 years ago

I've been noticing a persistent anomaly in the humidity values recorded in Sondehub for the same frames from auto_rx and rdzsonde receivers. The rdzsonde receivers seem to report a higher humidity value than auto_rx, even for identical frame numbers, and I was wondering if there is a difference in how the humidity values are decoded in the software.

For the other measurements for the identical frames, the values are identical when rounded but the humidity values have a delta.

I'm not sure which is the correct value since I haven't captured identical frames for two different auto_rx receivers yet.

Screenshot showing filtered decoding of identical frame cases: [image: 2021-09-06 11_23_44-response_1630914238793 xlsx - Excel]

Whole excel file: response_1630914238793.xlsx

dl9rdz commented 3 years ago

There is some discussion on this in the issue SondeHub Sonde Support #112.

When compared to official DWD data, both my software and autorx are sometimes way off...

eben80 commented 3 years ago

Aah yes. Sorry I didn't find it under that heading. Would you prefer I keep this open or will it be enough to deal with it in that more general issue?

mycarda commented 3 years ago

I am still looking at this. I am pretty confident I know how the temperature algorithm works but I am still unsure about the ra-firmware code algorithm on which the rdz_ttgo_sonde code is based.

If we are worried about consistency of the reported values, it would be easy enough to use the same algorithm used by radiosonde_auto_rx (which is the zilog80 code). Neither code seems to be the "truth" compared to the "official" values but maybe we could all be "untruthful" together consistently. What do you think? I am happy to make those code changes, test, and create a pull request if that's what you think is best.

mycarda commented 3 years ago

According to this https://github.com/JorgoChr/Vaisala_RS41_TU_Analysis, Rolf Meeser has written a document about the temperature and humidity calculations for RS41. Anybody know where I can find this document?

LukePrior commented 3 years ago

> What do you think?

I think it would be best to get the correct algorithm integrated into rdzTTGO, even if it means a temporary difference in values. Then we can work on implementing it in auto_rx; @rs1729 might be able to help with that.

dl9rdz commented 3 years ago

> According to this https://github.com/JorgoChr/Vaisala_RS41_TU_Analysis, Rolf Meeser has written a document about the temperature and humidity calculations for RS41. Anybody know where I can find this document?

Probably just ask him? We could also just try using his code.

LukePrior commented 3 years ago

I'm not sure, but is this decoder by Rolf Meeser the correct one: https://github.com/einergehtnochrein/ra-firmware/tree/master/src/rs41

eben80 commented 3 years ago

> I'm not sure, but is this decoder by Rolf Meeser the correct one: https://github.com/einergehtnochrein/ra-firmware/tree/master/src/rs41

That's his profile yes. @einergehtnochrein

rs1729 commented 3 years ago

When @darksidelemm asked me about the differences a month ago, I replied that I believe the function https://github.com/dl9rdz/rdz_ttgo_sonde/blob/4c3a91e3668ac2e0cf8ae3884e984b2682867923/libraries/SondeLib/RS41.cpp#L609 is an older version. In https://github.com/einergehtnochrein/ra-firmware/commit/5debe5233e93445cc85b402d44bc268b5e6978dc#diff-141ab46d3b17f78b6fc0c51df0f813eebe692c38839ac2917483cc05f46ea33c Rolf introduced the correction for low pressure/temperature (there might be a minor bug in the code, as he said, but it is barely noticeable; he hasn't uploaded it yet). The rs1729/RS RH corrections are based on this; I talked to Rolf about the RS41 humidity. There are only minor differences: rs1729/RS doesn't use the correction in the Hyland/Wexler function, and the calibrated TH is used (analogous to the air temperature) instead of the uncalibrated T_RH, though this seems to be the case in TTGO sonde as well. The additional correction is noticeable at low temperatures in the stratosphere. (EDIT: Comparing to the DWD data, you can see the time-lag when the relative humidity goes down to 0-1% in the stratosphere.)

Besides that, there will be differences to the published DWD data, because there is certainly some post-processing applied, e.g. time-lag correction that is noticeable at low temperatures, and solar radiation correction at high altitudes. This I would not do in the decoder.

(Btw, why Hamming Code in RS41.h?)

bazjo commented 3 years ago

> According to this https://github.com/JorgoChr/Vaisala_RS41_TU_Analysis, Rolf Meeser has written a document about the temperature and humidity calculations for RS41. Anybody know where I can find this document?

Oops, this happens when you don't keep a close eye on the students you supervise...

I have not yet talked to the author about disclosing this document, but I will do so now.

The plan was to publish this once it is a bit more polished...

mycarda commented 3 years ago

@rs1729 Thank you. Your comment was very interesting.

When I looked through einergehtnochrein/ra-firmware@5debe52#diff-141ab46d3b17f78b6fc0c51df0f813eebe692c38839ac2917483cc05f46ea33c it seemed to me it uses the results of the pressure sensor in the calculation. The RS41 devices I can receive in the UK don't have pressure sensors. Maybe I could estimate the pressure from the GPS height.

Also, one thing that has never been clear to me is why the humidity sensor temperature has 20 subtracted from it then divided by 180. https://github.com/einergehtnochrein/ra-firmware/blob/e786b011a0cc09dc1f086ce49e4c3f7819315847/src/rs41/rs41metrology.c#L244 Is this some meteorological equation to compensate for something or is it some scaling factor that is required for the calibration values in matrixU?

rs1729 commented 3 years ago

You can use the barometric formula (https://github.com/rs1729/RS/blob/6bc48044627f84b6a8773a4c325f4898c97a2505/demod/mod/rs41mod.c#L705); an approximate value is good enough.
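As a rough illustration of such an approximation, here is a minimal sketch based on the first three layers of the International Standard Atmosphere. The function name `estimatePressureHpa` is hypothetical and the code is not taken from rs41mod.c or ra-firmware; it also treats the GPS altitude as geopotential altitude, which is an assumption that only matters at the level of a few percent.

```cpp
#include <cassert>
#include <cmath>

// Approximate static pressure in hPa from altitude in metres, using the
// first three layers of the International Standard Atmosphere.
// Hypothetical helper: name and structure are not taken from any decoder.
double estimatePressureHpa(double altitudeM) {
    const double g = 9.80665;   // gravitational acceleration, m/s^2
    const double R = 287.053;   // specific gas constant of dry air, J/(kg K)

    assert(std::isfinite(altitudeM));
    if (altitudeM < 0.0) altitudeM = 0.0;          // clamp below sea level
    if (altitudeM > 32000.0) altitudeM = 32000.0;  // this model ends at 32 km

    if (altitudeM <= 11000.0) {
        // Troposphere: temperature falls by 6.5 K per km from 288.15 K.
        const double T0 = 288.15, lapse = 0.0065, p0 = 1013.25;
        return p0 * std::pow(1.0 - lapse * altitudeM / T0, g / (R * lapse));
    }
    if (altitudeM <= 20000.0) {
        // Lower stratosphere: isothermal at 216.65 K.
        const double T1 = 216.65, p1 = 226.32;
        return p1 * std::exp(-g * (altitudeM - 11000.0) / (R * T1));
    }
    // 20 to 32 km: temperature rises by 1 K per km.
    const double T2 = 216.65, lapse2 = 0.001, p2 = 54.75;
    return p2 * std::pow(1.0 + lapse2 * (altitudeM - 20000.0) / T2,
                         -g / (R * lapse2));
}
```

Since only an approximate value is needed, the small steps at the layer boundaries (from the rounded base pressures) should not matter for the humidity calculation.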

The temperature values are easier to reproduce by just looking at the data and the calibration values. Calculating relative humidity is more involved, since the sensors have a more complicated behavior. And every manufacturer has a different way to do the calibration, I guess. Sometimes you can find clues in the radiosonde.

EDIT: @mycarda one more thing, if you compare the published data at the same GPS altitude (DWD data shows altitude above MSL), the altitude of the RS41-decoder is above ellipsoid, so don't forget to consider the geoid height.

einergehtnochrein commented 3 years ago

> When I looked through einergehtnochrein/ra-firmware@5debe52#diff-141ab46d3b17f78b6fc0c51df0f813eebe692c38839ac2917483cc05f46ea33c it seemed to me it uses the results of the pressure sensor in the calculation. The RS41 devices I can receive in the UK don't have pressure sensors. Maybe I could estimate the pressure from the GPS height.

No problem, as @rs1729 wrote you can just use a simple approximation for the standard atmosphere. At low altitudes the pressure value has no noticeable influence on the result, only at higher altitudes. Even there the influence of the pressure is so small that using a constant pressure of 30 hPa in the calculation still produces reasonable results!

I already had this prepared in a local development branch of the ra-firmware project, but it wasn't yet in the master branch on GitHub. I have therefore uploaded it to the master branch yesterday. The project now also contains some simple unit tests based on CppUTest. The main motivation was to verify the correct implementation of the PTU calculations! :-)

> Also, one thing that has never been clear to me is why the humidity sensor temperature has 20 subtracted from it then divided by 180. https://github.com/einergehtnochrein/ra-firmware/blob/e786b011a0cc09dc1f086ce49e4c3f7819315847/src/rs41/rs41metrology.c#L244 Is this some meteorological equation to compensate for something or is it some scaling factor that is required for the calibration values in matrixU?

I have no clue... Clearly the matrixU coefficients are scaled to match with this transformed temperature value, and I assume there is a meteorological background, but... :-(

If you want to see the accuracy of the calculations, look at this comparison with the data of the German weather bureau DWD. The black lines are the DWD "truth", the colored lines are my calculations. @rs1729 wrote above that the remaining differences are very likely due to post processing done after the sounding ended, and there is hardly any way (and no need) to try and find a better match on a frame by frame basis.

The raw data for this flight is attached along with the 2-second resolution truth file... Raw data uses the log format known from SondeMonitor.

Attachments: dwd_S0510630.txt, S0510630.txt

mycarda commented 3 years ago

@einergehtnochrein Thank you. I think the charts show your calculations are pretty much the same as the DWD data so you are as near to the "truth" as is practical to get.

eben80 commented 2 years ago

Hi all. This morning I also had an instance where the Rh value reported to SondeHub was larger than 100. I know that there was a check for negative numbers last time, so perhaps this should be considered too. PS: it was reported from my TTGO, which I used for the recovery. https://sondehub.org/#!mt=Mapnik&mz=11&qm=12h&mc=48.93671,21.29013&f=R5020800&q=R5020800

mycarda commented 2 years ago

I have incorporated the @einergehtnochrein code from ra-firmware into my local installation and have been testing for a few days. All looks good so I plan to create a pull request for the development branch. I have also experienced some strange humidity results on occasion (like @eben80 in the previous post) which I believe might be due to either: not zeroing the value that keeps track of what calibration data has been received for a new instance; or calculating the humidity before we have all the required calibration frames. I have corrected both these in my latest code.

The updated code uses the pressure in the humidity calculation but I cannot receive any RS41 with pressure sensors from my location so I have used @einergehtnochrein's code for estimating the pressure. At some point, the code needs adding to calculate the pressure when the RS41 has a pressure sensor. I did not add this code because I won't add code I cannot test. Maybe a to-do for next time I go on vacation to Germany :-) .

I noticed the development branch has changed a lot since I last looked at it; I may be missing a few libraries, and I am having a few problems compiling the code. I will make sure I get it all sorted on my side and do another couple of tests before I create a pull request.

LukePrior commented 2 years ago

> I know that there was a check for negative numbers last time, so perhaps this should be considered too.

Hi,

I've just been informed that when the sensor is broken on sondes it is possible for them to report a value above 100. We want to keep this data so won't be adding a check for this particular scenario.

I did take the time to check some other fields for anomalies and found that heading is sometimes reported as greater than 360 degrees so a check has been added for that.

You can see all the checks we currently perform here: https://github.com/projecthorus/sondehub-infra/blob/main/sonde-api-to-iot-core/lambda_function.py#L115

If you have any suggestions for checks we don't currently perform I would love to hear them.

Cheers

eben80 commented 2 years ago

> > I know that there was a check for negative numbers last time, so perhaps this should be considered too.
>
> Hi,
>
> I've just been informed that when the sensor is broken on sondes it is possible for them to report a value above 100. We want to keep this data so won't be adding a check for this particular scenario.
>
> I did take the time to check some other fields for anomalies and found that heading is sometimes reported as greater than 360 degrees so a check has been added for that.
>
> You can see all the checks we currently perform here: https://github.com/projecthorus/sondehub-infra/blob/main/sonde-api-to-iot-core/lambda_function.py#L115
>
> If you have any suggestions for checks we don't currently perform I would love to hear them.
>
> Cheers

Hi Luke, Thanks for this information. I'm not sure about the broken sensor scenario being the cause though. Check this flight that is currently in progress: https://sondehub.org/#!mt=Mapnik&mz=11&qm=3h&mc=49.12364,20.91763&f=R5020872 It started with invalid Rh data and then it became normal.

LukePrior commented 2 years ago

> Thanks for this information. I'm not sure about the broken sensor scenario being the cause though.

Oh yes, a broken sensor won't always be the cause, but if we were to add a check it would also catch that scenario, which we don't want. We will need to look at other options to catch errors like this, probably something on the client side.

mycarda commented 2 years ago

> It started with invalid Rh data and then it became normal.

That definitely sounds to me like the bug where we are calculating the humidity before we have all the required calibration frames. It would be self correcting when the calibration frame we are missing is received.

mycarda commented 2 years ago

When compiling from the latest development branch, I noticed I had to include the GFX for Arduino library and I also found it would compile with the 1.0.6 version of ESP32 boards but would not compile with the 2.0.0 version.

Anyway, all compiled now and I will re-test on the 6:30 and 10:30 launches from Larkhill tomorrow.

dl9rdz commented 2 years ago

Ah thanks, I have updated the Arduino install instructions accordingly.

But when using the version 2.0.0, much less free heap is available, so the software was in some situations running out of RAM, causing a crash/reset. For this reason, and because platformio does not yet support that 2.0.0 version, I currently stick to the 1.0.6 version.

2.0.0: MAIN: Running loop in state 0 [currentDisp:0, lastDisp:0]. free heap: 54039, unused stack: 4968
1.0.6: MAIN: Running loop in state 0 [currentDisp:0, lastDisp:0]. free heap: 112788, unused stack: 4760

If someone has any idea regarding what the cause for this might be, let me know.

mycarda commented 2 years ago

> I have no compile problems at all, so it is strange that it would not compile. (there is just one warning on a strncpy that is no problem)

Same for me now once I restarted the Arduino IDE. I know, "have you tried switching it off and on again" :-)

> I have updated the Arduino install instructions accordingly

Also in the setup documentation, do you still need the "Additional libraries, part 3" symbolic links now that the library files are in RX_FSK/src/?

mycarda commented 2 years ago

I have run the updated relative humidity code for two launches from Larkhill this morning with no issues. The new code is just the latest version of @einergehtnochrein code in ra-firmware.

As a quick test yesterday, I added the new @einergehtnochrein code alongside the existing code and the humidity code from radiosonde_auto_rx just to see what the differences were all running together in the same TTGO device. Generally, the new @einergehtnochrein code reports 2-3% lower humidity values than our existing code and radiosonde_auto_rx code reports slightly lower than the new @einergehtnochrein code. The code I will push is just the new @einergehtnochrein code not the other test stuff.

Created pull request #170

darksidelemm commented 2 years ago

What kind of difference are you seeing between the latest code and auto_rx?

eben80 commented 2 years ago

Great, I'm going to see if I can get some comparative values from the lunchtime rs41-sgp flights here. Thank you.

mycarda commented 2 years ago

> What kind of difference are you seeing between the latest code and auto_rx?

On the chart, relative humidity is the existing code, new is the updated code and auto is the code from radiosonde_auto_rx. There are two algorithms in radiosonde_auto_rx code, the default one and the advanced one. The advanced one is very much like the updated code so I chose the standard one as the comparison.

[image: Screenshot from 2021-09-17 11-58-10]

Here are the data if you are interested: humidity comparison.xlsx

darksidelemm commented 2 years ago

Ah ok, I'm using the 'advanced' one in auto_rx ('--ptu2' in the decoder), so that sounds good.

eben80 commented 2 years ago

My first observation after the test is that I see a lot less of the temp and humidity being reported in SondeHub, even though it is still appearing on my modified Data screen on the ttgo web interface.

LukePrior commented 2 years ago

> My first observation after the test is that I see a lot less of the temp and humidity being reported in SondeHub, even though it is still appearing on my modified Data screen on the ttgo web interface.

We have had to limit the number of packets we store in SondeHub to keep control of costs and speed. We only keep about 1/5 of sent packets, or one every ~5 s.

eben80 commented 2 years ago

> > My first observation after the test is that I see a lot less of the temp and humidity being reported in SondeHub, even though it is still appearing on my modified Data screen on the ttgo web interface.
>
> We have had to limit the number of packets we store in SondeHub to keep control of costs and speed. We only keep about 1/5 of sent packets, or one every ~5 s.

What I mean is that the frames are there, they just do not contain temp and humidity fields for the telemetry through the latest build.

I think this is something other than the rate limiting, right?

LukePrior commented 2 years ago

> What I mean is that the frames are there, they just do not contain temp and humidity fields for the telemetry through the latest build.

Well that must mean that rdzTTGOsonde isn't including these fields when uploading, which is strange if you say you can see them on screen. Do you have any ideas @dl9rdz ?

dl9rdz commented 2 years ago

If temp and rH are available (or, rather, both are not 0), they are always being sent to sondehub.

It's either both values or none. If you locally see a temperature but humidity is 0 (because not enough calibration data has been received yet), no data will be sent. I see no other cases in which you could have data locally that is not included in the frames sent to sondehub.

We could change the code such that temperature is included even if humidity is not (yet) available.
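A minimal sketch of what such a change could look like, written as a standalone helper so it can be tried outside the firmware. The function `appendPtuFields` is hypothetical, and using NAN as the "not yet available" marker is an assumption for illustration (the existing code uses 0 for that); this is not the actual rdz_ttgo_sonde upload code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>

// Append optional "temp" and "humidity" fields to a JSON fragment.
// NAN marks "not available yet", so each field is decided independently
// and a genuine 0.0 reading is still sent.
// Returns the number of characters written (excluding the terminator).
std::size_t appendPtuFields(char *buf, std::size_t size, float temp, float humidity) {
    assert(buf != nullptr && size > 0);
    std::size_t used = 0;
    buf[0] = '\0';

    const char *names[2] = {"temp", "humidity"};
    const float values[2] = {temp, humidity};
    for (int i = 0; i < 2; ++i) {
        if (std::isnan(values[i])) continue;   // field not available yet
        int n = std::snprintf(buf + used, size - used, "\"%s\": %.3f,",
                              names[i], values[i]);
        if (n < 0 || static_cast<std::size_t>(n) >= size - used) {
            buf[used] = '\0';                  // drop the truncated field
            break;
        }
        used += static_cast<std::size_t>(n);
    }
    return used;
}
```

With this shape, a frame that has a temperature but no humidity yet still carries the temperature, and the bounded `snprintf` avoids writing past the end of the buffer.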

LukePrior commented 2 years ago

> We could change the code such that temperature is included even if humidity is not (yet) available.

Yes I think this would be a good idea.

LukePrior commented 2 years ago

I managed to get a decent comparison for S4620361 as we had two rdzTTGOsonde stations uploading with the latest version. I have attached the comparison of recorded humidity values below:

[image: comparison of recorded humidity values]

The difference is now just ~0.2% compared to ~3.0% on the previous version.

I believe the remaining difference can be attributed to the fact that auto_rx has not been updated to use the latest changes here: https://github.com/einergehtnochrein/ra-firmware/tree/master/src/rs41 and that auto_rx uses less of the calibration values as per: https://github.com/dl9rdz/rdz_ttgo_sonde/issues/112#issuecomment-897771515

eben80 commented 2 years ago

I also have some data for comparison. Initially it took some time for the temp and humidity readings to come through. When they did, they were virtually identical to the humidity values of an autorx station. It then stopped sending the humidity and temp data, and when they came back there was a slight difference, as described above. Is my understanding correct that for the calculation, the receiver needs to get the calibration data from the sonde and use it as part of the calculation? This means that if the receiver reboots for some reason there will be periods where it's waiting for calibration data again? I'm trying to explain the periods without temp and humidity data (I know they are part of a condition now).

Attachment: response_1631941116014.xlsx

LukePrior commented 2 years ago

> Is my understanding correct that for the calculation, the receiver needs to get the calibration data from the sonde and use it as part of the calculation? This means that if the receiver reboots for some reason there will be periods where it's waiting for calibration data again?

I believe that is all correct.

I also noticed that rdzTTGOsonde does not decode subtype, battery voltage, or burst timer. While definitely not necessary, these would be welcome additions for SondeHub uploading, and I can't imagine the calculations being too intensive or large. So if anyone feels like giving that a try, please do, as RS41 makes up about 80% of uploads.

Here is the ElasticSearch comparison (login required)

https://es.v2.sondehub.org/_plugin/kibana/app/dashboards#/view/248c00d0-f05a-11eb-9d66-43598367ce48?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'2021-09-17T22:34:42.843Z',to:'2021-09-18T02:15:24.849Z'))&_a=(description:'',filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'8fec99f0-6222-11eb-b862-2f0a42a5ce55',key:serial.keyword,negate:!f,params:(query:R5020798),type:phrase),query:(match_phrase:(serial.keyword:R5020798)))),fullScreenMode:!f,options:(hidePanelTitles:!f,useMargins:!t),query:(language:kuery,query:''),timeRestore:!f,title:Comparison,viewMode:view)

[image: ElasticSearch comparison dashboard]

mycarda commented 2 years ago

@eben80

> Is my understanding correct that for the calculation, the receiver needs to get the calibration data from the sonde and use it as part of the calculation?

Yes, you are correct. Each status message contains a 16 byte block of subframe information tagged onto the end of it. The whole subframe contains 51 of these 16 byte blocks. Each subframe block has an ID number so we know where it fits in the whole subframe. That means the whole subframe slowly builds up 16 bytes at a time as each status message arrives. For temperature we need to have subframe blocks 3 through 20, as these contain the calibration information we need to calculate the temperature. Similarly, for humidity we need blocks 37 through 46, as these blocks contain the calibration information for the humidity temperature sensor and the humidity calibration values. @bazjo has a good description of this with pictures.

It's for this reason that we have to wait until we have enough 16 byte subframe blocks before we can calculate the temperature and humidity. It also explains why we can always calculate the temperature before the humidity: the temperature calibration information needs fewer subframe blocks, while the humidity depends on the temperature, so it can only be calculated after we have the temperature and all the humidity calibration blocks.
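The bookkeeping described above can be sketched as follows. This is a hypothetical illustration rather than code from rdz_ttgo_sonde or ra-firmware: the class name `CalibrationTracker` and its methods are made up, and the block numbers are simply the ones quoted in this comment. It also shows the two guards mentioned earlier in the thread, namely resetting the tracking state for a new sonde and refusing to compute a value before all required blocks are present.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Tracks which 16-byte calibration blocks of an RS41 subframe have arrived.
class CalibrationTracker {
public:
    static constexpr int kBlocks = 51;     // blocks per subframe
    static constexpr int kBlockSize = 16;  // bytes per block

    // Call when a new sonde is selected or reception starts over.
    void reset() {
        received_ = 0;
        std::memset(data_, 0, sizeof data_);
    }

    // Store one block; out-of-range ids are ignored.
    void addBlock(int id, const std::uint8_t *bytes) {
        if (id < 0 || id >= kBlocks || bytes == nullptr) return;
        std::memcpy(data_ + id * kBlockSize, bytes, kBlockSize);
        received_ |= (std::uint64_t{1} << id);
    }

    // Temperature needs blocks 3 through 20.
    bool temperatureReady() const { return haveRange(3, 20); }

    // Humidity depends on the temperature, so it needs both ranges.
    bool humidityReady() const { return temperatureReady() && haveRange(37, 46); }

private:
    // True if every block in [first, last] has been received.
    bool haveRange(int first, int last) const {
        assert(0 <= first && first <= last && last < kBlocks);
        const std::uint64_t mask =
            ((std::uint64_t{1} << (last - first + 1)) - 1) << first;
        return (received_ & mask) == mask;
    }

    std::uint64_t received_ = 0;                    // one bit per block id
    std::uint8_t data_[kBlocks * kBlockSize] = {};  // assembled subframe
};
```

A decoder using something like this would call `reset()` whenever it switches to a new sonde and only report humidity while `humidityReady()` is true, which is why a reboot leads to a gap until the blocks have been collected again.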

> This means that if the receiver reboots for some reason there will be periods where it's waiting for calibration data again?

Exactly!

eben80 commented 2 years ago

> Here is the ElasticSearch comparison (login required)

Thank you. Do I need some rights to see it after I log in?

@mycarda thank you for that summary, it makes a lot more sense now. I guess it would be good to determine why the auto_rx solution provides such a consistent feed of most measured values, even those requiring calibration data. I suppose reception quality can also play a role. We have signal strength values in both cases.

darksidelemm commented 2 years ago

auto_rx will continue running a decoder for 180 seconds (at least by default; this can be modified) with no telemetry before that decoder is shut down and auto_rx re-enters scanning mode. As such, it'll handle fairly long gaps of fading without dropping any of the calibration data.
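The idea of holding on to a decoder (and its calibration data) through a reception gap can be sketched like this. It is only an illustration under assumptions: auto_rx itself is written in Python and its implementation may look quite different, and the `DecoderSession` type and its members are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Decides whether a decoder session, and the calibration data it has
// collected, should be kept alive during a gap in reception.
struct DecoderSession {
    std::uint32_t timeoutMs;        // e.g. 180000 for a 180 s timeout
    std::uint32_t lastFrameMs = 0;  // timestamp of the last good frame

    explicit DecoderSession(std::uint32_t timeout) : timeoutMs(timeout) {
        assert(timeout > 0);
    }

    // Call for every successfully decoded frame.
    void onFrame(std::uint32_t nowMs) { lastFrameMs = nowMs; }

    // Unsigned subtraction keeps this correct when a millisecond
    // counter wraps around.
    bool expired(std::uint32_t nowMs) const {
        return nowMs - lastFrameMs > timeoutMs;
    }
};
```

As long as `expired()` is false, the calibration blocks already received stay valid, so temperature and humidity can resume immediately when the signal returns.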

rs1729 commented 2 years ago

> The difference is now just ~0.2% compared to ~3.0% on the previous version.
>
> I believe the remaining difference can be attributed to the fact that auto_rx has not been updated to use the latest changes here: https://github.com/einergehtnochrein/ra-firmware/tree/master/src/rs41 and that auto_rx uses less of the calibration values as per: #112 (comment)

auto_rx uses the rs41mod.c decoder from rs1729/RS. As I mentioned before, the main difference between the @einergehtnochrein implementation and the one chosen in @rs1729 is that rs1729 uses the calibrated RH-temperature (Trh), whereas DF9DQ uses the uncalibrated metro.TU in the water vapor saturation pressure correction (Hyland/Wexler) (the "magic" correction does not do much). The difference when choosing TU or Trh in the RH calculations can be around 0.5 °C. For low temperatures and high RH this can make a difference of dRH = 2% points (maybe 5 percent of the RH value), but if RH is already low, it is maybe 0.1% points.
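To get a feeling for the size of this effect, here is a small sketch that evaluates the commonly cited Hyland/Wexler formula for the saturation vapour pressure over liquid water and the relative change in RH caused by a temperature offset. The function names are hypothetical, the coefficients are the usual published ones rather than values copied from either decoder, and the extra "magic" temperature correction is deliberately left out. Using the over-water formula well below freezing is an extrapolation.

```cpp
#include <cassert>
#include <cmath>

// Saturation vapour pressure over liquid water in hPa (Hyland/Wexler form).
double saturationPressureHpa(double tempC) {
    const double T = tempC + 273.15;  // kelvin
    assert(T > 0.0);
    const double lnP = -5800.2206 / T + 1.3914993 - 4.8640239e-2 * T
                     + 4.1764768e-5 * T * T - 1.4452093e-8 * T * T * T
                     + 6.5459673 * std::log(T);
    return std::exp(lnP) / 100.0;     // Pa -> hPa
}

// Factor by which the computed RH changes when the sensor temperature fed
// into the formula is off by deltaC, for a fixed water vapour pressure.
// RH is proportional to 1 / saturation pressure.
double rhRatioForTempError(double tempC, double deltaC) {
    return saturationPressureHpa(tempC) / saturationPressureHpa(tempC + deltaC);
}
```

Evaluating `rhRatioForTempError(-50.0, 0.5)` gives a factor a little below 1, that is, a relative change of about 5 to 6 percent of the RH value, which is in line with the estimate given above.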

Although I think in TTGO you also use the calibrated sensorTemp

_RS41_waterVaporSaturationPressure(sensorTemp)

you should see a difference DL9RDZ <-> DF9DQ for low temperature and higher RH values, but the differences TTGO <-> auto_rx should be minor. If it's around 0.2% points as you say, then it is negligible, maybe due to the minor variations in Hyland/Wexler or rounding errors when evaluating polynomials.

The temperature coefficients that are not used in rs1729 are zero, so there is no difference. If there are differences for temperature of order 1e-5 then probably due to rounding errors of floats, since you can express/calculate the formulas in different ways.

EDIT: I don't know if the "calibrated" Trh is the better choice for Hyland/Wexler, maybe not, maybe there is a hardware difference between the two temperature sensors. I chose Trh because I chose to calculate it analogous to T. And the differences are small, RH is not as accurate as T anyway.

eben80 commented 2 years ago

OK, I figured out that my temp and humidity reporting was so intermittent because the TTGO is restarting all the time, so it needs to get calibration data for those two measurements every time. I changed the sondehub upload code slightly to get rid of the && condition for temp and rH uploads:

  // Only send temp if provided.
  // Note: the (int) cast truncates, so readings between -1 and 1 are also skipped.
  if ((int)s->temperature != 0) {
    sprintf(w, "\"temp\": %.3f,", float(s->temperature));
    w += strlen(w);
  }

  // Only send humidity if provided (same truncation caveat as above).
  if ((int)s->relativeHumidity != 0) {
    sprintf(w, "\"humidity\": %.3f,", float(s->relativeHumidity));
    w += strlen(w);
  }

This means that more temperature measurements were reported, but rH still not so much, even in cases where I saw a humidity value in the DATA tab. Check frame 6937.

[images: 2021-09-18 15_43_19-response_1631971174920 xlsx - LibreOffice Calc; 2021-09-18 14_49_13-rdzTTGOSonde Server]

So I guess the first thing is to figure out why the TTGO is restarting repeatedly while receiving a sonde. I will see if I can monitor the serial output during tomorrow's lunchtime flight.

Good news is that the humidity values look more consistent with auto-rx now. :)

eben80 commented 2 years ago

I think this is much more consistent with autorx