COMPASS-DOE / data-workflows

Sensor data workflows and processing scripts
MIT License

Document possibility of duplicate observations at a timestamp #189

Closed: bpbond closed this issue 2 months ago

bpbond commented 2 months ago

Because 1-minute, 5-minute, and 15-minute tables are all included, data density may vary. There could also be (rarely) multiple observations at a single timepoint, arising from different tables that run at slightly different times.
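A minimal sketch of how such duplicates could be detected, assuming the processed data are in a long format with one row per (timestamp, table, variable, value). The `Obs` record, its field names, and the column names "VWC" and "Temp" are hypothetical and are not taken from the data-workflows code; the values come from the 18:15:00 rows quoted later in this thread.

```python
from collections import defaultdict
from typing import NamedTuple


class Obs(NamedTuple):
    """One long-format observation (hypothetical field names)."""
    timestamp: str
    table: str
    variable: str
    value: float


def find_duplicates(observations):
    """Return {(timestamp, variable): [Obs, ...]} for keys observed more than once."""
    groups = defaultdict(list)
    for obs in observations:
        groups[(obs.timestamp, obs.variable)].append(obs)
    # Keep only the keys that more than one table (or row) reported.
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


if __name__ == "__main__":
    data = [
        Obs("2024-06-30 18:15:00", "Terosdata", "VWC", 2377.72),
        Obs("2024-06-30 18:15:00", "Terosdata_5min", "VWC", 2377.72),
        Obs("2024-06-30 18:15:00", "Terosdata", "Temp", 22.7),
    ]
    for key, rows in find_duplicates(data).items():
        print(key, [r.table for r in rows])
```

Grouping on (timestamp, variable) rather than timestamp alone means a timestamp that simply has many different variables is not flagged; only repeated measurements of the same variable are.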

stephpenn1 commented 2 months ago

If I understand Roy's point here, could we end up with, for example, two TEROS measurements at "2024-06-10 12:43:00", where one is a 15-minute average and the other is a 5-minute sample?

I will say that, according to my Campbell code sample, we only average sapflow, and the averaged variable gets its own name (diffvolt_avg vs. diffvolt). So I don't think we would run into this issue unless we start averaging a variable without renaming it.

bpbond commented 2 months ago

Here are two raw files that we're now processing:

PNNL_41_Terosdata_5min_20220811000013.dat
PNNL_41_Terosdata_20220811000013.dat

These files have overlapping timestamps, i.e. both have observed values at 00, 15, 30, and 45 minutes past each hour. But the values at those timestamps should differ slightly, because one file holds the 5-minute average and the other the 15-minute average.
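One way to test this expectation is to join the two tables on timestamp and look at the largest difference per shared timestamp. The sketch below assumes the raw .dat files follow the Campbell TOA5 layout (four header lines, then rows of TIMESTAMP, RECORD, values...); the function names are hypothetical, and "NAN" or non-numeric fields such as the logger name are skipped.

```python
import csv
import io
import math


def read_rows(stream, header_lines=4):
    """Map TIMESTAMP -> list of data fields, dropping the RECORD column.

    Assumes a TOA5-style file: `header_lines` header rows, then
    rows of TIMESTAMP, RECORD, value, value, ...
    """
    reader = csv.reader(stream)
    for _ in range(header_lines):
        next(reader, None)
    rows = {}
    for fields in reader:
        if not fields:  # skip blank lines
            continue
        timestamp, _record, *values = fields
        rows[timestamp] = values
    return rows


def to_float(text):
    """Parse a field as float; non-numeric fields (e.g. "PNNL_33") become None."""
    try:
        return float(text)  # note: float("NAN") parses to nan
    except ValueError:
        return None


def max_abs_difference(a, b):
    """Largest |a_i - b_i| over positions where both values are finite numbers."""
    diffs = []
    for x, y in zip(map(to_float, a), map(to_float, b)):
        if x is None or y is None or math.isnan(x) or math.isnan(y):
            continue
        diffs.append(abs(x - y))
    return max(diffs, default=0.0)


def compare_tables(stream_a, stream_b, header_lines=4):
    """Return {timestamp: max_abs_difference} for timestamps present in both tables."""
    a = read_rows(stream_a, header_lines)
    b = read_rows(stream_b, header_lines)
    return {ts: max_abs_difference(a[ts], b[ts]) for ts in sorted(a.keys() & b.keys())}


if __name__ == "__main__":
    # Shortened versions of the two 18:15:00 rows quoted in this thread;
    # header_lines=0 because these snippets have no header block.
    full = io.StringIO('"2024-06-30 18:15:00",99883,"PNNL_33",2377.72,22.7,413,"NAN"\n')
    five = io.StringIO('"2024-06-30 18:15:00",97988,"PNNL_33",2377.72,22.7,413,"NAN"\n')
    print(compare_tables(full, five, header_lines=0))  # {'2024-06-30 18:15:00': 0.0}
```

A maximum difference of 0.0 at every shared timestamp would indicate that the two tables record the same sample rather than different averages.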

bpbond commented 2 months ago

Interesting, maybe this isn't true! I pulled the 18:15:00 line from two different files, and the data values are identical; only the record number in the second field differs:

From PNNL_33_Terosdata_20240701000139.dat

"2024-06-30 18:15:00",99883,"PNNL_33",2377.72,22.7,413,2382.27,22.5,389,2401.54,21.7,777,2373.07,22.3,433,2530.2,22.3,911,2435.54,22.7,683,2375.01,22.2,676,2412.58,22,809,2428.54,22,633,"NAN","NAN","NAN","NAN","NAN","NAN",2351.57,22.9,834,2316.69,22.4,610,"NAN","NAN","NAN","NAN","NAN","NAN","NAN","NAN","NAN",2468.2,23.1,718,2342.96,22,611,2341.23,20.9,682,2597.35,23.7,1107,2540.53,22.4,628,2343.53,20.9,174

From PNNL_33_Terosdata_5min_20240701000141.dat

"2024-06-30 18:15:00",97988,"PNNL_33",2377.72,22.7,413,2382.27,22.5,389,2401.54,21.7,777,2373.07,22.3,433,2530.2,22.3,911,2435.54,22.7,683,2375.01,22.2,676,2412.58,22,809,2428.54,22,633,"NAN","NAN","NAN","NAN","NAN","NAN",2351.57,22.9,834,2316.69,22.4,610,"NAN","NAN","NAN","NAN","NAN","NAN","NAN","NAN","NAN",2468.2,23.1,718,2342.96,22,611,2341.23,20.9,682,2597.35,23.7,1107,2540.53,22.4,628,2343.53,20.9,174

stephpenn1 commented 2 months ago

That makes sense to me; my understanding is that the TEROS sensors are only sampled, not averaged, so these would indeed be identical. @roylrich, can you confirm?
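If duplicates across tables turn out to be identical, as in the TEROS example above, one option is to collapse them to a single row while keeping any that disagree for review. This is a minimal sketch under that assumption; the `Obs` record and `drop_identical_duplicates` are hypothetical names, not part of the data-workflows code.

```python
import math
from typing import NamedTuple


class Obs(NamedTuple):
    """One long-format observation (hypothetical field names)."""
    timestamp: str
    table: str
    variable: str
    value: float


def drop_identical_duplicates(observations):
    """Keep the first observation for each (timestamp, variable, value).

    Identical values reported by several tables collapse to one row;
    differing values at the same timestamp are all kept so they can be reviewed.
    """
    seen = set()
    kept = []
    for obs in observations:
        # NaN never compares equal to itself, so map it to a fixed key.
        value_key = "NaN" if math.isnan(obs.value) else obs.value
        key = (obs.timestamp, obs.variable, value_key)
        if key in seen:
            continue
        seen.add(key)
        kept.append(obs)
    return kept


if __name__ == "__main__":
    rows = [
        Obs("2024-06-30 18:15:00", "Terosdata", "VWC", 2377.72),
        Obs("2024-06-30 18:15:00", "Terosdata_5min", "VWC", 2377.72),
    ]
    print(drop_identical_duplicates(rows))
```

Because the first occurrence wins, input order decides which table's row survives; sorting by a preferred table first would make that choice explicit.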

bpbond commented 2 months ago

Ah, got it, thanks