Closed apeyton closed 7 years ago
Hey, sorry for the slow response @apeyton - tough week. Thanks. Yes, I think we'll have a list of potential problems to look at, these and others. At this point, I'd say let's finish up the 100 days and then get to work examining these types of issues; I will beef up the R pipeline (see #11) in preparation for this process.
About your no. 1 above - please don't change the Picarro time zone! Ugh. Much easier to deal with this in other ways.
So, re 1 and 2 above - purely cosmetic in that one diagnostic plot. We can fix it, but also fine to ignore.
Re 4 - see my comment in issue #13.
Re 3 - I'm unclear. Here's your plot (below), but I don't see any problem in any of the diagnostics for 27 October. What if anything should be done here? I'm concerned if cores are potentially mis-labeled of course.
I think the issue isn't shown in this individual plot - I was comparing this plot with the replication plot. There were not 6 replications for the cores 23, 18, 15, 21, 9, 37. So, we need to figure out which script mislabeled the ports/time/core, or remove the wonky data from the dataset altogether. Does that make sense?
Ah, got it. OK, let me look at those. Hmm:
       Core       Date n
1     AL 15 2015-10-27 6
2     AL 18 2015-10-27 6
3     AL 21 2015-10-27 6
4     AL 23 2015-10-27 6
5     AL 37 2015-10-27 6
6      AL 9 2015-10-27 6
7 Ambient22 2015-10-27 6
Looking just at AL 15 from that date:
  samplenum            DATETIME   N MPVPosition h2o_reported valvemaprow  min_CO2
1      1075 2015-10-27 18:06:39 115          12    2.5263508         552 499.5770
2      1081 2015-10-27 18:18:36 120          12    2.5524665         552 502.8194
3      1101 2015-10-27 20:27:27  86          12    0.8356557         552 460.7885
4      1107 2015-10-27 20:39:28  81          12    0.8054356         552 464.4229
5      1127 2015-10-27 21:55:17  85          12    0.8209873         552 457.9037
6      1133 2015-10-27 22:07:15  83          12    0.8240117         552 456.8838
Then when I look at those sample numbers in the Picarro data, they're all from valve 12 which is in fact AL 15 on that date. They appear in five different files (I have opened up the raw files and verified this):
[1] "CFADS2283-20151027-171013Z-DataLog_User.dat.gz"
[2] "CFADS2283-20151027-181018Z-DataLog_User.dat.gz"
[3] "CFADS2283-20151027-200015Z-DataLog_User.dat.gz"
[4] "CFADS2283-20151027-210022Z-DataLog_User.dat.gz"
[5] "CFADS2283-20151027-220030Z-DataLog_User.dat.gz"
It does look like there were six different samples taken from AL15 across five hours. Thoughts @apeyton ?
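For reference, the sample count and the spacing between samples can be re-derived from the AL 15 rows listed above with a few lines of base R. This is a minimal sketch, not part of the pipeline: the data frame `al15` is typed in by hand from the table in this comment, and treating the timestamps as UTC is an assumption.

```r
# Sample times for AL 15 (valve 12) on 2015-10-27, copied from the table above.
# Assumption: the timestamps are UTC.
al15 <- data.frame(
  samplenum = c(1075, 1081, 1101, 1107, 1127, 1133),
  DATETIME = as.POSIXct(c("2015-10-27 18:06:39", "2015-10-27 18:18:36",
                          "2015-10-27 20:27:27", "2015-10-27 20:39:28",
                          "2015-10-27 21:55:17", "2015-10-27 22:07:15"),
                        tz = "UTC"),
  MPVPosition = 12
)

# Number of samples recorded on that valve that day
n_samples <- nrow(al15)

# Minutes elapsed since the first sample of the day
al15$elapsed_min <- as.numeric(difftime(al15$DATETIME, min(al15$DATETIME),
                                        units = "mins"))

print(n_samples)
print(al15[, c("samplenum", "elapsed_min")])
```

Looking at `elapsed_min`, the first two samples are about 12 minutes apart, and the remaining four come more than two hours after the first.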
Comment from @apeyton on Slack--until further notice ball is in her court on this one.
from those cores...or from those ports? Raw Picarro data only takes the port/valve into consideration. There were def. 6 different samples taken, but not from those cores...from those ports/valves. I think I need to recheck the core weights/times sheet with the raw Picarro data to make sure we accurately aligned our cores with the ports and raw data.
I checked the times recorded for 27Oct2015 and I think I figured out why it assumed that there were 6 replications for the cores 23, 18, 15, 21, 9, 37 based on the times I recorded: it just assumes that everything past the time recorded is assigned to the core, but that is not correct. Perhaps we need to add in an 'end time' in the script. Basically, only assign valves/ports to cores based on the 30 minutes following the time recorded. This would ensure that we are only collecting the measurements that we select (that are accurate!) and not the repeated, messed-up measurements.
Thoughts?
Hi @apeyton I'd like to tie this up.
Yes, the script assumes that any measurement made at or later than "Time_set_start_UTC" (in the valvemap.csv file), made on the same day, and with a matching valve number should be assigned to that particular core. Relevant code is in 2-summarize.R:101-103:
rowmatches <- which(DATETIME >= valvemap$StartDateTime &
                    yday(DATETIME) == yday(valvemap$StartDateTime) &
                    MPVPosition == valvemap$MPVPosition)
"...but that is not correct"
So what should be done? You suggest that the match should be made only for 30 minutes after the start time. OK, but then what should be done with measurements outside this window?
Thanks, B
I think the measurements outside this window should be tossed.
Why? I don't understand. What's special about 30 minutes?
Update on issue: 30Mar2016 phone call
1) Cores for each treatment were measured in duplicate, totaling 24 mins (per treatment).
2) If the Picarro is recording core data past 30 mins from the time of first measurement, then there is an error that we need to identify.
3) Updating CoreData.csv to list Picarro data to remove from analysis (i.e. removing runs where there were Picarro reading errors) to help solve many of the problems identified in this issue (list from Nov 15th).
4) Measurements made after 30 minutes may correspond to valve 10 - the ambient valve that measures continually between treatment measurements.
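The 30-minute rule from the call could be sketched in base R roughly as follows. This is only an illustration, not the code in 2-summarize.R: `match_window()` and `window_min` are hypothetical names, the start time of 18:00:00 is invented (the real value is in valvemap.csv), and `as.POSIXlt()$yday` stands in for `yday()`. The sample times are the AL 15 rows quoted earlier in this thread, assumed to be UTC.

```r
# Sample times for valve 12 on 2015-10-27 (AL 15 rows from earlier in the thread)
DATETIME <- as.POSIXct(c("2015-10-27 18:06:39", "2015-10-27 18:18:36",
                         "2015-10-27 20:27:27", "2015-10-27 20:39:28",
                         "2015-10-27 21:55:17", "2015-10-27 22:07:15"),
                       tz = "UTC")
MPVPosition <- rep(12, length(DATETIME))

# Hypothetical valvemap entry; this start time is invented for illustration
start_time <- as.POSIXct("2015-10-27 18:00:00", tz = "UTC")
valve <- 12

# Hypothetical helper: indices of samples on `valve`, on the same day as
# `start`, taken between 0 and `window_min` minutes after `start`.
# window_min = Inf reproduces open-ended matching within the day.
match_window <- function(datetime, mpv, start, valve, window_min = 30) {
  same_day <- as.POSIXlt(datetime, tz = "UTC")$yday ==
    as.POSIXlt(start, tz = "UTC")$yday
  elapsed <- as.numeric(difftime(datetime, start, units = "mins"))
  which(elapsed >= 0 & elapsed <= window_min & same_day & mpv == valve)
}

open_ended <- match_window(DATETIME, MPVPosition, start_time, valve, Inf)
windowed   <- match_window(DATETIME, MPVPosition, start_time, valve, 30)

# Samples the open-ended rule assigns to the core but the window excludes;
# these are candidates for review rather than silent assignment
outside <- setdiff(open_ended, windowed)

print(windowed)
print(outside)
```

With the invented start time, the open-ended rule matches all six samples, while the 30-minute window keeps only the first two. Returning `outside` separately, instead of dropping those rows inside the helper, leaves room to check them against CoreData.csv before deciding what to remove.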
There are a few issues showing up in our missing/problematic data (especially with regard to replication) that I am concerned about. The latter leads me to wonder if some of the data we agreed should be removed is included in the analysis. Here is a list of my concerns: