Closed by bpbond 1 year ago
I think we want either mmol/m2/day or umol/m2/day, I am not picky on that right now. Usually they are pretty high, so I think mmol is the easiest for viewing as of now.
I can't think of any other diagnostics right now, but Pat did have some feedback on this, so maybe we can loop him in soon? He had some ideas about the QAQC portion and how we were going to assess the quality of the fluxes. He was saying that the R2 will likely always be high because we have so many data points, so we may need to think about this a bit.
Thanks.
OK, I have added a 'flux' computation (but without worrying about units or area/volume corrections for now) and some basic QA/QC. See https://rpubs.com/bpbond/989830. If you go down to the "QA/QC" section you'll see (1) a searchable and sortable table of results, which may be useful, and (2) some basic graphs.
> so many data points
Yeah, agreed. Again, without disrespecting the SERC standard, I don't get the five minutes...
Re Pat, of course, feel free to loop him in whenever!
This looks awesome! Thank you so much.
A few thoughts came to my mind when I was looking through the code:
One thing Pat and I discussed was whether it would be possible to have a clickable graph where, if we needed to adjust the start and end time, you could make that selection with your cursor in R and reselect the data you want to use. This is likely a reach goal and I don't even know if it's possible, but I will throw it out there and I can also do some googling.
We were also thinking it would be good to have one of our QC's be the starting concentration. In theory this should be close to ambient unless someone put the chamber lid on and then waited a bit before selecting a start time. This looks like it could be a little hard because most of the fluxes seem to be starting after the concentration has increased as the CH4 concentrations are pretty high.... we might need to think about this a bit.
Lastly, Pat did mention that there has been some discussion in the literature about fitting both linear and curvilinear fits and then using what fits best to the data? I haven't worked with curvilinear functions much, but maybe it is something we could try - again with the high R2's and so many data points I am not exactly sure how we will assess which fits better, but something to consider.
Not expecting all of these to be fixed soon - just didn't want to lose the thoughts! Thank you so much again! -Steph
> clickable graph
A live graph with zoom/point selection capabilities is easy, for example to explore and look at different slices of the data. To make those selections feed back to change the analysis, though, would require us to change this to a Shiny web app. Definitely doable, but definitely more work.
> starting concentration
Agreed, I like the idea but not quite sure how this would work.
> both linear and curvilinear fits
This is straightforward to do, and yes is often standard protocol—e.g. it's what the Licor smart chambers do.
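As a rough sketch of how the linear versus curvilinear choice could be made, the block below fits a straight line and a quadratic to one measurement and compares them by AIC. AIC is an assumption here (the thread does not name a criterion), the quadratic stands in for "curvilinear", and `polyfit`, `aic` and `compare_fits` are hypothetical helper names, not functions from the project code.

```python
import math


def polyfit(t, y, degree):
    """Least-squares polynomial fit; returns (coeffs, rss).

    Coefficients apply to time centred on its mean, lowest order first.
    """
    n = len(t)
    t_mean = sum(t) / n
    x = [ti - t_mean for ti in t]  # centring keeps the normal equations well conditioned
    m = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i)
    A = [[sum(xi ** (i + j) for xi in x) for j in range(m)] for i in range(m)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(A[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - tail) / A[i][i]
    rss = sum(
        (yi - sum(c * xi ** k for k, c in enumerate(coeffs))) ** 2
        for xi, yi in zip(x, y)
    )
    return coeffs, rss


def aic(rss, n, n_params):
    """Gaussian AIC up to an additive constant; lower is better."""
    return n * math.log(max(rss, 1e-300) / n) + 2 * n_params


def compare_fits(t, y):
    """Compare a linear and a quadratic fit of concentration against time."""
    n = len(t)
    _, rss_lin = polyfit(t, y, 1)
    _, rss_quad = polyfit(t, y, 2)
    aic_lin = aic(rss_lin, n, 2)
    aic_quad = aic(rss_quad, n, 3)
    best = "linear" if aic_lin <= aic_quad else "quadratic"
    return {"linear": aic_lin, "quadratic": aic_quad, "best": best}
```

Unlike R2, which stays high for both models when there are hundreds of points, AIC penalizes the extra parameter, so it may be one way to address the "which fits better" question raised above.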
Graph: Okay maybe we make the Shiny app a reach goal - it would be very cool to do that, but could take me some time to learn how to build it or more time for someone who is more familiar.
Starting concentration: Ya, I am thinking that this might not work considering how we do the fluxes, the concentration starts rising pretty quickly. I guess we could go back in the data and look for right when the lid was placed, but it would be complicated.
Fits: Oh great! I guess I didn't really think about that too much, but yes the Smart Chamber does give you the option - sweet :)
I'll add curvilinear and robust regression options.
I've added a visual curvilinear, but actually fitting the model will be slightly more complicated, so opening an issue for that.
I added a comparison of standard versus robust flux estimates, which may be informative when outliers are distorting things.
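For illustration only, here is a minimal sketch of what a standard versus robust slope comparison can look like. The thread does not say which robust method was used; Theil-Sen (the median of all pairwise slopes) is swapped in here as one common choice, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import median


def ols_slope(t, y):
    """Ordinary least-squares slope of y against t."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    return sxy / sxx


def theil_sen_slope(t, y):
    """Median of pairwise slopes; O(n^2), fine for a few hundred points."""
    return median(
        (y2 - y1) / (t2 - t1)
        for (t1, y1), (t2, y2) in combinations(zip(t, y), 2)
        if t2 != t1
    )


# Synthetic example: a steady rise with a short spike, as an outlier might look
t = [float(i) for i in range(60)]
y = [0.5 * ti + 10.0 for ti in t]
for i in (20, 21, 22):
    y[i] += 50.0

print("OLS slope:      ", ols_slope(t, y))
print("Theil-Sen slope:", theil_sen_slope(t, y))
```

A large gap between the two estimates is a hint that a handful of points is pulling the standard fit around.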
Thanks for working on this, team! It will be a time saver that pays off in the long run. A couple of comments: (i) I probably misspoke, but my comment about regression statistics with so many data points is that the p-value is uninformative because it will always be significant; the R2 is helpful because it will decline as the flux approaches the detection limit of the instrument or when there is a problem; and (ii) the starting value could perhaps be accessed by evaluating the y-intercept of the regression. This is mainly helpful for CH4 to detect when placing the lid caused ebullition.
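A minimal sketch of the y-intercept idea, assuming time is expressed in seconds since the selected start so that the intercept estimates the starting concentration. The ambient value and tolerance are placeholders to be set per gas and site, and `linear_fit` and `start_is_elevated` are hypothetical names.

```python
def linear_fit(t, y):
    """Return (slope, intercept, r2) for a least-squares line through (t, y)."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    ss_res = sum((yi - (intercept + slope * ti)) ** 2 for ti, yi in zip(t, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    # R2 is undefined when the concentration never changes
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return slope, intercept, r2


def start_is_elevated(intercept, ambient, tolerance):
    """Flag a measurement whose fitted starting value sits well above ambient."""
    return intercept - ambient > tolerance


# Synthetic CH4 example in ppm; ambient=2.0 and tolerance=0.5 are assumptions
t = [float(i) for i in range(120)]
y = [5.0 + 0.01 * ti for ti in t]
slope, intercept, r2 = linear_fit(t, y)
print(slope, intercept, r2, start_is_elevated(intercept, ambient=2.0, tolerance=0.5))
```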
Thanks for the thoughts @megonigalp .
Steph, what are your thoughts for next steps? Would you like me to try running entire 2022 data through, or implement some of the metrics discussed above?
By metrics above do you mean the starting value and such? Or?
I think we should give the whole dataset a go - it can't hurt. Maybe we should set up a meeting in the next couple weeks to discuss in person the three of us?
@wilsonsj100 In the 2022 output data you sent me, the date is always in YYYY-MM-DD format except for one file (COMPASS_August2022_Licor_Data.txt) that is in MM-DD-YYYY.
This is...weird. Do you know why? Should the code be prepared to handle either format?
Hmmm that is weird... I will look into it. My first thought is that maybe a different Licor was used and it has its output set up to be MMDDYYYY and the other machine is the other way. We typically use Gru, but I think once we used Louise.... I will look at my notes. Is it possible to be set up for both? Would be nice to have all the Licors set up to output data the same way.
Doesn't appear to be a different Licor... Maybe it is something I did when I opened the file and saved it to my computer?
Sorry for all the issues! Let me take some time tomorrow to double check these things - I haven't gone through these yet, so there may be some going back to the original field doc's to check! Thank you so much :)
Excel is notorious for f***ing with dates, so if you opened it with that, yeah that could do it 😄
The code can auto-check but if it's a one-time thing may not be worth it? Anyway, again, we can discuss.
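If the auto-check turns out to be worth it, one way it could work is to try each known format in turn and fail loudly on anything else. This is only a sketch: the two formats are the ones mentioned in this thread, and `parse_date` is a hypothetical helper, not part of the existing code.

```python
from datetime import date, datetime

# Formats seen so far: YYYY-MM-DD (instrument output) and MM-DD-YYYY (one file)
KNOWN_FORMATS = ("%Y-%m-%d", "%m-%d-%Y")


def parse_date(text):
    """Parse a date string in any known format, else raise ValueError."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {text!r}")


print(parse_date("2022-08-15"))
print(parse_date("08-15-2022"))
```

Raising on unknown formats, rather than guessing, keeps a silently re-saved file from slipping through with swapped day and month.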
That is probably what happened :/ I will check back to the file sent to my email from the machine download and see if it also has that date issue or not.
Looks like Excel was the culprit. I must have combined the three files for August in Excel to make it easier for me, but that changed the date. Here are the raw August files attached to this comment. We should use these - let me know if you want me to resend the whole "all data" folder. Thanks! TG10-01087-2022-08-15T070000.data.txt TG10-01087-2022-08-18T070000.data.txt TG10-01087-2022-08-31T010000.data.txt
OK, thank you @wilsonsj100 for the data detective work! Here are all the data processed and visualized:
https://rpubs.com/bpbond/989830
Feedback welcome in all areas :)
☝️ these are not yet unit-converted or corrected for volume
Yay! Data!!! These look really cool - I think we want to convert to umol/m2/day for the units because anything else will probably be hard to look at numbers wise!
So exciting to see it coming together :) and that all the data could be processed at once! This is big time - thank you so much!
Random small things that don't really matter and can be changed later:
Can we make the lines on the CO2 graphs a different color (not blue) than CH4 when we do the initial viz so it is easy to differentiate?
I am also wondering if it would be helpful to make the CO2 points squares or triangles in the final viz to differentiate there.
I think we are at a stage where it would be good to chat with Pat and think about the QAQC portion of this :) so exciting! Thank you!
An in-person conversation may be best since there are nuances and likely differences in opinion. My QA/QC thoughts about the curves with low R2 are:
Cases where the y-axis increases or decreases suddenly are ebullition related. These should probably be parsed into two curves where the jump occurs and tested to see if the parts show a higher R2. If so we can safely extract a rate from the one or two fluxes that have R2 above some threshold such as 0.90.
Cases where the R2 is low have historically been excluded on the assumption that there is a problem with the measurement. That made sense when fluxes were based on a limited number of GC-generated points, but this makes less sense in my opinion with these laser-based instruments for methane. The problem is that this approach assumes there must be a strong positive or negative flux, and that a null flux is due to error.
As I argued in a New Phyt paper published from my first set of laser-based data, these laser instruments generate enough data that we can have confidence that a flux with a poor R2 represents a real -- though very low -- flux that is indistinguishable from zero given the sensitivity of the instrument.
A common approach to cases where a concentration (or flux in this case) is below the detection limit of the instrument is to assign it a value that is half of the detection limit. In the NP paper, I chose not to do this but rather to keep all of the fluxes as calculated because the fluxes with low R2 were so tiny that they had no influence on the mean rates. This seems to me to be the most transparent and fair way to treat these data.
Below is a figure from the SI of that paper showing the flux vs R2 relationship.
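The split-at-the-jump idea from the first point in this comment could be sketched roughly as follows. The jump is located here as the single largest step between consecutive readings, which is an assumption (a real implementation would probably need a noise-aware rule), and `split_at_jump`, `segment_fluxes` and `min_points` are hypothetical names. The 0.90 threshold is the one suggested above.

```python
def linear_fit(t, y):
    """Return (slope, r2) for a least-squares line through (t, y)."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    ss_res = sum((yi - (intercept + slope * ti)) ** 2 for ti, yi in zip(t, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return slope, r2


def split_at_jump(t, y):
    """Split the series at the largest step between consecutive readings."""
    steps = [abs(y[i + 1] - y[i]) for i in range(len(y) - 1)]
    cut = max(range(len(steps)), key=steps.__getitem__) + 1
    return (t[:cut], y[:cut]), (t[cut:], y[cut:])


def segment_fluxes(t, y, r2_min=0.90, min_points=5):
    """Fit each side of the jump; keep (slope, r2) pairs that pass r2_min."""
    kept = []
    for seg_t, seg_y in split_at_jump(t, y):
        if len(seg_t) < min_points:
            continue  # too few points to trust a fit
        slope, r2 = linear_fit(seg_t, seg_y)
        if r2 >= r2_min:
            kept.append((slope, r2))
    return kept


# Synthetic example: steady rise with a sudden +3 jump halfway through
t = [float(i) for i in range(100)]
y = [2.0 + 0.01 * ti + (3.0 if ti >= 50 else 0.0) for ti in t]
print("whole curve:", linear_fit(t, y))
print("segments:   ", segment_fluxes(t, y))
```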
I'll put the flux computation into real (useful) volume-corrected units.
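For reference, a minimal sketch of the usual ideal-gas conversion from a concentration slope to an area-based flux, assuming the slope is in ppm per second (ppm being µmol per mol of air). The default pressure and temperature are placeholders rather than measured values, the chamber numbers in the example are made up, and the function names are hypothetical.

```python
R_GAS = 8.314462618  # universal gas constant, J mol-1 K-1
SECONDS_PER_DAY = 86400.0


def flux_umol_m2_s(slope_ppm_s, volume_m3, area_m2,
                   pressure_pa=101325.0, temp_k=298.15):
    """Convert a slope in ppm/s to a flux in umol m-2 s-1.

    Moles of air in the chamber follow from the ideal gas law, n = PV / (RT).
    """
    mol_air = pressure_pa * volume_m3 / (R_GAS * temp_k)
    return slope_ppm_s * mol_air / area_m2


def flux_umol_m2_day(slope_ppm_s, volume_m3, area_m2, **conditions):
    """Same flux expressed per day."""
    return flux_umol_m2_s(slope_ppm_s, volume_m3, area_m2, **conditions) * SECONDS_PER_DAY


# Made-up chamber: 0.05 m3 total volume over 0.1 m2 of ground
per_day = flux_umol_m2_day(0.001, volume_m3=0.05, area_m2=0.1)
print(per_day, "umol m-2 day-1")
print(per_day / 1000.0, "mmol m-2 day-1")
```

Using the measured air temperature and pressure for each measurement, where available, would replace the defaults.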
But otherwise, yes, agreed that we're at the point an in-person meeting would probably be most efficient. I have good flexibility next week and happy to come to SERC.
@wilsonsj100 What's the volume of each chamber?
Hey Ben, the total volume of the chambers used is listed in the metadata as "volume" column
Each individual chamber has a volume of 0.064 m2 I believe.
> Hey Ben, the total volume of the chambers used is listed in the metadata as "volume" column
Oh brilliant, thanks! I was just looking at the chamber count column.
Awesome! thanks Ben :)
Superseded by #6