June-Skeeter opened this issue 9 months ago
My initial thoughts:
Some misc. comments
I just re-read the above and I think the short version of this entire comment is "please don't change this because it's going to cause me tons of pain to fix up stuff that I don't have time to be fixing up" ;-)
Let's discuss.
@znesic I don't want to create more work for you, and I definitely see the merit in "if it ain't broke, don't fix it". I'm not advocating for changing the behaviour of any of the other sites you're responsible for; I am only advocating for upgrading the behaviour of the existing Micromet sites and any future sites that get incorporated. With the few conditional statements (or some variant of such) suggested in readIniFileDirect.m and read_data.m, it should be relatively straightforward to upgrade where desired while maintaining legacy behaviour everywhere else.
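For illustration, something along these lines (a rough sketch; the signatures and the format check are placeholders, not the real API):

```matlab
% Rough sketch (placeholders, not the real API): peek at the first section
% header and route to the matching parser, so upgraded sites can opt in
% while every legacy site keeps the current behaviour.
fid = fopen(iniFileName, 'r');
firstLine = strtrim(fgetl(fid));
fclose(fid);
if strcmp(firstLine, '[Trace]')
    trace_str = read_ini_file(iniFileName);          % legacy [Trace]...[End] parser
else
    trace_str = read_ini_file_updated(iniFileName);  % standardized-format parser
end
```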
It doesn't solve any real issues besides compatibility with other software (Python, R), which we don't use for stages 1 & 2 anyway. Very limited ROI for the testing and troubleshooting of the converted files and for dealing with potential legacy issues.
For example, the same trace name used twice, with the second definition building on the first:

```
[Trace]
variableName = 'TA_1_1_1'
Evaluate = 'TA_1_1_1=TA_1_1_1;'
title = 'Air temperature at 4m (HMP)'
units = '^oC'
minMax = [-20,50]
[End]

[Trace]
variableName = 'TA_1_1_1'
Evaluate = 'TA_1_1_1=TA_1_1_1*2;'
title = 'Air temperature at 4m (HMP)'
units = '^oC'
minMax = [-20,50]
[End]
```
`year(datetime("2017-01-01"))` fails in an eval statement, because up the chain in read_data.m the `year` method has been overwritten (see the sketch below).

So in summary, I 100% agree that it does not make sense to go back and redo a bunch of things that have already been done and that you already have working optimally. But I do think it makes sense to streamline things going forward. As you suggest, maybe a transition to OneFlux down the road would make this discussion pointless anyway. However, OneFlux isn't fully open source either (it requires Matlab 2018a), so I personally wouldn't advocate for that. Also, the OneFlux pipeline still runs in Python 2.7, which is no longer supported. The docs say "update of the code to Python 3 is ongoing", but their main branch on GitHub hasn't seen a commit in over two years, so I'm not sure I believe that.
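To make the failure mode concrete, here is a minimal sketch of the shadowing (the mechanism is my assumption; the variable name is illustrative):

```matlab
% Minimal sketch of the shadowing problem (assumed mechanism): once a
% variable named `year` exists in the workspace where the eval runs, MATLAB
% resolves year(...) as indexing into that variable rather than a call to
% the builtin year() function.
year = 2017;                               % e.g. a local variable in read_data.m
eval('y = year(datetime("2017-01-01"));')  % errors: a datetime is not a valid index
```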
I still think that the users would prefer the old formatting in cases like this.
New:
```ini
[VWC_1_1_1]
variableName=VWC_1_1_1
Evaluate=file_path = setFolderSeparator(fullfile(db_pth_root,"Calculation_Procedures/TraceAnalysis_ini/BB/Derived_Variables/VWC_Flags.csv"));,T=readtable(file_path);,DT = datetime(clean_tv,"ConvertFrom","datenum");,Change_index = table2array(T(:,1));,Sensor_postion = table2array(T(:,2));,VWC_Position=NaN(size(clean_tv));,for (i=1:length(Change_index)),VWC_Position(DT>Change_index(i),1)=Sensor_postion(i);,end,VWC_1_1_1=NaN(size(clean_tv));,VWC_1_1_1(VWC_Position==1) = VWC_raw(VWC_Position==1);,VWC_1_1_1 = SpikeFilter(VWC_1_1_1,clean_tv,1,[-.5,1],"natural");,VWC_1_1_1 = SpikeFilter(VWC_1_1_1,clean_tv,0,2,"z-score");
title=volumetric water content with sensor in hollows
units=%%
minMax=[0,100]
```
Old:
```
[Trace]
variableName = 'VWC_1_1_1'
Evaluate = 'file_path = setFolderSeparator(fullfile(db_pth_root,"Calculation_Procedures/TraceAnalysis_ini/BB/Derived_Variables/VWC_Flags.csv"));
T=readtable(file_path);
DT = datetime(clean_tv,"ConvertFrom","datenum");
Change_index = table2array(T(:,1));
Sensor_postion = table2array(T(:,2));
VWC_Position=NaN(size(clean_tv));
for (i=1:length(Change_index))
VWC_Position(DT>Change_index(i),1)=Sensor_postion(i);
end
VWC_1_1_1=NaN(size(clean_tv));
VWC_1_1_1(VWC_Position==1) = VWC_raw(VWC_Position==1);
VWC_1_1_1 = SpikeFilter(VWC_1_1_1,clean_tv,1,[-.5,1],"natural");
VWC_1_1_1 = SpikeFilter(VWC_1_1_1,clean_tv,0,2,"z-score");'
title = 'volumetric water content with sensor in hollows'
units = '%'
minMax = [0,100]
[End]
```
While using scripts for long lines is my preferred solution too, I mostly stayed away from it for anything that's not super complex. If every Evaluate line longer than, let's say, 100 characters were converted to a script, there would be dozens of scripts per site that the user would have to open and check while troubleshooting. Either way it's a PITA indeed. :-(
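For reference, the script route for the VWC example above could look something like this (the function and file names are hypothetical; the body just repackages the code from the example into a version-controlled function):

```matlab
% Hypothetical helper: the ini entry shrinks to a single line, e.g.
%   Evaluate = 'VWC_1_1_1 = calc_VWC_BB(clean_tv, VWC_raw, db_pth_root);'
% and the logic below lives in a file the user can step through directly.
function VWC_1_1_1 = calc_VWC_BB(clean_tv, VWC_raw, db_pth_root)
    % Load the sensor-position flag table and map it onto the time vector
    flagFile = setFolderSeparator(fullfile(db_pth_root, ...
        'Calculation_Procedures/TraceAnalysis_ini/BB/Derived_Variables/VWC_Flags.csv'));
    T = readtable(flagFile);
    DT = datetime(clean_tv, 'ConvertFrom', 'datenum');
    changeIndex = table2array(T(:,1));
    sensorPosition = table2array(T(:,2));
    VWC_Position = NaN(size(clean_tv));
    for i = 1:length(changeIndex)
        VWC_Position(DT > changeIndex(i), 1) = sensorPosition(i);
    end
    % Keep only readings from sensor position 1, then despike twice
    VWC_1_1_1 = NaN(size(clean_tv));
    VWC_1_1_1(VWC_Position == 1) = VWC_raw(VWC_Position == 1);
    VWC_1_1_1 = SpikeFilter(VWC_1_1_1, clean_tv, 1, [-.5, 1], 'natural');
    VWC_1_1_1 = SpikeFilter(VWC_1_1_1, clean_tv, 0, 2, 'z-score');
end
```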
We should probably seek input from other users on this one-line vs. multi-line vs. scripts question for the Evaluate1 lines, to find out which way would make their work easier.
If using the same trace name multiple times is not a "feature" for some ugly bootstrapping of more complex calculations (as I tried to argue), then a simple syntax checker inside the read_ini_file functions should do the trick.
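A minimal sketch of such a check (`trace_str` stands in for whatever struct array the parser returns; the field name is assumed):

```matlab
% Sketch of a duplicate-name check after parsing: flag any variableName
% that appears more than once in the parsed struct array.
names = {trace_str.variableName};
[uniqueNames, ~, idx] = unique(names);
counts = accumarray(idx(:), 1);          % occurrences of each unique name
dups = uniqueNames(counts > 1);
if ~isempty(dups)
    error('read_ini_file:duplicateTrace', ...
        'Duplicate trace name(s) in ini file: %s', strjoin(dups, ', '));
end
```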
Looking at the big picture, it seems to me that putting more effort into making the 3rd stage processing great (and using it as a template for the future versions of the 1st and 2nd stages) or finishing up the EddyPro automation would move us forward much faster and have a much greater impact. Making those processes robust and testing them properly will take a lot of time so we need to pick our battles.
Moving forward, if we feel that the ini files are the ones slowing us down and are worth the time investment, here are the necessary steps that I use when testing new features that may break something:
If multi-line eval statements are preferred, then I would suggest the YAML (.yml) file format instead. There is a collection of MATLAB functions for handling YAML files. I did a quick test, and it looks like multi-line eval statements are read/parsed correctly. It also maintains the spacing when parsing text, so it's not necessary to add parentheses to for loops/if statements. Python and R also support the YAML format, so it's a good all-around option. It is also the same config structure that is used to define the back end of the UBC Micromet webpage.
Another nice thing about YAML files is that they support nested key-value pairs, so one could even go so far as to define all stages of processing in the same configuration file. For example:
```yaml
FC:
  FirstStage:
    title: CO2 flux
    originalVariable: CO2 flux
    inputFileName: {co2_flux}
    inputFileName_dates: [datenum(2014,1,1) datenum(2999,1,1)]
    measurementType: flux
    units: umol/m^2/s
    instrument: CSAT3 & LI-7200
    instrumentSN: 1389-1 & 72H-0509
    calibrationDates:
    comments:
    minMax: [-50,50]
    zeroPt: [-9999]
  SecondStage:
    Evaluate: FC=FC;
    title: CO2 flux
    units: \mu mol m^{-2} s^{-1}
  ThirdStage:
    REddyProc: Define any necessary values to dictate processing
NEE:
  FirstStage:
    title: Net Ecosystem Exchange
    comments: Not calculated in first stage ... proceed to next stage
  SecondStage:
    Evaluate: |
      % Calculate Storage Corrected NEE
      NEE=FC+SC;
      minMax = [-50,50]; % Re-filter here because of sum
      NEE(or(NEE<minMax(1),NEE>minMax(2)))=NaN;
    title: Net Ecosystem Exchange (Storage Corrected)
    units: \mu mol m^{-2} s^{-1}
  ThirdStage:
    REddyProc: Define any necessary values to dictate processing
```
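The quick test mentioned above amounted to something like this (a sketch that assumes a MATLAB YAML toolbox exposing `yaml.loadFile`, e.g. the File Exchange "yaml" package; the file name is made up):

```matlab
% Sketch of the round trip for the example above (yaml.loadFile is from a
% third-party toolbox; 'BB_config.yml' is a hypothetical file name).
cfg = yaml.loadFile('BB_config.yml');
evalCode = cfg.NEE.SecondStage.Evaluate;   % literal block: newlines preserved
disp(evalCode)                             % inspect the multi-line code as-is
% eval(evalCode)                           % would run once FC and SC are in scope
```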
I get that it's not a priority to go back and change things, but I still think it's a worthwhile discussion to have, and it can also help clarify how we move forward with the other priorities (Stage 3 + EddyPro automation). Thinking about how to structure the stage 3 ini file is what prompted me to raise this issue. If I'm going to work on drafting a clear and concise stage 3 ini file for BBS, then "How should it be laid out?" is an important question to ask. My line of thinking is that it should be similar in structure to the other stages, for compatibility and clarity if possible, so users aren't required to work with multiple formats.
Hopefully some others have thoughts on this too and we can have a broader conversation about it at our next meeting.
The .ini files that we are using to define processing procedures do not follow the standard .ini format. INI files require key/value pairs (which we have) separated by unique section headers (which we don't have). This makes reading/editing/sharing our .ini files ambiguous and tedious. They're not portable, because they depend on Biomet.net\matlab\TRACEANALYSIS_FIRSTSTAGE\read_ini_file.m for parsing/interpretation, and this also adds a layer of complexity to troubleshooting issues with the .ini files. There are libraries in Python and R for handling .ini files, and if you try to read our files with them, they fail because the `[Trace] ... [End]` syntax is not standard. There is also a MATLAB function for reading standard .ini files.
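For instance, with unique section headers a generic reader can parse the file directly (a sketch assuming the File Exchange `ini2struct` function and the standardized layout shown above; the file name is hypothetical):

```matlab
% Sketch: a generic reader parses a standard .ini file straight into one
% struct per [section] (ini2struct is a File Exchange function; field-name
% casing may vary by implementation).
ini = ini2struct('BB_SecondStage.ini');
trace = ini.VWC_1_1_1;    % fields: variableName, Evaluate, title, units, minMax
disp(trace.units)
```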
If we want to stick with the existing procedures, I suggest we change the file extension to .txt, because they are really just text files. Personally, I think it would be a good idea to switch to a standardized format, like the existing .ini framework. I wrote a Python script to translate the existing .ini files to the standardized format, which is included in a push of my calculation procedures branch here. You can see an example of the standardized file here.
I've created updated .ini files in the standard format for BB, BB2, BBS & DSM for Stages 1 & 2 as examples in my branch here. I'm also working up an example for stage 3 (only BB1/BBS so far) that works in R using the INI package.
Additionally, I wrote a MATLAB function in Biomet.net, Biomet.net\matlab\TRACEANALYSIS_FIRSTSTAGE\read_ini_file_updated.m, to interpret the standardized .ini files in MATLAB using an adaptation of the function linked above. The pull request can be viewed here. It's not ready for merging; it's just given as an example.
Thoughts??
The pull requests I've submitted aren't complete yet, they're just for example purposes. Still some testing/tweaking that I'd want to do before production level incorporation.