June-Skeeter opened this issue 9 months ago
My initial thoughts:
Some misc. comments
I just re-read the above and I think the short version of this entire comment is "please don't change this because it's going to cause me tons of pain to fix up stuff that I don't have time to be fixing up" ;-)
Let's discuss.
@znesic I don't want to create more work for you, and I definitely see the merit in "if it ain't broke, don't fix it". I'm not advocating for changing the behaviour of any of the other sites you're responsible for; I am only advocating for upgrading the behaviour of the existing Micromet sites and any future sites that get incorporated. With the few conditional statements (or some variant of such) suggested in readIniFileDirect.m and read_data.m, it should be relatively straightforward to upgrade where desired while maintaining legacy behaviour everywhere else.
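For illustration, something along these lines (a rough sketch; the signatures and the format check are placeholders, not the real API):

```matlab
% Rough sketch (placeholders, not the real API): peek at the first section
% header and route to the matching parser, so upgraded sites can opt in
% while every legacy site keeps the current behaviour.
fid = fopen(iniFileName, 'r');
firstLine = strtrim(fgetl(fid));
fclose(fid);
if strcmp(firstLine, '[Trace]')
    trace_str = read_ini_file(iniFileName);          % legacy [Trace]...[End] parser
else
    trace_str = read_ini_file_updated(iniFileName);  % standardized-format parser
end
```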
It doesn't solve any real issues besides compatibility with other software (Python, R), which we don't use for stages 1 & 2 anyway. Very limited ROI for the testing and troubleshooting of the converted files and for dealing with potential legacy issues.
For example, the same trace name used twice, with the second definition building on the first:

```
[Trace]
variableName = 'TA_1_1_1'
Evaluate = 'TA_1_1_1=TA_1_1_1;'
title = 'Air temperature at 4m (HMP)'
units = '^oC'
minMax = [-20,50]
[End]

[Trace]
variableName = 'TA_1_1_1'
Evaluate = 'TA_1_1_1=TA_1_1_1*2;'
title = 'Air temperature at 4m (HMP)'
units = '^oC'
minMax = [-20,50]
[End]
```
`year(datetime("2017-01-01"))` fails in an eval statement, because up the chain in read_data.m the `year` method has been overwritten (see the sketch below).

So in summary, I 100% agree that it does not make sense to go back and redo a bunch of things that have already been done and that you already have working optimally. But I do think it makes sense to streamline things going forward. As you suggest, maybe a transition to OneFlux down the road would make this discussion pointless anyway. However, OneFlux isn't fully open source either (it requires Matlab 2018a), so I personally wouldn't advocate for that. Also, the OneFlux pipeline still runs in Python 2.7, which is no longer supported. The docs say "update of the code to Python 3 is ongoing", but their main branch on GitHub hasn't seen a commit in over two years, so I'm not sure I believe that.
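To make the failure mode concrete, here is a minimal sketch of the shadowing (the mechanism is my assumption; the variable name is illustrative):

```matlab
% Minimal sketch of the shadowing problem (assumed mechanism): once a
% variable named `year` exists in the workspace where the eval runs, MATLAB
% resolves year(...) as indexing into that variable rather than a call to
% the builtin year() function.
year = 2017;                               % e.g. a local variable in read_data.m
eval('y = year(datetime("2017-01-01"));')  % errors: a datetime is not a valid index
```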
I still think that the users would prefer the old formatting in cases like this.
New:
```ini
[VWC_1_1_1]
variableName=VWC_1_1_1
Evaluate=file_path = setFolderSeparator(fullfile(db_pth_root,"Calculation_Procedures/TraceAnalysis_ini/BB/Derived_Variables/VWC_Flags.csv"));,T=readtable(file_path);,DT = datetime(clean_tv,"ConvertFrom","datenum");,Change_index = table2array(T(:,1));,Sensor_postion = table2array(T(:,2));,VWC_Position=NaN(size(clean_tv));,for (i=1:length(Change_index)),VWC_Position(DT>Change_index(i),1)=Sensor_postion(i);,end,VWC_1_1_1=NaN(size(clean_tv));,VWC_1_1_1(VWC_Position==1) = VWC_raw(VWC_Position==1);,VWC_1_1_1 = SpikeFilter(VWC_1_1_1,clean_tv,1,[-.5,1],"natural");,VWC_1_1_1 = SpikeFilter(VWC_1_1_1,clean_tv,0,2,"z-score");
title=volumetric water content with sensor in hollows
units=%%
minMax=[0,100]
```
Old:
```
[Trace]
variableName = 'VWC_1_1_1'
Evaluate = 'file_path = setFolderSeparator(fullfile(db_pth_root,"Calculation_Procedures/TraceAnalysis_ini/BB/Derived_Variables/VWC_Flags.csv"));
T=readtable(file_path);
DT = datetime(clean_tv,"ConvertFrom","datenum");
Change_index = table2array(T(:,1));
Sensor_postion = table2array(T(:,2));
VWC_Position=NaN(size(clean_tv));
for (i=1:length(Change_index))
VWC_Position(DT>Change_index(i),1)=Sensor_postion(i);
end
VWC_1_1_1=NaN(size(clean_tv));
VWC_1_1_1(VWC_Position==1) = VWC_raw(VWC_Position==1);
VWC_1_1_1 = SpikeFilter(VWC_1_1_1,clean_tv,1,[-.5,1],"natural");
VWC_1_1_1 = SpikeFilter(VWC_1_1_1,clean_tv,0,2,"z-score");'
title = 'volumetric water content with sensor in hollows'
units = '%'
minMax = [0,100]
[End]
```
While using scripts for long lines is my preferred solution too, I mostly stayed away from it for anything that's not super complex. If every Evaluate line longer than, let's say, 100 characters were converted to a script, there would be dozens of scripts per site that the user would have to open and check while troubleshooting. Either way it's a PITA indeed. :-(
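For reference, the script route for the VWC example above could look something like this (the function and file names are hypothetical; the body just repackages the code from the example into a version-controlled function):

```matlab
% Hypothetical helper: the ini entry shrinks to a single line, e.g.
%   Evaluate = 'VWC_1_1_1 = calc_VWC_BB(clean_tv, VWC_raw, db_pth_root);'
% and the logic below lives in a file the user can step through directly.
function VWC_1_1_1 = calc_VWC_BB(clean_tv, VWC_raw, db_pth_root)
    % Load the sensor-position flag table and map it onto the time vector
    flagFile = setFolderSeparator(fullfile(db_pth_root, ...
        'Calculation_Procedures/TraceAnalysis_ini/BB/Derived_Variables/VWC_Flags.csv'));
    T = readtable(flagFile);
    DT = datetime(clean_tv, 'ConvertFrom', 'datenum');
    changeIndex = table2array(T(:,1));
    sensorPosition = table2array(T(:,2));
    VWC_Position = NaN(size(clean_tv));
    for i = 1:length(changeIndex)
        VWC_Position(DT > changeIndex(i), 1) = sensorPosition(i);
    end
    % Keep only readings from sensor position 1, then despike twice
    VWC_1_1_1 = NaN(size(clean_tv));
    VWC_1_1_1(VWC_Position == 1) = VWC_raw(VWC_Position == 1);
    VWC_1_1_1 = SpikeFilter(VWC_1_1_1, clean_tv, 1, [-.5, 1], 'natural');
    VWC_1_1_1 = SpikeFilter(VWC_1_1_1, clean_tv, 0, 2, 'z-score');
end
```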
We should probably seek input from other users on this one-line vs. multi-line vs. scripts question for the Evaluate1 lines, to find out which way would make their work easier.
If using the same trace name multiple times is not a "feature" for some ugly bootstrapping of more complex calculations (as I tried to argue), then a simple syntax checker inside the read_ini_file functions should do the trick.
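A minimal sketch of such a check (`trace_str` stands in for whatever struct array the parser returns; the field name is assumed):

```matlab
% Sketch of a duplicate-name check after parsing: flag any variableName
% that appears more than once in the parsed struct array.
names = {trace_str.variableName};
[uniqueNames, ~, idx] = unique(names);
counts = accumarray(idx(:), 1);          % occurrences of each unique name
dups = uniqueNames(counts > 1);
if ~isempty(dups)
    error('read_ini_file:duplicateTrace', ...
        'Duplicate trace name(s) in ini file: %s', strjoin(dups, ', '));
end
```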
Looking at the big picture, it seems to me that putting more effort into making the 3rd stage processing great (and using it as a template for the future versions of the 1st and 2nd stages) or finishing up the EddyPro automation would move us forward much faster and have a much greater impact. Making those processes robust and testing them properly will take a lot of time so we need to pick our battles.
Moving forward, if we feel that the ini files are the ones slowing us down and are worth the time investment, here are the necessary steps that I use when testing new features that may break something:
If multi-line eval statements are preferred, then I would suggest the YAML (.yml) file format instead. There is a collection of MATLAB functions for handling YAML files. I did a quick test, and it looks like multi-line eval statements are read/parsed correctly. It also maintains the spacing when parsing text, so it's not necessary to add parentheses to for loops/if statements. Python and R also support the YAML format, so it's a good all-around option. It is also the same config structure that is used to define the back end of the UBC Micromet webpage.
Another nice thing about YAML files is that they support nested key-value pairs, so one could even go so far as to define all stages of processing in the same configuration file. For example:
```yaml
FC:
  FirstStage:
    title: CO2 flux
    originalVariable: CO2 flux
    inputFileName: {co2_flux}
    inputFileName_dates: [datenum(2014,1,1) datenum(2999,1,1)]
    measurementType: flux
    units: umol/m^2/s
    instrument: CSAT3 & LI-7200
    instrumentSN: 1389-1 & 72H-0509
    calibrationDates:
    comments:
    minMax: [-50,50]
    zeroPt: [-9999]
  SecondStage:
    Evaluate: FC=FC;
    title: CO2 flux
    units: \mu mol m^{-2} s^{-1}
  ThirdStage:
    REddyProc: Define any necessary values to dictate processing
NEE:
  FirstStage:
    title: Net Ecosystem Exchange
    comments: Not calculated in first stage ... proceed to next stage
  SecondStage:
    Evaluate: |
      % Calculate Storage Corrected NEE
      NEE=FC+SC;
      minMax = [-50,50]; % Re-filter here because of sum
      NEE(or(NEE<minMax(1),NEE>minMax(2)))=NaN;
    title: Net Ecosystem Exchange (Storage Corrected)
    units: \mu mol m^{-2} s^{-1}
  ThirdStage:
    REddyProc: Define any necessary values to dictate processing
```
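The quick test mentioned above amounted to something like this (a sketch that assumes a MATLAB YAML toolbox exposing `yaml.loadFile`, e.g. the File Exchange "yaml" package; the file name is made up):

```matlab
% Sketch of the round trip for the example above (yaml.loadFile is from a
% third-party toolbox; 'BB_config.yml' is a hypothetical file name).
cfg = yaml.loadFile('BB_config.yml');
evalCode = cfg.NEE.SecondStage.Evaluate;   % literal block: newlines preserved
disp(evalCode)                             % inspect the multi-line code as-is
% eval(evalCode)                           % would run once FC and SC are in scope
```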
I get that it's not a priority to go back and change things, but I still think it's a worthwhile discussion to have, and it can also help clarify how we move forward with the other priorities (Stage 3 + EddyPro automation). Thinking about how to structure the stage 3 ini file is what prompted me to raise this issue. If I'm going to work on drafting a clear and concise stage 3 ini file for BBS, then "How should it be laid out?" is an important question to ask. My line of thinking is that it should be similar in structure to the other stages, for compatibility and clarity if possible, so users aren't required to work with multiple formats.
Hopefully some others have thoughts on this too and we can have a broader conversation about it at our next meeting.
The .ini files that we are using to define processing procedures do not follow the standard .ini format. INI files require key/value pairs (which we have) separated by unique section headers (which we don't have). This makes reading/editing/sharing our .ini files ambiguous and tedious. They're not portable, because they depend on Biomet.net\matlab\TRACEANALYSIS_FIRSTSTAGE\read_ini_file.m for parsing/interpretation, and this also adds a layer of complexity to troubleshooting issues with the .ini files. There are libraries in Python and R for handling .ini files, and if you try to read our files with them, they fail because the `[Trace] ... [End]` syntax is not standard. There is also a MATLAB function for reading standard .ini files.
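For instance, with unique section headers a generic reader can parse the file directly (a sketch assuming the File Exchange `ini2struct` function and the standardized layout shown above; the file name is hypothetical):

```matlab
% Sketch: a generic reader parses a standard .ini file straight into one
% struct per [section] (ini2struct is a File Exchange function; field-name
% casing may vary by implementation).
ini = ini2struct('BB_SecondStage.ini');
trace = ini.VWC_1_1_1;    % fields: variableName, Evaluate, title, units, minMax
disp(trace.units)
```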
If we want to stick with the existing procedures, I suggest we change the file extension to .txt, because they are really just text files. Personally, I think it would be a good idea to switch to a standardized format, like the existing .ini framework. I wrote a Python script to translate the existing .ini files to the standardized format, which is included in a push of my calculation procedures branch here. You can see an example of the standardized file here.
I've created updated .ini files in the standard format for BB, BB2, BBS & DSM for Stages 1 & 2 as examples in my branch here. I'm also working up an example for stage 3 (only BB1/BBS so far) that works in R using the INI package.
Additionally, I wrote a MATLAB function in Biomet.net, Biomet.net\matlab\TRACEANALYSIS_FIRSTSTAGE\read_ini_file_updated.m, to interpret the standardized .ini files in MATLAB using an adaptation of the function linked above. The pull request can be viewed here. It's not ready for merging; it's just given as an example.
Thoughts??
The pull requests I've submitted aren't complete yet, they're just for example purposes. Still some testing/tweaking that I'd want to do before production level incorporation.