TillF / wasa_tests

Test for WASA-SED
The Unlicense

File organization #9

Open alicelg99 opened 3 years ago

alicelg99 commented 3 years ago

The output files are not all organized in the same way. Most of the time, a file has a title on the first line, then the column names separated by tabs, and then the data, as shown below (an example of an actetranspiration.out file):

actual evapotranspiration [mm] for all sub-basins (MAP-IDs)
Year    Day Timestep    15  69
2000    1   1    3.832   4.268
2000    2   1    2.744   3.394
2000    3   1    3.295   3.382
2000    4   1    3.295   3.161
2000    5   1    2.640   2.797
2000    6   1    2.226   2.722
2000    7   1    2.580   3.061
2000    8   1    2.831   3.195
2000    9   1    2.493   3.631
2000    10  1    3.104   3.610
2000    11  1    2.919   3.206
2000    12  1    3.324   3.529
2000    13  1    3.639   3.856

When a file does not follow this layout, the code does not work. Some files have no title line and use spaces instead of tabs between the columns. See below (an example of a _lake_outflowr.out file):

Year, day, hour, outflow_r(m**3/timestep)
  2000     1     1      16640.795      19944.051      24902.533      33133.648      34511.082
  2000     2     1      35829.051      42969.047      53631.609      71390.977      76124.820
  2000     3     1      55556.066      66613.492      83161.625     110666.297     119061.172
  2000     4     1      74896.547      89801.227     112116.422     149186.188     160068.328
  2000     5     1      98237.000     117803.281     147059.781     195704.938     209445.484

This second case occurs at least for the following files:

I was wondering whether it would be easy to change how these files are written. If so, it would be great to do it, so that we can compare all the files. Otherwise, I can try to adapt the code: it will be a long task, but I think I can do it.

TillF commented 3 years ago

I agree that this is unfortunate. However, for legacy reasons, I am reluctant to change the output format, as other scripts currently in use would then break. I suggest you implement a routine for determining the non-numeric header lines: just check whether the first non-blank character is a digit or not. This should give you a pretty robust way of computing the number of lines to skip when importing the data.
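A minimal sketch of such a routine, in Python. The language choice is an assumption, and the names `count_header_lines` and `read_wasa_output` are hypothetical, not part of this repository. It counts the leading lines whose first non-blank character is not a digit, then splits the remaining lines on any whitespace, so tab-separated and space-padded files are read the same way:

```python
def count_header_lines(lines):
    """Count leading lines whose first non-blank character is not a digit.

    Assumption: every data row starts with a non-negative number (here the
    year), so a leading "-" or "." would be mistaken for a header line.
    """
    n = 0
    for line in lines:
        stripped = line.lstrip()
        # The first line that starts with a digit is taken as the first data row.
        if stripped and stripped[0] in "0123456789":
            break
        n += 1
    return n


def read_wasa_output(lines):
    """Return (header_lines, rows), with each row as a list of floats."""
    lines = list(lines)
    skip = count_header_lines(lines)
    header = [line.rstrip("\n") for line in lines[:skip]]
    # str.split() without arguments splits on tabs and on runs of spaces alike.
    rows = [[float(tok) for tok in line.split()]
            for line in lines[skip:] if line.strip()]
    return header, rows
```

To use it on a file, something like `with open(path) as f: header, rows = read_wasa_output(f)` should work, where `path` is a placeholder for one of the .out files. If pandas is available, the computed count could instead be passed to `pandas.read_csv(path, skiprows=n, sep=r"\s+", header=None)`.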

alicelg99 commented 3 years ago

OK, I understand. I will look into it, thank you.