JGCRI / gcamdata

The GCAM data system
https://jgcri.github.io/gcamdata/

level2 data files aren't built, tested, or read in #317

Closed pkyle closed 7 years ago

pkyle commented 7 years ago

In the old data system, "level2" code files wrote out data using the write_mi_data() function instead of writedata(). write_mi_data() did three things: (1) prepended four rows of meta-info (INPUT_TABLE, Variable ID, [IDstring], [blank row]) to the CSV file being built; (2) wrote out the data table; and (3) inserted the file path and name of the data table into a user-specified XML batch file (a file that instructs the model interface to construct an XML input file from any number of CSV files). There was also an option to (4) append instructions for node renaming, used in some of the land allocator files. To this point we haven't really been working on level2 files, so with the exception of modeltime this hasn't really been an issue.

So, the first part of this issue is that the capability to build data files that the model interface will turn into XML appears to have been lost.

The second part is that the testing currently isn't picking up the meta-info in the level2 tables; while the L200* modeltime files (the only level2 files committed to the master branch) are currently passing the checks, a manual comparison (e.g., tests/testthat/comparison_data/modeltime/L200.hector.csv versus outputs/L200.hector.csv) will show the differences (one has the four rows of meta-info and the other doesn't).

The third part of this issue is that some of the formerly level2 code files read in data written out by other level2 code files; we used "skip = 4" in the readdata() function call to skip the first four rows. Now that the data read-in is handled upstream of code chunks, meta-info will also need to be skipped upstream.
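For readers unfamiliar with the file layout described in this comment, here is a minimal sketch in Python (not the R code of the data system). It writes four meta-info rows followed by the data table, then reads the table back by skipping those four rows. The function names, the ID string, and the sample columns are hypothetical, and the batch-file step (3) and node renaming (4) are left out.

```python
import csv
import io

# Hypothetical helpers: only the four-row layout comes from the description above.
def write_mi_table(id_string, columns, rows):
    """Return CSV text: four meta-info rows, then the data table."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["INPUT_TABLE"])   # meta-info row 1
    writer.writerow(["Variable ID"])   # meta-info row 2
    writer.writerow([id_string])       # meta-info row 3: the ID string
    buf.write("\n")                    # meta-info row 4: blank
    writer.writerow(columns)           # column names of the data table
    writer.writerows(rows)             # data rows
    return buf.getvalue()

def read_mi_table(text, skip=4):
    """Read the data table back, skipping the leading meta-info rows."""
    lines = text.splitlines()[skip:]
    return list(csv.DictReader(lines))

# Made-up example values, for illustration only.
text = write_mi_table("ExampleID", ["region", "value"], [["USA", "1.5"]])
table = read_mi_table(text)
```

The `skip=4` default plays the same role as the "skip = 4" argument mentioned above: a reader has to know about the header rows, or the meta-info is parsed as data.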

cahartin commented 7 years ago

@pkyle I'm not too familiar with the data system, but when you read in the files, could you do something like read_csv("file", comment = '#')? This would skip the commented lines.
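A rough Python analogue of this suggestion, assuming the meta-info rows were written with a leading `#` (the thread does not say they are). The function name and sample text are hypothetical, and unlike a full comment option this sketch only drops whole lines that start with the marker.

```python
import csv

def read_skipping_comments(text, comment="#"):
    """Parse CSV text, ignoring any line that starts with the comment marker."""
    kept = [ln for ln in text.splitlines() if not ln.startswith(comment)]
    return list(csv.DictReader(kept))

# Made-up sample: meta-info rows prefixed with '#', then the data table.
sample = "# INPUT_TABLE\n# Variable ID\n# ExampleID\n#\nregion,value\nUSA,1.5\n"
rows = read_skipping_comments(sample)
```

Compared with skipping a fixed number of rows, this approach does not depend on the header always being exactly four lines long, but it does require the writer to mark those lines as comments.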

bpbond commented 7 years ago

@pkyle Right, we put off dealing with a bunch of the Level 2 issues for the obvious reason that we wouldn't need them for a while.

> So, the first part of this issue is that the capability to build data files that the model interface will turn into XML appears to have been lost.

This is not correct; we already have a batch_xml generator and code that calls the necessary Java library to generate XML (h/t @pralitp).

> The second part is that the testing currently isn't picking up the meta-info in the level2 tables; while the L200* modeltime files (the only level2 files committed to the master branch) are currently passing the checks, a manual comparison (e.g., tests/testthat/comparison_data/modeltime/L200.hector.csv versus outputs/L200.hector.csv) will show the differences (one has the four rows of meta-info and the other doesn't).

This is a good point: the testing system currently skips the header info in Level2 tables but doesn't compare it. I'll split this into a separate issue.
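To illustrate the gap, here is a minimal Python sketch of a comparison that checks the header rows as well as the data. It assumes a file carries meta-info whenever its first line is INPUT_TABLE and that the header is four rows long; the function names and sample text are hypothetical and this is not the package's actual test code.

```python
def split_mi_header(text, n_header=4):
    """Split text into (meta-info rows, data rows); no header gives an empty list."""
    lines = text.splitlines()
    if lines and lines[0].strip() == "INPUT_TABLE":
        return lines[:n_header], lines[n_header:]
    return [], lines

def compare_outputs(expected, actual):
    """Report header and data agreement separately."""
    exp_head, exp_data = split_mi_header(expected)
    act_head, act_data = split_mi_header(actual)
    return {
        "header_match": exp_head == act_head,
        "data_match": exp_data == act_data,
    }

# Made-up sample: same data, but only one file carries the meta-info rows.
old = "INPUT_TABLE\nVariable ID\nExampleID\n\nregion,value\nUSA,1.5\n"
new = "region,value\nUSA,1.5\n"
result = compare_outputs(old, new)
```

A test that looks only at `data_match` would pass here, which is the situation described for the L200 modeltime files; checking `header_match` as well would flag the missing meta-info.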

> The third part of this issue is that some of the formerly level2 code files read in data written out by other level2 code files; we used "skip = 4" in the readdata() function call to skip the first four rows. Now that the data read-in is handled upstream of code chunks, meta-info will also need to be skipped upstream.

Data isn't written to disk between Level1 and Level2; it is kept in memory, with metadata attached as attributes. So I don't think this is applicable.
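A small Python analogue of this idea, offered as a sketch only: the table travels between chunks as an in-memory object, with metadata stored beside the data rather than as leading rows, so the downstream chunk has nothing to skip. The `Table` class, the chunk functions, and the `title` key are all hypothetical and do not reflect how gcamdata stores its attributes.

```python
from dataclasses import dataclass, field

@dataclass
class Table:
    """Data rows plus a side dictionary of metadata (hypothetical structure)."""
    rows: list
    attrs: dict = field(default_factory=dict)

def level1_chunk():
    """Produce a table and attach metadata without touching the data rows."""
    table = Table(rows=[{"region": "USA", "value": 1.5}])
    table.attrs["title"] = "Example table"  # made-up metadata key
    return table

def level2_chunk(table):
    """Consume the rows directly; there are no header rows to skip."""
    return sum(row["value"] for row in table.rows)

total = level2_chunk(level1_chunk())
```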