etsap-TIMES / xl2times

Open source tool to convert TIMES models specified in Excel
https://xl2times.readthedocs.io/
MIT License
10 stars 7 forks source link

Validating data types #194

Open olejandro opened 4 months ago

olejandro commented 4 months ago

Currently the tool may produce invalid input data for GAMS, because it doesn't verify data types. Should we introduce such a verification before export to GAMS?

olejandro commented 4 months ago

@siddharth-krishna what do you think?

siddharth-krishna commented 4 months ago

Sure, perhaps add a field to times-info.json that specifies the data type of each parameter/set?

Ideally we should use the correct dtypes for pandas column from the time we read in the excel tables: #47 But this sounds like a good start towards that.

olejandro commented 4 months ago

Parameters (value columns) are always numeric, e.g. double. :-) Their indices can vary, but string generally works. Sets are similar to indices of the parameters.

siddharth-krishna commented 4 months ago

Does it not matter if a year is written out as 2020.0 when in a parameter value column, or a set index column?

Antti-L commented 4 months ago

@siddharth-krishna In the decimal system, it does not matter if you add zero decimals to a numeric value, onto the right side, right of the decimal point. The values are by definition identical. GAMS complies with the decimal system with respect to inputting parameter values in text files. GAMS GDX files may use a binary representation.

However, indexes, i.e. the members of sets are identified by labels, which are basically case insensitive strings. The years indexes are thus also identified by set element labels. and a string "2020.0" is different from the string "2020".

Ahh..., I now realize you probably mean Excel and not GAMS? If a year is represented as a numerical cell in Excel, I think you can just remove any zero decimal fractions (I tested with VEDA2 v3.0 and it does accept any number with zero decimal fractions displayed for years, but raises error at non-zero decimals). But if it is a text value, you should keep it as is, and invalidate e.g "2020.0". You can easily test a cell data type in Excel. Text values in year columns may include also year ranges (e.g. 2020-2050) and comma-separated lists of years.

SamRWest commented 4 months ago

If it helps, I've been using pandera for dataframe type (and range etc) validation, and it's pretty good. Might be worth a look for this.

olejandro commented 4 months ago

Looks like it may be what we need. Should we be applying it to inputs (i.e. EmbeddedXlTable.dataframe) or outputs (i.e. TimesModel), or both?