autogluon / tabrepo

Apache License 2.0

Update input format (Part 3) #50

Closed Innixma closed 8 months ago

Innixma commented 8 months ago

Update input files to have minimal columns and be in parquet format.

Two input files:

raw.parquet
comparison.parquet

They are stored in snappy parquet format.

In total, the two files are 32 MB, compared to the previous 500 MB of CSV.

tid is now fully optional as an input column, and metadata is also fully optional.

Simulation results will differ very slightly, because I also removed the 4-digit rounding that was applied to the models' test scores. I've confirmed that results are identical to mainline if the digit rounding is kept. We should remove it though, as it changes metric_error to 0 for values <0.0001, which isn't ideal.
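A small illustration of why the rounding loses information, using Python's built-in `round` as a stand-in (the exact rounding call used in the old pipeline is not shown in this PR, so this is an assumption). The error values are made up.

```python
# Made-up metric_error values; the third is small but nonzero.
errors = [0.25, 0.00012, 0.00003, 0.0]

# Stand-in for the removed 4-digit rounding step.
rounded = [round(e, 4) for e in errors]

# After rounding, the small nonzero error can no longer be told apart
# from a perfect score of 0.0.
collapsed = rounded[2] == rounded[3]
```

With `round`, values below 0.00005 become exactly 0, so a model with a tiny error ties with a model with zero error.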