PSLmodels / tax-microdata-benchmarking

A project to develop a benchmarked general-purpose dataset for tax reform impact analysis.
https://pslmodels.github.io/tax-microdata-benchmarking/

Minor issue: tmd_2021.csv column names can be in differing orders from run to run #140

Closed (donboyd5 closed 1 month ago)

donboyd5 commented 3 months ago

The tmd_2021.csv column names are not always in the same order from run to run, even though the set of columns is the same. For example, in a recent exercise in which I created 4 variants of tmd output, under 4 different sets of assumptions, the names of the first 3 of the 213 columns were:

[image: the first 3 column names in each of the 4 variants]

This defeats software such as R's vroom, which can rapidly read and combine a set of uniformly structured csv files in parallel. That capability is useful when comparing multiple versions of tmd output prepared with different assumptions.

The alternative is to read the files one by one and combine them, which is considerably slower.
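For illustration, here is a minimal sketch of that one-by-one approach in Python, using only the standard library. It matches columns by name rather than by position, so differing column orders do not matter as long as the set of columns is the same. The function name, the variant labels, and the three column names are assumptions chosen for the example, not taken from the actual tmd files.

```python
import csv
import io


def combine_csv_by_name(sources):
    """Combine CSV sources whose columns match as a set but not in order.

    `sources` maps a (hypothetical) variant label to an open text file.
    Returns (columns, rows); every row follows the first source's order.
    """
    columns = None
    rows = []
    for label, handle in sources.items():
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        if columns is None:
            # The first file defines the canonical column order.
            columns = list(header)
        elif set(header) != set(columns):
            raise ValueError(f"{label}: column set differs from first file")
        for record in reader:
            # Look values up by name, so the file's own order is irrelevant.
            rows.append([label] + [record[name] for name in columns])
    return ["variant"] + (columns or []), rows


# Two tiny stand-ins for tmd output files; column names are illustrative.
a = io.StringIO("RECID,MARS,e00200\n1,2,50000\n")
b = io.StringIO("MARS,e00200,RECID\n1,42000,2\n")
columns, rows = combine_csv_by_name({"a": a, "b": b})
print(columns)
print(rows)
```

This works, but each file is parsed sequentially and every value is looked up by name, which is the kind of overhead a parallel reader avoids when the files share one layout.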

Not a big deal, but at some point, @nikhilwoodruff, it would be great if you could force the structure (column order) of tmd_2021.csv to be consistent from run to run.
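One possible way to get a consistent order, sketched here with Python's standard csv module, is to derive the header from a fixed rule instead of from whatever order the data happens to arrive in. This is only an assumption about how it could be done, not a description of how the tmd code writes its output; the function name, the `leading` parameter, and the column names are hypothetical.

```python
import csv
import io


def write_csv_fixed_order(records, handle, leading=()):
    """Write dict records with a reproducible column order.

    Columns named in `leading` come first, in the order given;
    all remaining columns follow, sorted by name.
    """
    names = set()
    for record in records:
        names.update(record)
    missing = [name for name in leading if name not in names]
    if missing:
        raise ValueError(f"leading columns not in data: {missing}")
    fieldnames = list(leading) + sorted(names - set(leading))
    writer = csv.DictWriter(handle, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return fieldnames


# The same record built with two different key orders (illustrative names).
run1 = [{"e00200": 50000, "RECID": 1, "MARS": 2}]
run2 = [{"MARS": 2, "RECID": 1, "e00200": 50000}]
out1, out2 = io.StringIO(), io.StringIO()
write_csv_fixed_order(run1, out1, leading=("RECID",))
write_csv_fixed_order(run2, out2, leading=("RECID",))
print(out1.getvalue() == out2.getvalue())
```

Because the header depends only on the set of column names and the `leading` list, two runs that produce the same columns write them in the same order.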

martinholmer commented 3 months ago

@nikhilwoodruff, what is the timeline for resolving issue #140?