Open donboyd5 opened 2 hours ago
@donboyd5, When I look at issue #107, there is nothing about the contents of the tmd.csv
file. So, I don't understand why you say:
tmd data files had duplicate records at one point, per issue https://github.com/PSLmodels/tax-microdata-benchmarking/issues/107
@martinholmer, tmd_2021.csv, now defunct, is the data discussed in #107. And tmd_2021.csv is a superset of tmd.csv - it had the same records, but more variables. If tmd_2021.csv (examined) had duplicate records then tmd.csv (not examined) must have had duplicate records. Hence the general mention above of "tmd data files" having duplicate records.
I believe @nikhilwoodruff probably solved the duplicate records issue, but he did not weigh in on it and I don't remember the last status - there were a lot of issues to resolve and this could have slipped through the cracks.
Thus the right course of action is for us to determine whether there are duplicate records now. If not, we can close this issue. If there are, we'll probably want to eliminate duplicate records, even if only by collapsing them, and close this issue.
tmd data files had duplicate records at one point, per issue #107.
Determine whether there are duplicate records now. If not, close this issue.
If there are, eliminate duplicate records and close this issue.
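One possible way to run the check described above, offered as a rough sketch rather than the project's actual procedure. It counts rows that are identical on every column except those explicitly ignored. The function name `find_duplicate_records`, the choice to ignore an ID column named `RECID`, and the sample values are all assumptions made for illustration; values are compared as raw strings, so `1.0` and `1` would not match.

```python
import csv
import io
from collections import Counter


def find_duplicate_records(csv_file, ignore_columns=()):
    """Return {record_tuple: count} for records that occur more than once.

    csv_file is any open text file (or file-like object) with a header row.
    Columns named in ignore_columns (for example a unique record ID) are
    left out of the comparison.
    """
    reader = csv.DictReader(csv_file)
    keep = [c for c in (reader.fieldnames or []) if c not in ignore_columns]
    counts = Counter(tuple(row[c] for c in keep) for row in reader)
    return {rec: n for rec, n in counts.items() if n > 1}


# Tiny made-up sample (not real tmd data): rows 1 and 2 differ only in RECID.
sample = io.StringIO(
    "RECID,e00200,s006\n"
    "1,50000,100\n"
    "2,50000,100\n"
    "3,42000,250\n"
)
dups = find_duplicate_records(sample, ignore_columns=("RECID",))
print(dups)  # {('50000', '100'): 2}
```

To apply it to the real file, something like `with open("tmd.csv", newline="") as f: find_duplicate_records(f, ignore_columns=("RECID",))` should work, assuming the ID column is in fact named `RECID`. An empty result would mean no duplicates under this definition. Whether the weight column should also be ignored before comparing is a judgment call this sketch leaves open.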