donboyd5 / synpuf

Synthetic PUF
MIT License

Rename repo and project #10

Open MaxGhenis opened 5 years ago

MaxGhenis commented 5 years ago

For our email to SOI, we decided to change the name to Synthetic Household File (SHF?) to better communicate that this project extends beyond the PUF (incorporating nonfilers, imputing other features, potentially different record count, etc.). This may also preemptively avoid a naming conflict with the TPC project, which seems like it might be called "synthetic PUF."

Should we adopt this name generally?

Also, should we consider a term other than "household," since we're looking at tax units? For example, "Synthetic Microdata File"? Other ideas welcome.

donboyd5 commented 5 years ago

I prefer to keep it general - Synthetic Household File or Synthetic Household Policy-Analysis File. I am not sure they will always be tax units; they certainly won't always be (and aren't always now) tax-filing units. If we do state-level analysis we may be very concerned about sales taxes or benefit issues, which won't always be driven by income tax filing status.

That said, it is just a preference. I don't feel strongly.

MaxGhenis commented 5 years ago

I agree on generality, which led me to the term "microdata," but I also don't feel strongly.

#11 got me thinking more generally about this though: SHF is really an umbrella project that would include synthesizing the PUF as well as other enhancement logic, which currently lives in the taxdata and C-TAM repos. If we're just working on the PUF synthesis, maybe synpuf is actually an appropriate name, even if it's unpublicized and just lives as a piece of the SHF brand.

This raises questions around how these other enhancements play with the synthetic PUF (ideally as well as the real one, if Approach A in #11 is adopted), and what the real-PUF version of SHF would be called.

One potential end state would be to create two libraries:

All that is to say, I'm not sure we should rush to change this right now.

@MattHJensen @andersonfrailey

donboyd5 commented 5 years ago

The two-function approach @MaxGhenis describes seems very clean to me. Even if, in practice, there is human intervention each time we run either of the steps (examining results, etc.), it is a nice separation that keeps the projects distinct and working well with each other. Curious for others' thoughts on this and on issue #11.