Peter-Metz / state_taxdata


Deploy to C/S #5

Open MattHJensen opened 4 years ago

MattHJensen commented 4 years ago

When efficient. From my perspective as a reviewer sooner is better, but not worth rushing Peter or Hank.

(Discussed on a call between Peter and Matt on 7-13-20)

Peter-Metz commented 4 years ago

PR #7 adds files for CS.

I noticed on Tax-Brain that there is some fuzzing of the PUF. Is that something we need to think about here?

cc @andersonfrailey because maybe he knows something about the fuzzing

andersonfrailey commented 4 years ago

My understanding is that whenever the PUF is used, there needs to be fuzzing. Otherwise you're good.

MattHJensen commented 4 years ago

One of the outputs of the app will be the actual data, so I think our only option is to make the app private and only available to those with PUF access. We could add a note like the one I suggested for TDG:

This web application is only available to those with access to the IRS SOI Public Use File through agreements with the IRS. Account login information must not be shared, and the files generated by this program, unless explicitly marked in their filename as PUBLIC, should not be shared beyond those with access to the IRS SOI Public Use File.

This way we also don't need to worry about fuzzing (more precisely, the dropq algorithm that the IRS approved for Tax-Brain), which is a significant complication and limitation.

Peter-Metz commented 4 years ago

@MattHJensen that makes sense. Looping in @hdoupe to see whether CS has the ability to make an app private and, if so, how that works.

hdoupe commented 4 years ago

Hey @Peter-Metz, there currently isn't a way for an app to be completely private, only unlisted. However, this is definitely on the Compute Studio roadmap.

The easiest way to get this set up is to just do the backend changes that add user access levels to models. This would be straightforward since most of the logic has already been sorted out when I did this for simulations. Once this is set up, I can manually add CS users who you want to have access to the app.

The next step would be to add in the UI for managing user-access/collaborators which would be similar to the collaborators UI on simulations.
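As a rough illustration of the access-level idea described above, here is a minimal sketch. It is not Compute Studio's code; the `App` class, its fields, and the `grant` and `can_view` methods are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    """Hypothetical model of an app with per-user access (not the CS schema)."""
    name: str
    is_public: bool = True   # public apps are viewable by anyone
    listed: bool = True      # unlisted apps are hidden from listings only
    collaborators: set = field(default_factory=set)

    def grant(self, username):
        """Manually add a user who should have access to the app."""
        self.collaborators.add(username)

    def can_view(self, username):
        """Public apps are open; private apps require collaborator status."""
        return self.is_public or username in self.collaborators


# A private, unlisted app with one manually added user.
app = App("state_taxdata", is_public=False, listed=False)
app.grant("alice")
```

In this sketch, "unlisted" and "private" are separate flags: an unlisted public app is still viewable by anyone with the link, while a private app checks the collaborator set.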

MattHJensen commented 4 years ago

> One of the outputs of the app will be the actual data,

I think I misspoke. We can just provide the weights, blowup factors, and the results/tests/diagnostics tables in the output, right? No need to offer the actual records data. So there wouldn't be anything proprietary in the output.

What's more, the tables aren't subject to the differencing attacks from which dropq protects us in Tax-Brain, because the user doesn't control how many tax units are in each table cell (in Tax-Brain they can control the number of units in each cell through brackets and other tax parameters).
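To make the differencing concern concrete, here is a minimal sketch with made-up numbers. It assumes a user can choose the threshold that defines a table cell; the records and the `cell_total` helper are hypothetical, and this is not dropq or Tax-Brain code.

```python
# Hypothetical records: (income, tax_liability). All values are made up.
records = [(20_000, 1_500), (35_000, 3_200), (50_000, 6_100), (80_000, 12_400)]


def cell_total(records, threshold):
    """Total tax liability for units with income below a user-chosen threshold."""
    return sum(tax for income, tax in records if income < threshold)


# Two thresholds chosen so that the resulting cells differ by exactly one unit.
total_a = cell_total(records, 50_000)  # units with income below 50,000
total_b = cell_total(records, 50_001)  # same cell plus the unit at 50,000

# Subtracting the two published totals isolates that single unit's liability.
leaked = total_b - total_a
```

When the cell boundaries are fixed by the app rather than chosen by the user, the second query is not available, so this subtraction cannot be set up.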

So, unless I am missing something (again), my new take is that we need neither a private app nor a disclosure avoidance algorithm.

Peter-Metz commented 4 years ago

@MattHJensen in the near term, I agree that there is nothing confidential about the output of the data prep routines. However, when we start generating weights, it seems user-unfriendly to require merging two large datasets (the weights and the records), especially if the user has to split the PUF into AGI bins and merge one bin at a time. Realistically, the output of this project doesn't have much use to someone without the PUF, so I don't see much downside in making this a private app.
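For a sense of what that user-side merge might involve, here is a minimal sketch. The bin edges, the `RECID` and `agi` field names, the `weight_*` columns, and all values are assumptions for illustration, not the project's actual layout.

```python
from bisect import bisect_right

# Hypothetical AGI bin edges; the project's actual bins may differ.
AGI_EDGES = [0, 25_000, 100_000]


def agi_bin(agi):
    """Index of the AGI bin containing agi (0 means below the first edge)."""
    return bisect_right(AGI_EDGES, agi)


def merge_bin(records, weights, bin_index):
    """Attach state weights to the records in one AGI bin, matched on RECID."""
    merged = []
    for rec in records:
        if agi_bin(rec["agi"]) != bin_index:
            continue  # record belongs to a different AGI bin
        merged.append({**rec, **weights[rec["RECID"]]})
    return merged


# Made-up records and weights, keyed by a record identifier.
records = [
    {"RECID": 1, "agi": 12_000},
    {"RECID": 2, "agi": 60_000},
    {"RECID": 3, "agi": 18_000},
]
weights = {
    1: {"weight_CA": 0.12, "weight_NY": 0.07},
    2: {"weight_CA": 0.10, "weight_NY": 0.09},
    3: {"weight_CA": 0.11, "weight_NY": 0.06},
}

# The user would repeat this call once per AGI bin.
bin1 = merge_bin(records, weights, 1)
```

Even in this toy form, the user has to know the bin edges and the join key, which is the friction the comment above describes.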

Peter-Metz commented 4 years ago

@hdoupe, I chatted with @MattHJensen this morning and we decided that we likely won't need actual PUF data in the CS output (although we'll make the final call later). Currently in #7, the CS app outputs aggregate stats, so we should be ready to publish a public, non-listed app.

Let me know if you'd like to review, or if I should merge #7 and fill out a CS publishing form.

hdoupe commented 4 years ago

@Peter-Metz if the tests are passing, then feel free to merge! I'm excited to add private apps to support having the PUF, but I'm glad we can have a version of this up before then.