NYCComptroller / Checkbook

Source codes, data, and instructions for Checkbook
https://checkbooknyc.com/

Can you please post documentation on the business logic you used to filter sensitive data out? #32

Open vlewis opened 10 years ago

Ramaraju-Kalidindi-REISYS commented 10 years ago

We do not put any sensitive data into the system. All the data in the application is public data.

vlewis commented 10 years ago

Thank you for your response. I realize that you don’t publish sensitive data. I would like to know what you filter out or mask. We are going through the exercise of what data we should filter out and were looking for best practices. Adrissha thought you might have something that would be helpful.

Thanks

From: rkalidindi [mailto:notifications@github.com] Sent: Monday, April 07, 2014 8:58 AM To: NYCComptroller/Checkbook Cc: Lewis, Victoria Subject: Re: [Checkbook] Can you please post documentation on the business logic you used to filter sensitive data out? (#32)

Ramaraju-Kalidindi-REISYS commented 10 years ago

We will look at the current implementation and get back to you with more details.

treddy commented 10 years ago

Currently, the Checkbook 2.0 application masks just one field: the employee number. For masking we use SHA-256 with a salt (as an HMAC), as below:

encode(hmac(employee_number,''salt_key'',''sha256''),''hex'')

Thanks
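For readers who want to reproduce this kind of masking outside the database, here is a minimal Python sketch. It assumes the expression above behaves like an `hmac(data, key, type)` function followed by hex encoding, which is how the call reads but is not confirmed in this thread. The function name `mask_employee_number` and the sample values are hypothetical.

```python
import hashlib
import hmac


def mask_employee_number(employee_number: str, salt_key: bytes) -> str:
    """Return the hex-encoded HMAC-SHA256 of employee_number, keyed with salt_key.

    Assumption: this mirrors encode(hmac(employee_number, salt_key, 'sha256'), 'hex'),
    with the value as the message and the secret salt as the HMAC key.
    """
    digest = hmac.new(salt_key, employee_number.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical inputs for illustration only; the real salt is secret.
masked = mask_employee_number("0012345", b"not-the-real-salt")
print(len(masked))  # an HMAC-SHA256 digest is 32 bytes, so 64 hex characters
```

The output is deterministic for a given salt, so the same employee number always maps to the same masked value, while a different salt yields an unrelated value.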

vlewis commented 10 years ago

OK, thanks. What data do you filter or mask? (Payments for undercover cars, payments to individuals, rental assistance, etc.?)

From: treddy [mailto:notifications@github.com] Sent: Thursday, April 24, 2014 10:15 AM To: NYCComptroller/Checkbook Cc: Lewis, Victoria Subject: Re: [Checkbook] Can you please post documentation on the business logic you used to filter sensitive data out? (#32)

treddy commented 10 years ago

It is the employee number field in all payroll data.

kfogel commented 10 years ago

Hey, @vlewis. I believe the answer to your question is that no data is masked out of results. Everything that makes it into the database (via data import) may be published via the web UI and the APIs.

The employee IDs that @treddy is referring to are filtered out at the time of data import -- that is, the encode(hmac(employee_number, ...), ...) code is run during the conversion of the data from flatfiles (which come from the city's internal financial management system, or FMS) to CheckbookNYC's database. By the time the data is on the Checkbook server, the employee IDs are gone, replaced by salted hashes, and it is the fact that the salt is secret that protects those employee IDs from being discoverable even in the event of a full database dump.
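As a rough illustration of masking at import time, the sketch below reads rows from a flat file and replaces the employee number with its salted hash before anything would be loaded into a database. The CSV layout, the column names `employee_number` and `gross_pay`, and the helper names are assumptions made for this example; the actual FMS flat-file format and the Checkbook import code are not described in this thread.

```python
import csv
import hashlib
import hmac
import io


def mask(value: str, salt_key: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of value, keyed with the secret salt."""
    return hmac.new(salt_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def convert_rows(flatfile, salt_key: bytes):
    """Yield rows ready for loading, with employee_number replaced by its mask."""
    for row in csv.DictReader(flatfile):
        # The raw ID is overwritten here, so it never reaches the database.
        row["employee_number"] = mask(row["employee_number"], salt_key)
        yield row


# Hypothetical flat-file contents: two payroll rows for the same employee.
sample = io.StringIO(
    "employee_number,gross_pay\n"
    "0012345,1000.00\n"
    "0012345,1200.00\n"
)
rows = list(convert_rows(sample, b"not-the-real-salt"))
```

Because the hash is deterministic for a fixed salt, both rows carry the same masked value, so payroll records for one employee can still be grouped even though the original ID is no longer stored.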

It's possible that some other things are also filtered out at import time, similarly to employee IDs; we're checking internally now to see what those might be.

(I hope @treddy or @rkalidindi will correct me if any of the above is inaccurate, too.)

Best, -Karl

treddy commented 10 years ago

Yes, that is correct. The employee number is masked at the time of data import from the flat files, and the masked value is stored in the database. No data is masked when it is displayed in the UI.

Thanks TirupatiReddy

kfogel commented 10 years ago

Thanks, @treddy. I think the other half of the question may be important too: is there anything else that is masked out at import time, the same way employee ID is?

treddy commented 10 years ago

As of now, it is just employee number that we are masking.

kfogel commented 10 years ago

Thanks again.

That seems like it might contradict http://support.mymoneynyc.com/mymoneynyc/topics/spending_why_do_i_see_n_a_privacy_security_in_the_payee_field_for_some_transactions , though? I think @vlewis's question is about the CheckbookNYC system as a whole -- in other words, @vlewis wants to know about every case of data masking, whether data is masked by FMS before export to flatfile, or during the flatfile->DB conversion process, or after the fact by CheckbookNYC itself.

So far, we've explained here that there is one case of the second kind happening, and none of the third. But we haven't said anything about the first. Is that where the other data is masked (the data referred to in the privacy question I just linked to)?

treddy commented 10 years ago

Some data, such as the vendor information for spending transactions, is masked by the FMS system when it generates the flat files that are the source files for the Checkbook 2.0 application. I am not sure when or why the vendor information is masked.

kfogel commented 10 years ago

So we'd need to ask the folks at FMS. *nod* Thanks, @treddy.

vlewis commented 10 years ago

Yes. I am referring to what you filter out of your data before you load it into NYC Checkbook.

For example, filtering out purchases of undercover cars, rental assistance payments, etc.

From: Karl Fogel [mailto:notifications@github.com] Sent: Monday, April 28, 2014 10:22 AM To: NYCComptroller/Checkbook Cc: Lewis, Victoria Subject: Re: [Checkbook] Can you please post documentation on the business logic you used to filter sensitive data out? (#32)