vlewis opened this issue 10 years ago
Thank you for your response. I realize that you don't publish sensitive data. I would like to know what you filter out or mask. We are going through the exercise of deciding what data we should filter out and are looking for best practices. Adrissha thought you might have something that would be helpful.
Thanks
From: rkalidindi [mailto:notifications@github.com] Sent: Monday, April 07, 2014 8:58 AM To: NYCComptroller/Checkbook Cc: Lewis, Victoria Subject: Re: [Checkbook] Can you please post documentation on the business logic you used to filter sensitive data out? (#32)
We do not put any sensitive data into the system. All the data in the application is public data.
Reply to this email directly or view it on GitHub: https://github.com/NYCComptroller/Checkbook/issues/32#issuecomment-39726167
We will look at the current implementation and get back to you with more details.
Currently, the Checkbook 2.0 application masks just one field: the employee number. For masking, we compute a SHA-256 HMAC of the employee number with a secret salt as the key, as below:
encode(hmac(employee_number, 'salt_key', 'sha256'), 'hex')
Thanks
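For readers outside PostgreSQL, the pgcrypto expression quoted above is a hex-encoded HMAC-SHA256 in which the secret salt serves as the key. Below is a minimal Python sketch of the same computation. It assumes the employee number is handled as a plain ASCII/UTF-8 string. `SALT_KEY` is a placeholder for the real secret, and `mask_employee_number` is a hypothetical helper name, not Checkbook code.

```python
import hashlib
import hmac

# Placeholder only: the real key is a secret held by the Checkbook team.
SALT_KEY = "salt_key"


def mask_employee_number(employee_number: str, key: str = SALT_KEY) -> str:
    """Return the lowercase hex HMAC-SHA256 of an employee number.

    Intended to mirror pgcrypto's
        encode(hmac(employee_number, key, 'sha256'), 'hex')
    Note the argument order: pgcrypto's hmac() takes (data, key, type),
    while Python's hmac.new() takes (key, msg, digestmod).
    """
    mac = hmac.new(key.encode("utf-8"), employee_number.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()


if __name__ == "__main__":
    masked = mask_employee_number("1234567")  # hypothetical employee number
    print(masked)       # 64 lowercase hex characters
    print(len(masked))  # 64
```

The same input and key always produce the same output. As a result, the masked value can still serve as a stable identifier across payroll records, but it cannot be traced back to the original number without the key.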
OK. Thanks. What data do you filter/mask? (payments for undercover cars, payments to individuals, rental assistance, etc.?)
From: treddy [mailto:notifications@github.com] Sent: Thursday, April 24, 2014 10:15 AM To: NYCComptroller/Checkbook Cc: Lewis, Victoria Subject: Re: [Checkbook] Can you please post documentation on the business logic you used to filter sensitive data out? (#32)
Reply to this email directly or view it on GitHub: https://github.com/NYCComptroller/Checkbook/issues/32#issuecomment-41284787
It is the employee number field in all payroll data.
Hey, @vlewis. I believe the answer to your question is that no data is masked out of results. Everything that makes it into the database (via data import) may be published via the web UI and the APIs.
The employee IDs that @treddy is referring to are filtered out at the time of data import; that is, the encode(hmac(employee_number, ...), ...) code is run during the conversion of the data from flatfiles (which come from the city's internal financial management system, or FMS) to CheckbookNYC's database. By the time the data is on the Checkbook server, the employee IDs are gone, replaced by salted hashes, and it is the fact that the salt is secret that protects those employee IDs from being discoverable even in the event of a full database dump.
It's possible that some other things are also filtered out at import time, similarly to employee IDs; we're checking internally now to see what those might be.
(I hope @treddy or @rkalidindi will correct me if any of the above is inaccurate, too.)
Best, -Karl
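Karl's point that the secrecy of the salt is what protects the IDs can be illustrated directly. Employee numbers come from a fairly small space. An unkeyed SHA-256 of them could therefore be reversed by hashing every possible value, while an HMAC keyed with a secret cannot be recomputed without that key. Below is a minimal Python sketch. It assumes, purely for illustration, that IDs are integers below 100,000; the real ID format isn't given in this thread, and all names here are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Callable, Optional


def unkeyed_hash(employee_number: str) -> str:
    # Plain SHA-256 with no secret: anyone can recompute it.
    return hashlib.sha256(employee_number.encode("utf-8")).hexdigest()


def keyed_hash(employee_number: str, key: bytes) -> str:
    # HMAC-SHA256: recomputing it requires knowing the key.
    return hmac.new(key, employee_number.encode("utf-8"), hashlib.sha256).hexdigest()


def brute_force(target: str, hash_fn: Callable[[str], str], id_space: range) -> Optional[str]:
    """Hash every candidate ID; return the one matching target, else None."""
    for candidate in id_space:
        if hash_fn(str(candidate)) == target:
            return str(candidate)
    return None


if __name__ == "__main__":
    ids = range(100_000)                  # hypothetical ID space
    secret_key = secrets.token_bytes(32)  # stands in for the secret salt
    employee_id = "48213"                 # hypothetical employee number

    # An attacker holding a database dump sees only the stored hashes.
    leaked_plain = unkeyed_hash(employee_id)
    leaked_keyed = keyed_hash(employee_id, secret_key)

    # The unkeyed hash falls to simple enumeration...
    print(brute_force(leaked_plain, unkeyed_hash, ids))  # 48213
    # ...but enumerating IDs under a wrong (guessed) key finds no match.
    guessed_key = b""
    print(brute_force(leaked_keyed, lambda s: keyed_hash(s, guessed_key), ids))
```

The flip side is that anyone who does obtain the key can run the same enumeration against the keyed hashes. Keeping the salt secret is therefore the whole of the protection.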
Yes, that is correct. The employee number is masked at the time of data import from the flat files, and the masked value is what gets stored in the database. No data is masked when it is shown in the UI.
Thanks, TirupatiReddy
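The import-time step treddy confirms here can be pictured as a small transform between reading the flat file and loading the database. Below is a minimal Python sketch. It assumes a comma-separated payroll file with a header row and an `employee_number` column; that layout is hypothetical, since the actual FMS flat-file format isn't described in this thread. The function names are also illustrative, not Checkbook's.

```python
import csv
import hashlib
import hmac
import io


def mask_employee_number(employee_number: str, key: bytes) -> str:
    # Same transformation as encode(hmac(employee_number, key, 'sha256'), 'hex').
    return hmac.new(key, employee_number.encode("utf-8"), hashlib.sha256).hexdigest()


def load_payroll(flat_file, key: bytes) -> list:
    """Read payroll rows from a flat file and mask the employee number.

    Returns the rows as they would be inserted into the database; the raw
    employee number is replaced before any row leaves this function.
    """
    reader = csv.DictReader(flat_file)
    rows = []
    for row in reader:
        row["employee_number"] = mask_employee_number(row["employee_number"], key)
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Hypothetical flat-file contents for illustration only.
    sample = io.StringIO(
        "employee_number,agency,gross_pay\n"
        "1234567,Dept A,50000.00\n"
    )
    for row in load_payroll(sample, key=b"salt_key"):
        print(row)  # employee_number is now a 64-character hex string
```

Because masking happens before loading, nothing downstream (the database, the web UI, or the APIs) ever needs to mask anything. This matches the point that no data is masked at display time.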
Thanks, @treddy. I think the other half of the question may be important too: is there anything else that is masked out at import time, the same way employee ID is?
As of now, the employee number is the only field we mask.
Thanks again.
That seems like it might contradict http://support.mymoneynyc.com/mymoneynyc/topics/spending_why_do_i_see_n_a_privacy_security_in_the_payee_field_for_some_transactions , though? I think @vlewis's question is about the CheckbookNYC system as a whole -- in other words, @vlewis wants to know about every case of data masking, whether data is masked by FMS before export to flatfile, or during the flatfile->DB conversion process, or after the fact by CheckbookNYC itself.
So far, we've explained here that there is one case of the second kind happening, and none of the third. But we haven't said anything about the first. Is that where the other data is masked (the data referred to in the privacy question I just linked to)?
Some data, such as vendor information for spending transactions, is masked by the FMS system when it generates the flat files that serve as the source files for the Checkbook 2.0 application. I am not sure when or why the vendor information is masked.
So we'd need to ask the folks at FMS. nod Thanks, @treddy.
Yes. I am referring to what you filter out of your data before you load it into NYC Checkbook.
For example, filtering out purchases of undercover cars, rental assistance payments, etc.
From: Karl Fogel [mailto:notifications@github.com] Sent: Monday, April 28, 2014 10:22 AM To: NYCComptroller/Checkbook Cc: Lewis, Victoria Subject: Re: [Checkbook] Can you please post documentation on the business logic you used to filter sensitive data out? (#32)
Reply to this email directly or view it on GitHub: https://github.com/NYCComptroller/Checkbook/issues/32#issuecomment-41563728