datamade / bga-payroll

💰 How much do your public officials make?

Download processed data #515

Closed fgregg closed 3 years ago

fgregg commented 3 years ago

Currently, the site allows users to download the original public records. We will add the ability for users to download the processed data for a government, department, or person. DataMade will also set up tracking in Google Analytics so that downloads will be captured as events in site analytics.

This probably looks like adding things to the Unit and Person views.

@smcalilly, I'll have you propose a way to do this, and break it down into smaller issues.

smcalilly commented 3 years ago

@fgregg I'm trying to understand this site and the application's architecture before I attempt to describe a way to do this. Planning to share a proposal later today, might have some questions first.

smcalilly commented 3 years ago

@fgregg Do you want to add a way to download a specific data set on a unique page (not sure what to call these entities)? For example, this detail page has specific data for a unique "department" entity: https://salary.bettergov.org/department/city-of-chicago-department-of-police-e54e6212/?data_year=2018.

On that page, do we need a way to download the data for all the "employees" associated with that department? For that example, there are 15k employees associated with that department. So, we would need to download the information for all 15k employees? Or do we need only the high-level info about that department? Or both?

Another example -- a "unit": https://salary.bettergov.org/unit/city-of-chicago-3cd86ae7/. What pieces of this page do we need to download? Everything about the 37k employees and everything about the 38 departments?

fgregg commented 3 years ago

> On that page, do we need a way to download the data for all the "employees" associated with that department? For that example, there are 15k employees associated with that department. So, we would need to download the information for all 15k employees? Or do we need only the high-level info about that department? Or both?

Yes, download the information for all 15K employees.

> Another example -- a "unit": https://salary.bettergov.org/unit/city-of-chicago-3cd86ae7/. What pieces of this page do we need to download? Everything about the 37k employees and everything about the 38 departments?

Everything about the 37K employees, including the department they work in.

fgregg commented 3 years ago

So the data fields might look something like this:

hancush commented 3 years ago

FWIW, there is a query to basically reconstruct the standardized files we upload in the tests:

https://github.com/datamade/bga-payroll/blob/0cd6a8137df5aaf03d7b5566b65ffa1d441c8f26/tests/data_import/test_importutility.py#L132-L164

smcalilly commented 3 years ago

@fgregg @hancush I see, thanks. Do y'all know how long it might take for 37k records to return, and whether there are any performance considerations? Would this data already be cached at this point? I'm wondering if this should be synchronous or async.

@hancush Are you suggesting that the standardized file query is similar to what we would need here?

fgregg commented 3 years ago

As for performance, let's see where we are. 37k rows won't necessarily take too long to return.

The queries should be a lot simpler, since the data has already been loaded into our final Django models: https://github.com/datamade/bga-payroll/blob/master/payroll/models.py#L419

I would start with trying to use the ORM before dropping to SQL.
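One rough way to "see where we are" on the serialization side is to time CSV generation for a payload of about this size. This is only a sketch: the column names and rows below are made up for illustration (the real export fields are not settled in this thread), and it measures in-memory CSV writing only, not the database query.

```python
import csv
import io
import time


def rows_to_csv(rows, fieldnames):
    """Serialize an iterable of dicts to a single CSV string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical columns, chosen only for this timing sketch.
FIELDNAMES = ["employer", "department", "name", "title", "salary"]

# Synthetic stand-in for roughly 37k salary records.
fake_rows = [
    {
        "employer": "Example Unit",
        "department": f"Department {i % 38}",
        "name": f"Person {i}",
        "title": "Example Title",
        "salary": 50000 + i,
    }
    for i in range(37_000)
]

start = time.perf_counter()
csv_text = rows_to_csv(fake_rows, FIELDNAMES)
elapsed = time.perf_counter() - start

print(f"{len(fake_rows)} rows, {len(csv_text)} characters, {elapsed:.3f}s")
```

If this step turns out to be cheap, most of the remaining cost would be in fetching the rows, which is where the ORM query is worth profiling.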

smcalilly commented 3 years ago

I'm not certain yet, but so far I see two ways to do this:

  1. Request the existing JSON endpoints from the client, and use the JSON data to create the file in the browser. This is similar to the existing pattern where the page requests a JSON endpoint and then uses the response to render the data (at least from what I can glean). We might also need a new endpoint for this, not sure. This might look like:
    • some JavaScript would request the endpoint when a user clicks a "download data" button
    • the request would return a JSON object with all the data we need
    • an npm package would turn the JSON object into a CSV or Excel file in the browser
  2. Create a new endpoint that serves a file based on the requested data.
    • some JavaScript would request the new endpoint when a user clicks a "download data" button
    • the backend would create a file and serve it to the user
    • (would this be a new serializer with a Django REST framework view?)
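For option 2, the file-building step could be sketched as a generator that yields CSV text one line at a time, so the backend never holds the whole file in memory. This is a minimal sketch, not the project's implementation: `Echo`, `stream_csv`, and the sample field names are hypothetical.

```python
import csv


class Echo:
    """Pseudo-buffer: write() hands the value back instead of storing it."""

    def write(self, value):
        return value


def stream_csv(rows, fieldnames):
    """Yield CSV text line by line: the header first, then one line per row."""
    writer = csv.writer(Echo())
    # csv.writer.writerow returns whatever the underlying write() returns,
    # so each call gives back the formatted line.
    yield writer.writerow(fieldnames)
    for row in rows:
        yield writer.writerow([row.get(name, "") for name in fieldnames])


# Made-up sample records for illustration only.
sample = [
    {"department": "Police", "name": "Jane Doe", "salary": 90000},
    {"department": "Fire", "name": "John Roe", "salary": 85000},
]
lines = list(stream_csv(sample, ["department", "name", "salary"]))
print("".join(lines), end="")
```

In a Django view, a generator like this could be passed to `StreamingHttpResponse` with `content_type="text/csv"` and a `Content-Disposition` header, which is the approach the Django documentation describes for large CSV files. Whether that fits better than a Django REST framework view here is still an open question.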

I'm confused about where the data is coming from, or is supposed to come from. I've been requesting some of the JSON endpoints and that seems like a promising place to start, but I haven't been able to get exactly what I want from the APIs. @fgregg you mentioned that it would probably have something to do with the Unit and Person views. Did you mean the rest_framework PersonViewSet or the standard Django PersonView? It looks like the standard PersonView adds the person data to the context object and then serves the template, but I'm also seeing an AJAX request that appears to get the data from the PersonViewSet.

cc: @hancush

fgregg commented 3 years ago

Looks like you've done some great exploration. Let's sync up and chat.

smcalilly commented 3 years ago

@fgregg Following up with what I'm about to do:

Let me know if I'm missing something.

fgregg commented 3 years ago

purrrfect.