Closed — fgregg closed this issue 3 years ago
@fgregg I'm trying to understand this site and the application's architecture before I attempt to describe a way to do this. Planning to share a proposal later today, might have some questions first.
@fgregg Do you want to add a way to download a specific data set on a unique page (not sure what to call these entities)? For example, this detail page has specific data for a unique "department" entity: https://salary.bettergov.org/department/city-of-chicago-department-of-police-e54e6212/?data_year=2018.
On that page, do we need a way to download the data for all the "employees" associated with that department? For that example, there are 15k employees associated with that department. So, we would need to download the information for all 15k employees? Or do we need only the high-level info about that department? Or both?
Another example -- a "unit": https://salary.bettergov.org/unit/city-of-chicago-3cd86ae7/. What pieces of this page do we need to download? Everything about the 37k employees and everything about the 38 departments?
> On that page, do we need a way to download the data for all the "employees" associated with that department? For that example, there are 15k employees associated with that department. So, we would need to download the information for all 15k employees? Or do we need only the high-level info about that department? Or both?
Yes, download the information for all 15K employees.
> Another example -- a "unit": https://salary.bettergov.org/unit/city-of-chicago-3cd86ae7/. What pieces of this page do we need to download? Everything about the 37k employees and everything about the 38 departments?
Everything about the 37K employees, including the department they work in.
So the data fields might look something like this:
FWIW, there is a query to basically reconstruct the standardized files we upload in the tests:
@fgregg @hancush I see, thanks. Do y'all know how long it might take for 37k records to return, and whether there are any performance considerations? Would this data already be cached at this point? I'm wondering if this should be synchronous or an async job.
@hancush Is that suggesting the standardized file query is similar to what we would need here?
As for performance, let's see where we're at. 37k rows won't necessarily take too much time to return.
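If returning 37k rows all at once does become a problem, one option is to stream the CSV a row at a time instead of building the whole file in memory. This is only a pure-Python sketch of the pattern (the same one Django's `StreamingHttpResponse` docs use); the row shape and field names here are made up, not taken from the actual models:

```python
import csv


class Echo:
    """Pseudo-buffer: csv.writer calls write(), and we just hand the value back."""

    def write(self, value):
        return value


def stream_csv(rows, fieldnames):
    """Lazily yield CSV lines for an iterable of dicts, one row at a time."""
    writer = csv.writer(Echo())
    yield writer.writerow(fieldnames)  # header row
    for row in rows:
        yield writer.writerow([row[f] for f in fieldnames])


# Hypothetical employee rows -- the real ones would come from the database.
rows = [{"name": "Jane Doe", "amount": "100000"}]
lines = list(stream_csv(rows, ["name", "amount"]))
```

Because `stream_csv` is a generator, memory use stays constant no matter how many rows the queryset yields, at the cost of not knowing the Content-Length up front.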
The queries should be a lot simpler, since the data has already been put into our final Django models. https://github.com/datamade/bga-payroll/blob/master/payroll/models.py#L419
I would start with trying to use the ORM before dropping to SQL.
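To make the ORM-first suggestion concrete, the serialization step could be as small as the sketch below. The field names (`name`, `title`, `department`, `amount`) are assumptions for illustration, not the actual columns in `payroll/models.py`; with Django, `employees` could be fed by something like a `.values()` queryset:

```python
import csv
import io


def employees_to_csv(employees, fieldnames):
    """Serialize an iterable of employee dicts to a CSV string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(employees)
    return buffer.getvalue()


# Hypothetical rows; the real fields would come from the Django models.
fields = ["name", "title", "department", "amount"]
rows = [
    {"name": "Jane Doe", "title": "Officer",
     "department": "Department of Police", "amount": "100000"},
]
csv_text = employees_to_csv(rows, fields)
```

Selecting only the needed fields in the query (rather than full model instances) keeps both the SQL and the Python side cheap.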
I'm not too certain, but so far I see at least two ways:
I'm confused about where the data is coming from, or is supposed to come from. I've been requesting some of the JSON endpoints, and that seems like a promising place, but I haven't been able to get exactly what I want from the APIs. @fgregg, you mentioned that it would probably have something to do with the Unit and Person views. Did you mean the rest_framework `PersonViewSet` or the standard Django `PersonView`? It looks like the standard `PersonView` adds the person data to the context object and then serves the template, but I'm seeing an ajax request that appears to get the data from the `PersonViewSet`.
cc: @hancush
Looks like you've done some great exploration. Let's sync up and chat.
@fgregg Following up with what I'm about to do:
Let me know if I'm missing something.
purrrfect.
Currently, the site allows users to download the original public records. We will add the ability for users to download the processed data for a government, department, or person. DataMade will also set up tracking in Google Analytics so that downloads will be captured as events in site analytics.
This probably looks like adding things to the Unit and Person views.
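One shape "adding things to the Unit and Person views" could take is branching on a requested format inside the existing view. This is a pure-Python sketch only: the `fmt` parameter, the `person` dict, and the return shape are all assumptions, not the site's actual API.

```python
import csv
import io


def person_response(person, fmt="html"):
    """Return template context for HTML, or a CSV payload for downloads.

    `person` is a hypothetical dict of fields; in the real app it would be
    built from the Person model inside the existing PersonView.
    """
    if fmt == "csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=sorted(person))
        writer.writeheader()
        writer.writerow(person)
        return ("text/csv", buffer.getvalue())
    # Default: the context dict the template would render, as today.
    return ("text/html", {"person": person})


content_type, payload = person_response({"name": "Jane Doe"}, fmt="csv")
```

Keeping the download on the same view (rather than a new endpoint) means the download sees exactly the data the page shows, which also makes the Google Analytics download event easy to attach to one URL.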
@smcalilly, I'll have you propose a way to do this, and break it down into smaller issues.