caciviclab / disclosure-backend

DEPRECATED (We're working on the `disclosure-backend-static` repo instead) A Django app to download, extract and load campaign finance and lobbying activity data from the California Secretary of State's CAL-ACCESS database as well as local Netfile jurisdictions
https://github.com/caciviclab/disclosure-backend-static/

Load FIPS Ids & their corresponding Names & Types into API #94

Open jnmarcus opened 8 years ago

jnmarcus commented 8 years ago

FIPS IDs are used nationwide at the city, county, and state level. Per @chellrocks's suggestion, and based on various onsite discussions, we had previously decided that using FIPS IDs made the most sense, as it gives us a uniform identification nomenclature and doesn't force us to reinvent the wheel where we don't have to. Now we need to load this data into a consumable API :smile:

Based on data retrieved by @chellrocks from ??(source needed)??, this spreadsheet was used as a starting point to aggregate the state and county FIPS IDs for California. Note that some numbers in the state tab are missing because they belong to US entities considered out of scope for this project. California's FIPS ID is 6, and its counties' FIPS IDs can be found on the second tab of the spreadsheet.

As of right now, it looks like we have only logged FIPS data for California's counties, not cities, so there is more work to do on that end. That being said, @adborden and @tdooner, it should be noted that the FIPS ID you are currently loading in the front end for the 'city' of San Francisco (6075) is in fact the county FIPS ID.

Additionally, it appears that there are multiple counties that share a name with a city. We should probably take inventory of these and discuss how we are going to handle these cases.

jnmarcus commented 8 years ago

One idea I had for how the city and county data could look (or rather, the immediate data we would need for a city and/or county) was something like this:

{
  "countyName": "alameda",
  "type": "county",
  "fip_id": "6001",
  "hasCities": [
    {
      "cityName": "oakland",
      "type": "city",
      "fip_id": "6001_??",
      "ofCounty": {
        "countyName": "alameda",
        "type": "county",
        "fip_id": "6001"
      },
      "ofState": {
        "stateName": "california",
        "type": "state",
        "fip_id": "6"
      },
      "hasZipCodes": [
        {
          "zipcode": "94601",
          "type": "zipcode",
          "ofCity": {
            "cityName": "oakland",
            "type": "city",
            "fip_id": "6001_??"
          },
          "ofCounty": {
            "countyName": "alameda county",
            "type": "county",
            "fip_id": "6001"
          }
        }
      ],
      "collectsCampaignFinanceData": "true",
      "campaignFinanceDataSources": [
        { "name": "", "href": "" }
      ],
      "electionDataSummary": {
        "hasElectionData": true,
        "isOnline": true,
        "isPubliclyAccessible": true,
        "isMachineReadable": "",
        "pastElectionData": {
          "hasPastElectionData": true,
          "yearsPastElectionDataCollected": [
            {
              "year": "2014",
              "isFiledOnline": true,
              "isPubliclyAccessible": true,
              "isMachineReadable": true
            }
          ],
          "pastElectionDataSources": [
            { "name": "", "href": "" }
          ]
        },
        "upcomingElectionData": {
          "hasUpcomingElectionData": "",
          "isCollectingUpcomingElectionData": "",
          "dataCollectionStartDate": "DD/MM/YYYY",
          "dataCollectionEndDate": "DD/MM/YYYY",
          "dataFiledOnline": "",
          "dataPubliclyAccessible": "",
          "dataMachineReadable": "",
          "dataUpdateFrequency": ""
        }
      }
    }
  ]
}
mikeubell commented 8 years ago

Since San Francisco is both a county and a city, using the county FIPS is probably right.

On Dec 20, 2015, at 11:55 PM, Jamie Marcus notifications@github.com wrote:


bcipolli commented 8 years ago

Thanks @jnmarcus ! A few thoughts...

On FIPS codes:

On your API JSON structure:

bcipolli commented 8 years ago

As far as Django models go, setting up the models seemed straightforward. Glad to hear if others have ideas for a different structure:

To start, all could be added manually. We could add an issue to have a management command and/or form to assist with adding the data, including auto-add of all relevant zip codes.

adborden commented 8 years ago

@jnmarcus I only see state and county data in your spreadsheet, do we have fips for cities?

bcipolli commented 8 years ago

@adborden nope; I posted a link with city IDs, and some suggestions how to use them, in my follow-ups.

jnmarcus commented 8 years ago

@bcipolli a couple things :smile:

Also, I saw a lot of chatter about the zip codes. It's my personal opinion we don't need zip code info right away (in v1 at least), but that is just my opinion and we should probably collect that from the group. That being said, I'm not sure if I understand why the zipcode would be at the highest level, there's also more than one zip code generally associated with a city, so I'm not really sure if that would work...I'm also not sure if I understand correctly what you mean by highest level...can u elaborate? :stuck_out_tongue:

Let me know if I forgot to answer anything :smiley:

jnmarcus commented 8 years ago

Also, I found this resource, which may be helpful: http://www.census.gov/geo/reference/ansi.html (it also has the voting districts available... were we missing that?)

bcipolli commented 8 years ago

> your question about use-case for API endpoint...are you referring to what endpoint this data would be retrieved back from? (sorry if that's a noob question)

I just mean, what are the front-end actions (components?) that we're trying to support with the API you mocked up?

> why the zipcode would be at the highest level

Ya, I think.... just forget that :) Info about the front-end components or actions that will consume this data can help figure out how to output it best.

tdooner commented 8 years ago

> I'm not sure if I understand why the zipcode would be at the highest level, there's also more than one zip code generally associated with a city, so I'm not really sure if that would work

Wait, how are we thinking of mapping data from NetFile/Cal-Access into a jurisdiction without zip codes? My understanding was that we would assign a list of zip codes to a FIPS code, and then we could match up contributions based on those ZIPs.

Or if we associate committees to FIPS codes, then we wouldn't need to worry about ZIP codes as you mention, but we also wouldn't be able to know what to do with contributions that aren't to those associated committees.

bcipolli commented 8 years ago

If I understood correctly, each contribution is reported within a specific jurisdiction, and so will have the FIPS code of the jurisdiction whose data we pull. For independent expenditures, the same committee can spend money in multiple jurisdictions, so I think FIPS on the contribution itself is the right way to go.

@andrell81, any comments on how we might use (or not use) zip codes? Do you recall them being available on a per-transaction basis?
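As a rough illustration of tagging each transaction with the jurisdiction it was reported in, rather than matching on zip codes, here is a minimal Python sketch. The `Transaction` class, its fields, and the `locality_code` values are hypothetical and are not taken from NetFile, CAL-ACCESS, or this repo.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Transaction:
    committee: str       # hypothetical committee name
    amount: int          # whole dollars, to keep the arithmetic exact
    locality_code: str   # code of the jurisdiction whose filings we pulled


def totals_by_locality(transactions: Iterable[Transaction]) -> Dict[str, int]:
    """Sum amounts per jurisdiction, regardless of which committee spent them."""
    totals: Dict[str, int] = defaultdict(int)
    for tx in transactions:
        totals[tx.locality_code] += tx.amount
    return dict(totals)


# The same (made-up) committee spends in two jurisdictions; the code on each
# transaction, not on the committee, decides where the money is counted.
sample = [
    Transaction("Committee A", 100, "loc-1"),
    Transaction("Committee A", 50, "loc-2"),
    Transaction("Committee B", 50, "loc-1"),
]
print(totals_by_locality(sample))
```

With the code stored on the transaction, an independent-expenditure committee that is active in several jurisdictions needs no special handling.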

bcipolli commented 8 years ago

I will take this one.

bcipolli commented 8 years ago

@jnmarcus are FIPS unique across city vs. county vs. state, or only unique within the group? I.e. if I have a FIPS code, is that enough to identify exactly what it is, or do I also need to know if it's a county vs. city FIPS?

Just browsing here, seemed to suggest it's not fully unique, just under the sub-category. http://www.census.gov/geo/reference/codes/cousub.html

bcipolli commented 8 years ago

Ok, reviewing this county data: http://www2.census.gov/geo/docs/reference/codes/files/national_county.txt

FIPS codes aren't even unique within the category. There are multiple counties with FIPS 001, but only one per state. So, FIPS alone isn't enough as a unique identifier.

polkapolka commented 8 years ago

Ben, 5 digit fips codes are unique. 2 digit state code + 3 digit county code.
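To make the "2 digit state code + 3 digit county code" scheme concrete, here is a minimal Python sketch. The function names `county_geoid` and `split_county_geoid` are made up for illustration and are not part of this repo. It also shows why values such as 6075, as written earlier in this thread, need zero-padding before they can be split.

```python
from typing import Tuple, Union

Code = Union[int, str]


def county_geoid(state_fips: Code, county_fips: Code) -> str:
    """Join a state code and a county code into one 5-digit string."""
    state = str(state_fips).zfill(2)
    county = str(county_fips).zfill(3)
    if len(state) != 2 or len(county) != 3 or not (state + county).isdigit():
        raise ValueError(f"not a valid state/county pair: {state_fips!r}, {county_fips!r}")
    return state + county


def split_county_geoid(code: Code) -> Tuple[str, str]:
    """Split a 5-digit county identifier back into (state, county)."""
    padded = str(code).zfill(5)  # restores a leading zero lost by storing as int
    if len(padded) != 5 or not padded.isdigit():
        raise ValueError(f"not a 5-digit county code: {code!r}")
    return padded[:2], padded[2:]


print(county_geoid(6, 75))       # 06075
print(split_county_geoid(6075))  # ('06', '075')
# County code 001 exists in more than one state, but the joined codes differ:
print(county_geoid(6, 1), county_geoid(1, 1))  # 06001 01001
```

Keeping the code as a string avoids silently dropping the leading zero of state codes below 10.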

On Mon, Dec 28, 2015 at 2:48 PM, Ben Cipollini notifications@github.com wrote:


bcipolli commented 8 years ago

It'd be great to hear from others what problem we're trying to solve by using fips_id. After researching further this morning, it's not clear to me. What is very clear to me is the set of challenges we'll face by using fips_id, and why we'd want to avoid making that a core part of this app.

The main benefit I see for a globally unique ID: *Is there any front-end use-case where we don't know the locality type (e.g. city/county/state)? I don't see one.

Why use FIPS ID? I see many challenges and literally zero benefits to doing so:

An alternative that sounds much more appealing to me is to use arbitrary internal IDs. This:

So to summarize, I don't see the benefit of using fips_id in our code, but I see plenty of challenges. To push forward on the back-end design and development, it'd be really helpful to understand why that direction was chosen. If I missed it in a doc somewhere, or am forgetting something obvious, really, I apologize...

:tada: Happy New Year! :)

adborden commented 8 years ago

I don't really understand what this issue represents. What is this API endpoint going to be used for? The way I see it, there are two use cases involving locality (a geographic area):

@jnmarcus is there a different use case that you're looking for?

bcipolli commented 8 years ago

Zip codes to cities are a many-to-many relationship; you can't always pick a city from a zip code, nor vice versa. Zip codes are unaffiliated with any political zoning.

Regardless, we'll have to map some value onto a locality ID. My only point here is that fips_id is strictly worse than a simple arbitrary internal ID. As far as I see, zip code doesn't solve the problem either.

I think zip code is a great key for sending to an API (since users know it), but bad for back-end storage (for the reasons listed above).
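A small Python sketch of that many-to-many relationship, assuming localities are identified by arbitrary internal integer ids. The `ZipIndex` class and the sample ids are hypothetical; only the zip code 94601 comes from the mock-up earlier in the thread, and 99999 is a made-up value.

```python
from collections import defaultdict
from typing import Dict, Set


class ZipIndex:
    """Two-way index between zip codes and internal locality ids."""

    def __init__(self) -> None:
        self._localities_by_zip: Dict[str, Set[int]] = defaultdict(set)
        self._zips_by_locality: Dict[int, Set[str]] = defaultdict(set)

    def link(self, zipcode: str, locality_id: int) -> None:
        """Record that a zip code overlaps a locality (either side may repeat)."""
        self._localities_by_zip[zipcode].add(locality_id)
        self._zips_by_locality[locality_id].add(zipcode)

    def localities_for(self, zipcode: str) -> Set[int]:
        # .get avoids creating empty entries for unknown zip codes
        return set(self._localities_by_zip.get(zipcode, ()))

    def zips_for(self, locality_id: int) -> Set[str]:
        return set(self._zips_by_locality.get(locality_id, ()))


index = ZipIndex()
index.link("94601", 1)  # zip from the mock-up, hypothetical locality id 1
index.link("99999", 1)  # made-up zip that straddles two localities
index.link("99999", 2)
print(sorted(index.localities_for("99999")))  # [1, 2]
print(sorted(index.zips_for(1)))              # ['94601', '99999']
```

A zip code sent by a user can then return several candidate localities for the front end to disambiguate, while stored records keep pointing at a single locality id.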

adborden commented 8 years ago

@bcipolli agreed, I'm just describing a use case based on mapping zipcodes to localities. I'm not proposing anything about what to use as the primary identifier. BTW, the zipcode issue has been discussed on several occasions, I'll open an issue so we can track it better.

In fact, my point is more that there are multiple use cases, so going with an arbitrary Id would be preferred, but we'll still want additional fields to be able to map for different use cases.

What is useful about FIPS is that we have a standard and complete list of every city/county/state in the country (right?). As @jnmarcus mentioned above, I think we do want to have all the localities in the DB so we can display better messaging along the lines of "Sorry, the data for Fremont is incomplete, here's who you can call in your local government to change that."

jnmarcus commented 8 years ago

@bcipolli FIPS ids were suggested as an alternative for unique ids, when we were trying to come up with a way to deal with cross-referencing that didn't involve matching strings, as this a) could be memory/process intensive, and b) could result in false representations of the data.

Basically, a couple of the problems we were trying to solve were:

I don't believe that FIPS IDs will solve all of our problems, but I'm not sure creating our own is the way to go either. We would need some type of methodology to distinguish between state, county, and city, which is how we got on the topic of FIPS IDs to begin with. Essentially the idea was 'why reinvent the wheel when numbers like this are already used in similar fashion, in a publicly recognized standard?' Additionally, I think it's definitely possible we may need more than one type of unique identifier, but from what I understand, those types of identifiers will ultimately go back to either a county or the state level...(I believe)

On the other hand, I read that FIPS IDs are being retired...but they also still seem to be commonly used...so there's that :)

Here are a couple of references that I found helpful:

Definitions of various Geo Codes: http://www.census.gov/geo/reference/geocodes.html (note the Legal/Statistical Area Description Codes...)

Hierarchy Diagram of Geographic Entities: http://www2.census.gov/geo/pdfs/reference/geodiagram.pdf

ANSI Codes: http://www.census.gov/geo/reference/ansi.html

@adborden we're not going deeper than city data? I thought I'd heard the opposite, especially from @bcipolli and the San Diego team, who expressed interest in getting their county data up...

bcipolli commented 8 years ago

Since we have an internal id for localities now, mapping to FIPS can be pushed off a bit. It's great to provide front-end links with meaningful Ids (like FIPS), and to allow searches... but this no longer blocks general API development.
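One way to picture "internal id now, FIPS later" is the Python sketch below. It is not the Django model used in this repo: the `Locality` and `LocalityRegistry` names are hypothetical. FIPS is an optional attribute, and lookups are keyed on the pair (kind, fips) on the assumption, raised earlier in this thread, that a bare code may not identify the locality type.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Locality:
    id: int                     # arbitrary internal id, the primary key
    name: str
    kind: str                   # "city", "county" or "state"
    fips: Optional[str] = None  # can be filled in later


class LocalityRegistry:
    def __init__(self) -> None:
        self._ids = count(1)
        self._by_id: Dict[int, Locality] = {}
        self._by_fips: Dict[Tuple[str, str], Locality] = {}

    def add(self, name: str, kind: str, fips: Optional[str] = None) -> Locality:
        locality = Locality(next(self._ids), name, kind, fips)
        self._by_id[locality.id] = locality
        if fips is not None:
            self._by_fips[(kind, fips)] = locality
        return locality

    def get(self, locality_id: int) -> Locality:
        return self._by_id[locality_id]

    def find_by_fips(self, kind: str, fips: str) -> Optional[Locality]:
        return self._by_fips.get((kind, fips))


registry = LocalityRegistry()
county = registry.add("san francisco", "county", fips="06075")
city = registry.add("san francisco", "city")  # no FIPS mapped yet
print(county.id, city.id)                              # 1 2
print(registry.find_by_fips("county", "06075").name)   # san francisco
print(registry.find_by_fips("city", "06075"))          # None
```

The city and the county of San Francisco get distinct internal ids even though they share a name, and front-end links that use FIPS can be resolved through `find_by_fips` once that mapping is loaded.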

bcipolli commented 8 years ago

I don't think we need this for our demo.