agude / SWITRS-to-SQLite

Python script for converting California's Statewide Integrated Traffic Records System (SWITRS) reports to SQLite.
https://alexgude.com/blog/switrs-to-sqlite/

County City Location sometimes is four digits #7

Closed: agude closed this issue 2 years ago

agude commented 3 years ago

But it should be five! According to the codebook, leading 0s are sometimes dropped.

Can we fix this by making sure it's stored as a string and adding a 0 if it's only four digits?
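A minimal sketch of the padding idea, assuming the value may arrive as either an integer or a string. The function name `normalize_location_code` and the `width` parameter are hypothetical and not taken from the SWITRS-to-SQLite code; the default width of five follows the digit count stated in this issue.

```python
def normalize_location_code(value, width=5):
    """Return the code as a zero-padded string, or None if it is missing.

    Hypothetical helper: the name and the default width are assumptions
    based on this issue, not part of the existing converter.
    """
    if value is None:
        return None
    text = str(value).strip()
    if not text:
        return None
    # str.zfill left-pads with "0" and leaves longer strings unchanged.
    return text.zfill(width)


print(normalize_location_code(1234))     # four digits in, five out
print(normalize_location_code("01234"))  # already five digits, unchanged
```

Keeping the result as a string (and the SQLite column as TEXT) would stop the leading zero from being dropped again on the way into the database.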

agude commented 3 years ago

Additional info! There are official state county codes here: https://notary.cdn.sos.ca.gov/forms/notary-county-codes.pdf (Uploaded in case it disappears: notary-county-codes.pdf)

There are also county city codes here: https://www.cdtfa.ca.gov/taxes-and-fees/jurisdictioncodes.pdf (Uploaded in case it disappears: jurisdictioncodes.pdf)

I think these are the right ones! Let's add a county and city column using these!

agude commented 3 years ago

Here is a nice Python dictionary of the county codes:

county_codes = {
    '01': 'ALAMEDA',
    '02': 'ALPINE',
    '03': 'AMADOR',
    '04': 'BUTTE',
    '05': 'CALAVERAS',
    '06': 'COLUSA',
    '07': 'CONTRA COSTA',
    '08': 'DEL NORTE',
    '09': 'EL DORADO',
    '10': 'FRESNO',
    '11': 'GLENN',
    '12': 'HUMBOLDT',
    '13': 'IMPERIAL',
    '14': 'INYO',
    '15': 'KERN',
    '16': 'KINGS',
    '17': 'LAKE',
    '18': 'LASSEN',
    '19': 'LOS ANGELES',
    '20': 'MADERA',
    '21': 'MARIN',
    '22': 'MARIPOSA',
    '23': 'MENDOCINO',
    '24': 'MERCED',
    '25': 'MODOC',
    '26': 'MONO',
    '27': 'MONTEREY',
    '28': 'NAPA',
    '29': 'NEVADA',
    '30': 'ORANGE',
    '31': 'PLACER',
    '32': 'PLUMAS',
    '33': 'RIVERSIDE',
    '34': 'SACRAMENTO',
    '35': 'SAN BENITO',
    '36': 'SAN BERNARDINO',
    '37': 'SAN DIEGO',
    '38': 'SAN FRANCISCO',
    '39': 'SAN JOAQUIN',
    '40': 'SAN LUIS OBISPO',
    '41': 'SAN MATEO',
    '42': 'SANTA BARBARA',
    '43': 'SANTA CLARA',
    '44': 'SANTA CRUZ',
    '45': 'SHASTA',
    '46': 'SIERRA',
    '47': 'SISKIYOU',
    '48': 'SOLANO',
    '49': 'SONOMA',
    '50': 'STANISLAUS',
    '51': 'SUTTER',
    '52': 'TEHAMA',
    '53': 'TRINITY',
    '54': 'TULARE',
    '55': 'TUOLUMNE',
    '56': 'VENTURA',
    '57': 'YOLO',
    '58': 'YUBA',
}

From https://www.kaggle.com/mioszdyka/county-city-location-analysis
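A rough sketch of how a dictionary like this might be used to fill a county column. The helper `county_name` is hypothetical, and only three entries from the dictionary above are repeated to keep the example self-contained. It assumes the county code has already been separated out and may have lost its leading zero; how the county code is extracted from the County City Location value is not covered here.

```python
# Subset of the county_codes dictionary above, repeated so the example runs alone.
county_codes = {
    '01': 'ALAMEDA',
    '19': 'LOS ANGELES',
    '38': 'SAN FRANCISCO',
}


def county_name(code, lookup=county_codes):
    """Return the county name for a code, or None if it is not in the lookup.

    Hypothetical helper: accepts an int or a string and zero-pads it to
    two digits so that 1, "1", and "01" all match the key "01".
    """
    if code is None:
        return None
    key = str(code).strip().zfill(2)
    return lookup.get(key)


print(county_name(1))     # ALAMEDA
print(county_name("19"))  # LOS ANGELES
print(county_name("99"))  # None
```

Returning None for unknown codes, rather than raising, would let the converter write NULL into the new column for rows whose code is not in the table.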