ayyubibrahimi / us-post-data

States with processed data #13

Open: ayyubibrahimi opened this issue 1 month ago

ayyubibrahimi commented 1 month ago

States with processed/cleaned data:

All processed files should point to this processed directory

stecklow commented 1 month ago

We have new data from Illinois (as of last month) that looks pretty clean to me, and crucially, includes the reason for separation. Can we utilize this new IL file instead of the 2023 one? https://www.dropbox.com/scl/fi/y4rcehqxii8mmfcnjr6k9/EmploymentHistory.csv?rlkey=ieuv3x89ag5fh2pkin16fo2g2&dl=0

ayyubibrahimi commented 1 month ago

IL has been cleaned and added above. Can you add any new data to the unprocessed issue @stecklow? They'll likely always need to be processed, even if minimally.

stecklow commented 1 week ago

I went through the "national-post-db" Dropbox and have some questions and notes about some of the states.

Alaska - hold for launch?

Data is probably too incomplete for launch: I'm not sure how many officers actually have history, and there are too many other questions about immediate usefulness. The state is now also being difficult with my update request, so I'm not even sure we can immediately promise anything.

Florida - question

“Separation reason” column is more of an “employment status/change” column - remove “actively employed,” etc., or change name of column?

Georgia - question

“Separation reason” column is more of an “employment status/change” column - remove “actively employed,” etc., or change name of column?

Idaho - hold for launch?

Seems like the columns reflect employment changes and certifications within an agency - hold for launch?

Illinois - no notes, but seems like a good model for the "reason for separation" column

Indiana - question

“Separation reason” column is more of an “employment status/change” column - remove “active,” or change name of column?

Iowa - question

I didn’t see Iowa in the launch data folder - which is fine, but it’s otherwise on my list, so just not sure what to list it as

Maryland - hold for launch?

The rows appear to show employment status changes, like promotions, often within agencies, rather than between - hold for launch?

North Carolina - hold for launch?

I don’t see any employment history in the index file

New Mexico - hold for launch?

Rows in data appear to show employment status or certification changes within individual departments - honestly really not sure what the index file shows

Oregon - hold for launch?

The rows appear to show employment status changes, like promotions, often within agencies, rather than between - hold for launch?

South Carolina - question

Some of the rows appear to show more employment status changes within departments, though other repeated departments appear to be different stints, which would track with our rule about contiguous service being collapsed

Tennessee - question

No notes, except to ask if things can not be all-caps (this is obviously least urgent)

Utah - question

No notes, except to ask if things can not be all-caps (this is obviously least urgent)

Vermont - hold for launch?

The rows appear to show employment status changes, like promotions, often within agencies, rather than between - hold for launch?

Washington - question

“Separation reason” column is more of an “employment status/change” column - remove “certified,” or change name of column? Also if things could not be all-caps

West Virginia - question

No notes, except to ask if things can not be all-caps (this is obviously least urgent)

Wyoming - question

Not sure “separation reason” is actually showing that - should we change the name of the column?
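
One way to handle the recurring question above (status values such as "actively employed", "active", or "certified" sitting in a "separation reason" column) is to move them into their own field rather than delete them or rename the whole column. The following is only a minimal sketch under assumptions: the column names `person_nbr`, `agency_name`, `separation_reason`, and `employment_status` are hypothetical, and the set of status labels is taken from the examples quoted in this thread, not from the actual state files.

```python
# Hypothetical status labels, based only on the examples quoted in this thread.
# The real per-state value lists would need to be checked against the data.
STATUS_VALUES = {"actively employed", "active", "certified"}


def split_status(rows, column="separation_reason"):
    """Move status-like values out of `column` into `employment_status`.

    `rows` is a list of dicts (for example, from csv.DictReader). Values that
    match STATUS_VALUES case-insensitively are treated as statuses; everything
    else is left in place as a separation reason.
    """
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        value = (row.get(column) or "").strip()
        if value.lower() in STATUS_VALUES:
            new_row["employment_status"] = value
            new_row[column] = ""
        else:
            new_row["employment_status"] = ""
        out.append(new_row)
    return out


# Illustrative rows only; not real data.
rows = [
    {"person_nbr": "1", "agency_name": "A", "separation_reason": "Actively Employed"},
    {"person_nbr": "2", "agency_name": "B", "separation_reason": "Resigned"},
]
cleaned = split_status(rows)
```

Keeping the status in a separate field means nothing is lost if the group later decides the column should simply be renamed instead.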

ayyubibrahimi commented 1 week ago

@stecklow to keep this issue focused just on what data has been processed so far, I just created a new issue, #19, where I've rehashed your list above about what should be included in the launch.

ayyubibrahimi commented 1 week ago

  • This all makes sense to me. I named them all "separation reason" because of the list in "Reason for Separation" Breakdown #14. Can you suggest a different name for each of the columns?
  • All of the states that we're going to launch with still need to be normalized for casing, column names, date formatting, etc. This list is just made up of states that I had to post-process from the BLN index files. Normalizing things according to our schema conventions is still on the list of things to do before launch.
  • For states like Oregon where the rows aren't collapsed, Tarak is still working on his open issue.
  • The Iowa data referenced in States with unprocessed data #12 is raw data from Ben that hasn't been cleaned, right?
  • AK and IN were not going to be on the launch list because they only have start_date values. I can remove the others you listed as well.

stecklow commented 1 week ago

Thanks Ayyub. I don't know that I see @tarakc02's open issue about row collapsing, so I'm actually not sure if the issues with Idaho, Maryland, New Mexico, and Vermont are in the same boat as Oregon, and maybe should still be here. I made a longer response in #19 with some broader thoughts.
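
To make the row-collapsing question concrete, here is a rough sketch of what merging contiguous service at one agency could look like. It is not the code from #6, which is not shown in this thread. The field names `person_nbr`, `agency_name`, and `end_date`, and the `max_gap_days` threshold, are all assumptions for illustration; only `start_date` is named in this thread, and the project's actual definition of "contiguous" may differ.

```python
from datetime import date, timedelta


def _contiguous(prev_end, next_start, max_gap_days):
    """True when the next row starts within `max_gap_days` of the previous end."""
    if prev_end is None:  # previous row is open-ended, so treat it as still running
        return True
    return next_start - prev_end <= timedelta(days=max_gap_days)


def collapse_stints(rows, max_gap_days=1):
    """Merge rows for the same person and agency when their dates are contiguous.

    `rows` is a list of dicts with date objects in start_date and a date or
    None (no recorded end) in end_date. Stints at the same agency separated by
    a longer gap stay as separate rows.
    """
    ordered = sorted(
        rows, key=lambda r: (r["person_nbr"], r["agency_name"], r["start_date"])
    )
    collapsed = []
    for row in ordered:
        prev = collapsed[-1] if collapsed else None
        if (
            prev is not None
            and prev["person_nbr"] == row["person_nbr"]
            and prev["agency_name"] == row["agency_name"]
            and _contiguous(prev["end_date"], row["start_date"], max_gap_days)
        ):
            # Same stint: keep the earliest start and extend the end date.
            if prev["end_date"] is None or row["end_date"] is None:
                prev["end_date"] = None
            else:
                prev["end_date"] = max(prev["end_date"], row["end_date"])
        else:
            collapsed.append(dict(row))
    # Return stints in chronological order per person.
    return sorted(collapsed, key=lambda r: (r["person_nbr"], r["start_date"]))


# Illustrative rows only; not real data.
rows = [
    {"person_nbr": "1", "agency_name": "A", "start_date": date(2010, 1, 1), "end_date": date(2012, 6, 30)},
    {"person_nbr": "1", "agency_name": "A", "start_date": date(2012, 7, 1), "end_date": date(2015, 3, 15)},
    {"person_nbr": "1", "agency_name": "B", "start_date": date(2015, 4, 1), "end_date": date(2018, 1, 1)},
    {"person_nbr": "1", "agency_name": "A", "start_date": date(2019, 1, 1), "end_date": None},
]
stints = collapse_stints(rows)
```

In this example the two back-to-back rows at agency A (such as a promotion) become one stint, while the later return to agency A after time at agency B stays separate, which matches the South Carolina pattern described above.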

tarakc02 commented 6 days ago

hi all, i just re-opened #6 based on this discussion, but that code is and has been ready to go!

ayyubibrahimi commented 6 days ago

Just added notes to #6. Thank you!