dwillis opened this issue 7 years ago.
I think we could do that here and include it as part of the cleaning process.
I agree that would be best, although we won't be able to match all of them.
Actually, I was also thinking about writing something that can search for members by year and name. Do you know if there is any data on staff members?
I've noticed some interesting honorifics in the dataset, such as 'fr'. I think it would be useful to put the honorifics into a separate column. Would it be preferable to remove the honorific from the name string, or to leave it there and just have it duplicated? I'm strongly in favor of attempting to map to the existing IDs so that it's easy to cross-reference with other tables.
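One way to get the honorific into its own column is to split it off the front of the raw name and keep both pieces. Below is a minimal Python sketch; the `HONORIFICS` list and the `split_honorific` helper are hypothetical, and the list would need to grow as new honorifics turn up in the data.

```python
import re

# Hypothetical list of honorifics seen (or expected) at the start of names.
HONORIFICS = ["Hon", "Fr", "Rev", "Dr", "Mr", "Mrs", "Ms"]

# Map lowercase forms back to the canonical spelling above.
_CANONICAL = {h.lower(): h for h in HONORIFICS}

# An honorific, an optional period, then at least one space before the name.
# Requiring whitespace keeps names like "Frank" from matching "Fr".
_PATTERN = re.compile(
    r"^\s*(?P<honorific>" + "|".join(map(re.escape, HONORIFICS)) + r")\.?\s+(?P<name>.+?)\s*$",
    re.IGNORECASE,
)


def split_honorific(raw_name):
    """Return (honorific, name); honorific is None if none is recognized."""
    match = _PATTERN.match(raw_name)
    if match is None:
        return None, raw_name.strip()
    honorific = _CANONICAL[match.group("honorific").lower()]
    return honorific, match.group("name")


for raw in ["Hon. John Smith", "Fr. John Doe", "Frank Jones"]:
    honorific, name = split_honorific(raw)
    print(raw, "|", honorific, "|", name)
```

Because both pieces come back, the original string can still be kept in its own column, so the choice between stripping and duplicating the honorific stays open.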
@sangxia we do have data on staff members, at least for the past 6 years or so: https://projects.propublica.org/represent/expenditures.
@dwillis That data should be useful, but do records from before 2009 only exist in paper format? Maybe it's better to start with members of Congress first?
@zacherybohon I've never heard of 'fr'. What does it mean? I googled the name of the person with 'fr' but didn't find anything useful.
@sangxia Yeah, paper only prior to 2009. Members are much more important to me.
@zacherybohon in this context, I'm guessing that 'fr' is a reference to a religious figure, since the House and Senate both have official chaplains and they have been known to travel to Italy to accompany trips to meet the Pope.
@dwillis, it means 'Father', as in a priest.
I see. I'll try to find some time to make a first version this week.
In terms of searching members, you can use the Sunlight Congress API (without an API key for now) to search for legislators (example: https://congress.api.sunlightfoundation.com/legislators?query=price&all_legislators=true&state=GA) and then filter to see which result is in the House during the time period of the trip. If it's more than one person, we'll need to try matching on first name or just give up.
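The filtering step described above could look roughly like this in Python. It is a sketch, not the project's code: it assumes the search results have already been fetched and are shaped like entries in the congress-legislators YAML files (`id.bioguide`, `name.first`, and `terms` with `type`, `start`, `end`) rather than like the Sunlight API's own response format. `pick_member`, `in_house_during`, and the sample records with made-up bioguide IDs are all hypothetical.

```python
from datetime import date

# Made-up sample records, shaped like entries in the congress-legislators
# YAML files (id.bioguide, name.first/last, terms with type/start/end).
CANDIDATES = [
    {"id": {"bioguide": "X000001"}, "name": {"first": "Alice", "last": "Price"},
     "terms": [{"type": "rep", "start": "2005-01-04", "end": "2007-01-03"}]},
    {"id": {"bioguide": "X000002"}, "name": {"first": "Bob", "last": "Price"},
     "terms": [{"type": "rep", "start": "2003-01-07", "end": "2005-01-03"},
               {"type": "rep", "start": "2005-01-04", "end": "2007-01-03"}]},
    {"id": {"bioguide": "X000003"}, "name": {"first": "Carol", "last": "Price"},
     "terms": [{"type": "sen", "start": "2005-01-04", "end": "2011-01-03"}]},
]


def in_house_during(legislator, trip_start, trip_end):
    """True if any House ("rep") term overlaps the trip's date range."""
    return any(
        term["type"] == "rep"
        and date.fromisoformat(term["start"]) <= trip_end
        and date.fromisoformat(term["end"]) >= trip_start
        for term in legislator["terms"]
    )


def pick_member(candidates, trip_start, trip_end, first_name=None):
    """Return the single matching legislator, or None if zero or several match."""
    in_house = [c for c in candidates if in_house_during(c, trip_start, trip_end)]
    if len(in_house) == 1:
        return in_house[0]
    if len(in_house) > 1 and first_name:
        # Several House members share the last name: fall back to first name.
        wanted = first_name.casefold()
        by_first = [c for c in in_house if c["name"]["first"].casefold() == wanted]
        if len(by_first) == 1:
            return by_first[0]
    # No match, or still ambiguous: give up rather than guess.
    return None
```

For a trip from 2006-03-01 to 2006-03-05, both Alice and Bob were in the House, so `pick_member(CANDIDATES, date(2006, 3, 1), date(2006, 3, 5))` gives up and returns `None`, while passing `first_name="Bob"` resolves it to Bob's record. Carol is never returned because her only term is in the Senate.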
I now have some code that matches on last name + dates using data from https://github.com/unitedstates/congress-legislators. I don't think it's good enough yet, because some last names are very common. I'll try to see how to use the other parts of the names. They come in all kinds of formats in the travel reports, so it'll probably be messy.
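To use the other parts of the names, one hedged approach is to normalize each raw name into a (first, last) pair before the lookup. The Python sketch below handles "First Middle Last", "Last, First", leading honorifics, and trailing suffixes such as "Jr."; the prefix and suffix lists and the `parse_name` helper are hypothetical and will certainly miss formats present in the real reports.

```python
# Hypothetical honorifics and suffixes to ignore when parsing names.
PREFIXES = {"hon", "fr", "rev", "dr", "mr", "mrs", "ms"}
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}
IGNORED = PREFIXES | SUFFIXES


def _words(text):
    """Lowercase words of text, minus punctuation, honorifics, and suffixes."""
    words = [token.strip(" .,").casefold() for token in text.split()]
    return [w for w in words if w and w not in IGNORED]


def parse_name(raw):
    """Return (first, last) in lowercase, or None if the format isn't recognized."""
    # Drop comma-separated pieces that are only a suffix, e.g. "Smith, John, Jr."
    parts = [p for p in raw.split(",") if _words(p)]
    if len(parts) == 1:
        # "First Middle Last": take the first and last remaining words.
        words = _words(parts[0])
        if len(words) < 2:
            return None
        return words[0], words[-1]
    if len(parts) == 2:
        # "Last, First Middle": the last name may have several words.
        return _words(parts[1])[0], " ".join(_words(parts[0]))
    return None
```

This only narrows the problem: a multi-word last name in "First Last" order ("Chris Van Hollen") comes out as last name "hollen". The parsed last name would feed the last name + dates lookup, and the first name can break ties when a common last name returns several members.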
PR approved and merged
Hi, all! Just checking in on this one. Sounds like @sangxia is still working on some things so I'll leave it open. Just let me know if I've got that wrong. Otherwise I'll keep an eye on how things are going. Thank you! cc @zacherybohon @dwillis
@restrellado I was thinking it might help if we could take committee information into account when doing the search, but I think that needs more research, and I don't have a clear idea of how to do that at the moment. I don't think the other things I wrote are going to improve matching performance much.
I propose closing this one and opening another specifically for adding committee information as part of the search. I can assign @sangxia as the point person for that. Anyone opposed?
Actually, on second thought, I read through the whole thread again, and it sounds like this is still focused on improving the way we identify names in the dataset. The way the conversation has progressed still seems relevant, so we should probably keep it all together. Let's leave it open and see where it goes. Open to other ideas if folks feel differently.
Most of the lawmakers in this data are prefaced with "Hon." and it occurred to me that we might be able to match at least some of them to their canonical Congressional Biographical Directory IDs using the ProPublica Congress API or the source of its member data, which is this project.
Thoughts on whether this should be something done in this repository or something I should do downstream?