PugetSoundClinic-PIT / 2022-Election-Material

Fork of @BlakeRMills repo for Web scraping, candidate data, and election visualizations - to modify for governors and mayors races 🤞

Scraping more website data for 2022 elections #2

Closed nniiicc closed 1 year ago

nniiicc commented 1 year ago

The next process is to evolve the scraper to collect more data across more races. Below are data sources and races we want to collect the data from:

  1. Gubernatorial races - see the table here. Each state in the table is a hyperlink to the page for that state's governor's race: https://ballotpedia.org/Gubernatorial_elections,_2022
  2. Attorney General - see hyperlinks at top of this page https://ballotpedia.org/Attorney_General_elections,_2022
  3. Mayoral election - https://ballotpedia.org/United_States_mayoral_elections,_2022#Mayoral_elections_across_the_United_States
  4. Municipal elections (these include both city and county elections across a variety of offices) https://ballotpedia.org/United_States_municipal_elections,_2022#By_state
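
A minimal sketch of the first step for source 1: collecting the per-state links from an index table. It uses only the Python standard library and runs on a small made-up HTML snippet rather than the live page. The class and function names, the sample markup, and the assumption that every relevant link sits inside a `<table>` are all illustrative, not taken from the repository's scraper.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://ballotpedia.org"


class TableLinkParser(HTMLParser):
    """Collect (link text, absolute URL) pairs for anchors inside <table> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.table_depth = 0      # > 0 while we are inside a table
        self.current_href = None  # set while we are inside a qualifying <a>
        self.current_text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "a" and self.table_depth > 0:
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the site root.
                self.current_href = urljoin(self.base_url, href)
                self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1
        elif tag == "a" and self.current_href is not None:
            text = "".join(self.current_text).strip()
            self.links.append((text, self.current_href))
            self.current_href = None


def extract_table_links(html, base_url=BASE_URL):
    """Return every (text, url) pair found inside tables in the given HTML."""
    parser = TableLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Made-up markup standing in for the index page; the real layout may differ.
SAMPLE_HTML = """
<p><a href="/Main_Page">Home</a></p>
<table>
  <tr><td><a href="/Alabama_gubernatorial_election,_2022">Alabama</a></td></tr>
  <tr><td><a href="/Alaska_gubernatorial_election,_2022">Alaska</a></td></tr>
</table>
"""

state_links = extract_table_links(SAMPLE_HTML)
```

On the real page the table may also contain links that are not state races, so the result would likely need filtering before each state page is fetched.
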
peiwenf commented 1 year ago

Hi Professor Weber @nniiicc, I have scraped the gubernatorial races, but I just realized that the original scraper collected only the Twitter handles, not the Facebook, Twitter, or YouTube links. Which information would you prefer to have?

nniiicc commented 1 year ago

Hi - it would be great to copy all of them - but if we just have twitter handles that is perfectly ok. (Don't spend more than 1 hour trying to go back and get the others)
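
One possible way to pick up the other links alongside the Twitter handle, sketched in Python with the standard library only. It assumes the candidate page's outbound links have already been gathered into a list of URLs. The host table, the function names, and the example URLs are hypothetical.

```python
from urllib.parse import urlparse

# Hosts we care about, mapped to a short platform label (an assumed list).
SOCIAL_HOSTS = {
    "twitter.com": "twitter",
    "facebook.com": "facebook",
    "youtube.com": "youtube",
}


def classify_social_link(url):
    """Return the platform label for a URL, or None if it is not a known platform."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return SOCIAL_HOSTS.get(host)


def twitter_handle(url):
    """Turn a Twitter profile URL into an @handle (first path segment)."""
    first_segment = urlparse(url).path.strip("/").split("/")[0]
    return "@" + first_segment if first_segment else None


def collect_social_links(urls):
    """Keep the first link seen for each platform."""
    found = {}
    for url in urls:
        platform = classify_social_link(url)
        if platform and platform not in found:
            found[platform] = url
    return found


# Placeholder URLs for illustration only.
example_urls = [
    "https://ballotpedia.org/Some_Candidate",
    "https://www.twitter.com/ExampleCandidate",
    "https://www.facebook.com/ExampleCandidatePage",
    "https://www.youtube.com/channel/EXAMPLE",
]

social = collect_social_links(example_urls)
handle = twitter_handle(social["twitter"])
```

Keeping the full URLs and deriving the handle from them means the handle-only output can still be produced if the extra links turn out not to be worth the time.
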

nniiicc commented 1 year ago

Oh - also - Can you check in the scraper code for the Gov races?

peiwenf commented 1 year ago

> Oh - also - Can you check in the scraper code for the Gov races?

For sure, I'll push it now.

peiwenf commented 1 year ago

Hi Professor Weber @nniiicc. Since the municipal elections page contains information for both mayoral and city elections, do you want these two kinds of elections in two separate data frames or together in one?

nniiicc commented 1 year ago

Good question! Either option is fine. Separate data frames might be easier for analysis, but you could also just add a variable about the race into the data frame (like below)

peiwenf commented 1 year ago

> Good question! Either option is fine. Separate data frames might be easier for analysis, but you could also just add a variable about the race into the data frame (like below)
>
>   • Candidate Name, City, Race, Ballotpedia link, Twitter link, ...
>   • Bruce Harrell, Seattle, Mayor, HTTP://somelink.com, @BigBruce

Got it! I will try to put them in separate data frames. I also have another question, @nniiicc. There are many different races under the municipal and local elections, including county elections, county attorney, county clerk, and so on, and these races differ from county to county. Do we want information from all of these races, or only from the mayoral and city elections?

nniiicc commented 1 year ago

We want data from all races... This is why it might be easier to just add a variable about which race the candidate is in ... Does that make sense?
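
A small Python sketch of the single-table idea: every candidate row carries a `race` value, and per-race tables can be derived from it whenever they are needed. Plain dictionaries stand in for a data frame here. The first row reuses the example values from this thread; the other rows and all field names are placeholders.

```python
from collections import defaultdict

# One combined table; "race" labels which contest each candidate is in.
candidates = [
    {"name": "Bruce Harrell", "city": "Seattle", "race": "Mayor",
     "ballotpedia_link": "HTTP://somelink.com", "twitter": "@BigBruce"},
    # Placeholder rows for illustration only.
    {"name": "Candidate A", "city": "Anchorage", "race": "Assembly",
     "ballotpedia_link": "", "twitter": ""},
    {"name": "Candidate B", "city": "Anchorage", "race": "Assembly",
     "ballotpedia_link": "", "twitter": ""},
]


def split_by_race(rows):
    """Group rows by their race label, preserving the original row order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["race"]].append(row)
    return dict(groups)


by_race = split_by_race(candidates)
mayoral = by_race["Mayor"]
```

Going from one labeled table to separate tables is a one-line grouping, while merging separately scraped tables later requires their columns to line up, which is one argument for keeping the label either way.
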

peiwenf commented 1 year ago

> We want data from all races... This is why it might be easier to just add a variable about which race the candidate is in ... Does that make sense?

Got it! Since the different links have different layouts, I think it's actually easier to treat them separately. My current plan is to have separate data frames for the mayoral and city elections, and to keep all the races under municipal elections together in one data frame. Does that sound ok?

nniiicc commented 1 year ago

Sounds great!

peiwenf commented 1 year ago

Hi Professor Weber @nniiicc, I have a question about this type of layout: https://ballotpedia.org/City_elections_in_Anchorage,_Alaska_(2022) Do we need the "office" information in the Anchorage Assembly section? I have figured out how to get the race name and the candidates under each race, but not which seat they are competing for. If we need that information, I would need some help.

nniiicc commented 1 year ago

Just the race name is fine - thanks for asking (that's a weird election set-up)

peiwenf commented 1 year ago

Yeah, it's confusing. I just realized the Anchorage Assembly, city attorney, judicial offices, etc. are all part of the city council, and each city has different offices up for election. To keep them consistent for analysis, should I just label them all as city council? @nniiicc

nniiicc commented 1 year ago

hmmm - I'm not sure I follow. I think Anchorage has a number of races (e.g. assembly, city attorney, etc) ... If you scrape them all and put them all in the same data frame it's perfectly fine - just be sure that candidates are labeled by race

peiwenf commented 1 year ago

Got it! I have another question about the primary and general elections. For city council elections, a general election only takes place when four people advance from the primary and only three can win, so two of them compete for one spot. In this case, should I also get the primary information?

nniiicc commented 1 year ago

Hmmm ... this is tricky. I would say that no we should not collect primary data. We can just skip those races where a general election didn't take place. Good catch!
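
A minimal sketch of that rule in Python: drop every primary row, which also drops any race that never reached a general election. It assumes each scraped row records which stage it came from. The `stage` field, the function name, and the sample rows are hypothetical.

```python
# Placeholder rows; "stage" marks whether the row came from a primary or general listing.
rows = [
    {"name": "Candidate A", "race": "City Council", "stage": "primary"},
    {"name": "Candidate A", "race": "City Council", "stage": "general"},
    {"name": "Candidate B", "race": "City Council", "stage": "general"},
    {"name": "Candidate C", "race": "City Attorney", "stage": "primary"},
]


def general_only(rows):
    """Return (general-election rows, sorted names of races that had no general election)."""
    kept = [row for row in rows if row["stage"] == "general"]
    races_with_general = {row["race"] for row in kept}
    skipped = sorted({row["race"] for row in rows} - races_with_general)
    return kept, skipped


kept_rows, skipped_races = general_only(rows)
```

Returning the skipped race names makes it easy to log which contests were left out, so the omission can be checked by hand.
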

peiwenf commented 1 year ago

Got it, and I have re-uploaded the mayoral results without the primary information. The city election pages come in several different layouts. I have most of them covered and just need to figure out how to ignore the primary cases now. The municipal election pages for counties are similar to the city ones, so I think they will be quick once I have the city pages covered. One question about the county municipal elections: do we want the district information as well, i.e. which district each candidate is running in?

nniiicc commented 1 year ago

Yes, if we can include district information that would be great.
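
If the district appears inside the office or race title, one possible way to pull it out is a regular expression, sketched below in Python. The title format ("... District 3") is an assumption about how the pages label seats, and the function name and sample strings are made up for illustration.

```python
import re

# Matches "District" followed by a number or short label, e.g. "District 3" or "District 4B".
DISTRICT_RE = re.compile(r"\bDistrict\s+(\w+)", re.IGNORECASE)


def extract_district(office_title):
    """Return the district label found in an office title, or None if there is none."""
    match = DISTRICT_RE.search(office_title)
    return match.group(1) if match else None


# Placeholder titles for illustration only.
titles = [
    "County Commission District 3",
    "Board of Supervisors, District 4B",
    "County Clerk",
]

districts = [extract_district(title) for title in titles]
```

Offices without a district come back as `None`, so the district column can simply be left empty for county-wide races.
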