PugetSoundClinic-PIT / 2022-Election-Material

Fork of @BlakeRMills repo for Web scraping, candidate data, and election visualizations - to modify for governors and mayors races 🤞

Generating accessibility dataset for each candidate's website #4

Open nniiicc opened 1 year ago

nniiicc commented 1 year ago

At https://github.com/BITS-Research/campaign-access-eval, follow the "Recreating Analysis" instructions to see the results of a 2021 project...

This served as proof of concept. The next step is to generate new accessibility data to replicate this analysis.

  1. Follow the directions here: https://github.com/BITS-Research/campaign-access-eval/blob/main/CONTRIBUTING.md to set up your machine.

  2. Using make generate-report, create a report for the 2022 elections (across all races).

  3. Then re-run the analysis using the 2022 data (following the directions here: https://github.com/BITS-Research/campaign-access-eval#recreating-analysis), and we can discuss the outcomes and any new analysis we should do (e.g. analyzing data by race, geography, etc.).

peiwenf commented 1 year ago

Hi Professor Weber @nniiicc, I have some questions about this issue. I got the generate-report GitHub Action to run but ran into some special cases:

  • https://www.alabamaag.gov/About
  • https://coag.gov/about-us/colorado-attorney-general/
  • https://portal.ct.gov/AG/About-the-AG/William-Tong-Biography-page

The generate-report function only accepts the website base and throws an error when it runs into URLs like the ones above. I checked the website base for each of them, and they don't look like campaign pages, so I was wondering if I can just exclude them from the data frame.

nniiicc commented 1 year ago

@peiwenf - good catch. Yes, please remove these from the analysis. These are websites for the position (the office itself) rather than campaign websites; a good indication is that they have a .gov domain.
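
The exclusion rule described above could be sketched roughly as follows. This is only an illustration using the Python standard library: the helper names site_base and is_gov_site are hypothetical, and reading "website base" as scheme plus host is an assumption about what generate-report expects.

```python
from urllib.parse import urlparse


def site_base(url: str) -> str:
    """Reduce a URL to scheme + host (assumed meaning of "website base")."""
    parts = urlparse(url)
    return f"{parts.scheme}://{parts.netloc}"


def is_gov_site(url: str) -> bool:
    """True when the host ends in .gov, the indicator mentioned in the thread."""
    host = (urlparse(url).hostname or "").lower()
    return host.endswith(".gov")


# URLs taken from the comments above.
urls = [
    "https://www.alabamaag.gov/About",
    "https://coag.gov/about-us/colorado-attorney-general/",
    "https://portal.ct.gov/AG/About-the-AG/William-Tong-Biography-page",
    "https://www.tomdevore.com/",
]

# Keep only non-.gov sites, reduced to their base.
campaign_bases = [site_base(u) for u in urls if not is_gov_site(u)]
```

A .gov check is a heuristic, not a guarantee, so any site dropped this way is worth a quick manual look.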

peiwenf commented 1 year ago

Hi Professor Weber @nniiicc, here is a special case that I would like to double-check with you: https://www.tomdevore.com This website references many news articles, and every news link looks like https://www.tomdevore.com/news-title, which is more than the current function can parse. Since this is the only website that runs into this problem, I was wondering if I can just mark it as an invalid web link.

peiwenf commented 1 year ago

Attorney General failed links:

  • https://michaeltagliaviaforvermont.com/ - ERROR
  • https://www.wendellmajor.com/meet_wendell - canceled
  • https://www.tomdevore.com/ - canceled

peiwenf commented 1 year ago

Hi Professor Weber @nniiicc, after our talk today I went to the websites and checked for the information we are lacking. I didn't find the funding information, but I noticed that all the websites except CityElections have the voting results. Should I go back to grab this information?

nniiicc commented 1 year ago

Yes

peiwenf commented 1 year ago

Hi Professor Weber @nniiicc, when I went back to get the voting information I noticed the website has been updated. They added the general runoff election for races in which none of the candidates in the general election received the required percentage of the votes. For these kinds of races, which election would be preferred?

nniiicc commented 1 year ago

For these kinds of races, which election would be preferred?

I don't understand what you are asking. Can you please provide an example?

peiwenf commented 1 year ago

Here is an example of that case: https://ballotpedia.org/Anthony_Bradshaw There was a general election, but none of the candidates reached the share of votes required to be elected, so the two candidates with the most votes are going to a general runoff election. I was wondering which data we are more interested in.

peiwenf commented 1 year ago

Hi Professor Weber, I have another clarifying question on this issue. For retention elections like this one, https://ballotpedia.org/Martin_Fallon, the voters only select yes or no. I'm thinking about two methods to handle it:

  1. For the result column: if Yes > No, mark as Won; if not, mark as Lost, and do the analysis along with the other races.
  2. Make them a separate data frame for a different set of analyses.

Please let me know your thoughts on this.

peiwenf commented 1 year ago

@nniiicc I have another question about City Elections. There are 50 different races in the City_elections data frame (listed below). I would like to do further data cleaning to allow a better comparison between races. Can I treat all courts (municipal court, judicial court, circuit court) as the same court, and treat the city attorney and state attorney both as just attorneys? I did some research on these races, and they all seem to be separate races, so I was unsure whether I can combine some of them.

['City Assembly', 'Service Area', 'City Council', 'Board of Directors', 'City Attorney', 'Controller', 'Auditor', 'Assessor-recorder', 'District Attorney', 'Public Defender', 'Board of Supervisors', 'Community College Board', 'County Sheriff', 'Assessor', 'Prosecuting attorney', 'Recorder', 'Judicial offices', 'County Attorney', 'City Commission', 'County Clerk', 'County constable', 'County coroner', 'County Judge', 'County surveyor', 'County property valuation administrator', 'County commission', 'Soil and Water Conservation District Supervisors', 'County Constable', 'County Coroner', 'County judge/executive', 'County Commission', 'Louisville metro council', 'District Court', 'Clerk of the Circuit Court', 'Register of Wills', "State's Attorney", 'Board of Aldermen', 'Collector of revenue', 'License collector', 'Recorder of deeds', 'Special district offices', 'City Board of Supervisors', 'Municipal Court', 'County Circuit Court Clerk', 'County Criminal Court Clerk', 'County Juvenile Court Clerk', 'County Public Defender', 'County Register of Deeds', 'County Trustee', 'Municipal court judge']
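
Before deciding which races to merge by hand, one low-risk cleaning step is to collapse labels that differ only in capitalization (the list contains, for example, both 'County constable' and 'County Constable'). The following is a minimal sketch under that assumption; the function names are hypothetical and the sample is a small excerpt of the list above, not the full data frame.

```python
def normalize_race(label: str) -> str:
    """Collapse whitespace and case so spelling variants compare equal."""
    return " ".join(label.split()).casefold()


def label_variants(labels):
    """Group labels by normalized form and keep groups with more than one spelling."""
    groups = {}
    for label in labels:
        groups.setdefault(normalize_race(label), []).append(label)
    return {key: found for key, found in groups.items() if len(found) > 1}


# Excerpt of the race labels listed in the comment above.
sample = [
    "County constable", "County coroner", "County Judge", "County commission",
    "County Constable", "County Coroner", "County judge/executive", "County Commission",
]

variants = label_variants(sample)
# variants maps "county constable", "county coroner" and "county commission"
# to their two spellings; "County Judge" and "County judge/executive" stay separate.
```

Merging substantively different offices (for example, the various courts) is a judgment call that this kind of normalization does not make.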

nniiicc commented 1 year ago

Here is an example of that case: https://ballotpedia.org/Anthony_Bradshaw There was a general election, but none of the candidates reached the share of votes required to be elected, so the two candidates with the most votes are going to a general runoff election. I was wondering which data we are more interested in.

The only race that we care about is the one where a winner was declared. If a race goes to a runoff, you should default to that race.

peiwenf commented 1 year ago

@nniiicc Please allow me to double-check that I understood correctly: if a race goes to a runoff, I should get the runoff data.

nniiicc commented 1 year ago

Yes
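
The rule confirmed here (use the runoff when there is one, otherwise the general election) might be expressed as a small helper. This is a sketch only: the record layout, the "stage" values, and the function name are hypothetical and not taken from the project's code.

```python
def deciding_contest(contests):
    """Pick the contest whose results should be recorded for a race.

    `contests` is a list of dicts with a "stage" key, assumed to be
    either "general" or "runoff". A runoff takes precedence.
    """
    runoffs = [c for c in contests if c["stage"] == "runoff"]
    if runoffs:
        return runoffs[-1]
    generals = [c for c in contests if c["stage"] == "general"]
    return generals[-1] if generals else None


# Hypothetical records for one race; the ids are placeholders.
race = [
    {"stage": "general", "id": "general-2022"},
    {"stage": "runoff", "id": "runoff-2022"},
]
chosen = deciding_contest(race)  # the runoff record
```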

nniiicc commented 1 year ago

Hi Professor Weber, I have another clarifying question on this issue. For retention elections like this one, https://ballotpedia.org/Martin_Fallon, the voters only select yes or no. I'm thinking about two methods to handle it:

1. For the result column: if Yes > No, mark as Won; if not, mark as Lost, and do the analysis along with the other races.

2. Make them a separate data frame for a different set of analyses.
   Please let me know your thoughts on this.

For the result column, mark it as a YES if they won, but please note somewhere in the documentation that we made this decision so we can report it in a paper.
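
A minimal sketch of this decision for retention elections follows. The "Won"/"Lost" labels come from option 1 in the quoted question; the function names, the election_type field, and the vote counts in the usage example are hypothetical. Keeping an explicit election_type marker is one way to make the decision easy to document and report later.

```python
def retention_result(yes_votes: int, no_votes: int) -> str:
    """Map a yes/no retention vote to the shared result column.

    Follows option 1 from the thread: Yes > No is "Won", anything else
    (including a tie) is "Lost".
    """
    return "Won" if yes_votes > no_votes else "Lost"


def retention_row(candidate: str, yes_votes: int, no_votes: int) -> dict:
    """Build a row that records both the result and how it was derived."""
    return {
        "candidate": candidate,
        "result": retention_result(yes_votes, no_votes),
        "election_type": "retention",  # hypothetical marker column
    }


# Placeholder numbers for illustration only, not real results.
row = retention_row("Candidate A", yes_votes=600, no_votes=400)
```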

nniiicc commented 1 year ago

there are 50 different races in the City_elections data frame ( as the following code shows)

There are definitely ways to combine these races, but overall we want to be able to report data about races "at the local level", so combining them all into one group is fine.

Some easy combinations of races are

others might be

peiwenf commented 1 year ago

Hi Professor Weber @nniiicc, I have a thought on categorizing the municipal and city elections that I would like to double-check with you. I'm thinking about first splitting them into two groups (local and state) and then dividing each group into the three branches: executive, judicial, and legislative. Is this OK? If not, I can go further, for example breaking judicial down into courts and enforcement.

nniiicc commented 1 year ago

Yes, please feel free to create categories that make sense - if you have questions or want feedback on what races belong where feel free to ask here. A few things though ...

The hierarchy looks like: Federal -> State -> County -> City (where county and city are considered "local" or "municipal" government)

So in dividing across these "branches" of government we have elections at each level, including a breakdown like the following:

State Government

Local Government

peiwenf commented 1 year ago

Yes, please feel free to create categories that make sense - if you have questions or want feedback on what races belong where feel free to ask here. A few things though ...

The hierarchy looks like: Federal -> State -> County -> City (where county and city are considered "local" or "municipal" government)

So in dividing across these "branches" of government we have elections at each level, including a breakdown like the following:

State Government

  • Judicial
    • Attorney General
    • Judgeship
  • Legislative
    • Representative
    • State Congress
  • Executive
    • Governor

Local Government

  • City
    • Judicial
      • Attorney General
    • Legislative
      • City Council
  • County
    • Judicial
      • County Judge
    • Legislative
      • County Council / Administrator

Thank you! This is really helpful. So there is no executive branch for the local level?

peiwenf commented 1 year ago

@nniiicc I have another question about the local-level data. For some races like this one, https://ballotpedia.org/Carlos_Garcia_(Arizona), a runoff election is being held in March, so there is no current voting data. I'm thinking about two possible ways to handle it:

Also, for some candidates the website didn't provide the voting information. This one only has info from 2019: https://ballotpedia.org/Dagmar_Mikko Since they don't have a campaign page either, may I just remove them from the data frame?

nniiicc commented 1 year ago

Thank you! This is really helpful. So there is no executive branch for the local level?

Sorry, there is. It's usually a mayor (but not always). I just left it off because I was being lazy.
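
Putting the breakdown quoted earlier in the thread together with this answer, the categories could be held in a small lookup table. This is a sketch, not the project's schema: the structure and the classify helper are hypothetical, the entries are copied from the quoted breakdown as written, and placing "Mayor" under the city level is an assumption (the comment only says the local executive is usually a mayor).

```python
# Level -> branch -> races, copied from the breakdown quoted in the thread.
HIERARCHY = {
    "State": {
        "Judicial": ["Attorney General", "Judgeship"],
        "Legislative": ["Representative", "State Congress"],
        "Executive": ["Governor"],
    },
    "City": {
        "Judicial": ["Attorney General"],
        "Legislative": ["City Council"],
        "Executive": ["Mayor"],  # assumption: local executive placed at city level
    },
    "County": {
        "Judicial": ["County Judge"],
        "Legislative": ["County Council / Administrator"],
    },
}

# County and city are both considered "local" government.
LOCAL_LEVELS = {"City", "County"}


def classify(level: str, race: str):
    """Return (scope, branch) for a race at a given level, or None if unlisted."""
    for branch, races in HIERARCHY.get(level, {}).items():
        if race in races:
            scope = "Local" if level in LOCAL_LEVELS else level
            return scope, branch
    return None
```

The level is part of the lookup because the same label can appear at more than one level, as "Attorney General" does in the quoted breakdown.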

nniiicc commented 1 year ago

For some races like this one https://ballotpedia.org/Carlos_Garcia_(Arizona) they are holding a runoff election in March, so there is no current voting data, I'm thinking about two possible ways to handle it:

I think for races that are continuing we should NOT include any past votes; it's just not feasible to continue tracking these.

For the races that don't have voting info (for some other reason), let's just not collect this data.

peiwenf commented 1 year ago

@nniiicc Got it. So for the ongoing races, should I leave the voting information blank for the candidates, or not include them in this data frame at all?

nniiicc commented 1 year ago

Leave it blank please
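
In code, "leave it blank" could mean keeping the candidate's row and storing no vote count while the race is undecided. The sketch below assumes a hypothetical race_decided flag and uses None as the blank value; the candidate names and counts are placeholders.

```python
from typing import Optional


def recorded_votes(race_decided: bool, votes: Optional[int]) -> Optional[int]:
    """Return the vote count to store, or None (blank) while a race is ongoing."""
    return votes if race_decided else None


# Placeholder rows: (candidate, race_decided, votes seen on the source page).
rows = [
    ("Candidate A", True, 1200),
    ("Candidate B", False, 800),  # runoff still pending, so past votes are dropped
]

# Every candidate stays in the table; only the vote column is blanked.
cleaned = [(name, recorded_votes(decided, votes)) for name, decided, votes in rows]
```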

peiwenf commented 1 year ago

@nniiicc Hi Professor Weber, I have some questions about the categorization

nniiicc commented 1 year ago

I like these options: