nniiicc opened 1 year ago
Today (Total of 4h)
Worked on Issue #4:
- Set up the machine and tested the "make report" function by making reports locally (1.5h)
- Asked Eva about basic information on the project structure (0.5h)
- Read through the files in GitHub (1.5h)
Worked on Issue #3:
- Read through the methodology of The Markup (0.5h)
Tomorrow:
- Try to schedule a time with Eva to go over the project; I am still confused after reading the files.
- Start generating reports (might need to change the script to fit our goal before doing this)
Sorry, I forgot to track my work yesterday; here is a tracker for both 1.10 and 1.11:

1.10 (Total of 4 hours)
Issue #4
- Met with Eva to learn about last year's project (1h)
- Ran the generate report function in a loop locally (2.5h). I figured out that my computer can't handle the work locally, so I will try to run the GitHub Actions locally.
Issue #3
- Tested the project with the GUI, studied the blacklight-collector repo, learned about npm (0.5h)

1.11 (Total of 2 hours, will make up tomorrow!)
Issue #4
- Set up the GitHub CLI on my computer, wrote the script for running the GitHub Action locally, studied the download function (1h)
Issue #3
- Did more research on npm, ran blacklight-collector locally, and tested it with a website (1h)

Plan for tomorrow:
- Get admin access to the repo
- Run the GitHub Action locally
- Get the reports for attorney.csv done
Issue #4 (2h)
- Ran the GitHub Actions locally and got the reports for the majority of the candidates in the Attorney General races

Tomorrow
- Figure out how to clean the data frames (depends on the answer to the question about Issue #4)
- Get the analysis done for Attorney Generals
issue #4 (4h)
Tomorrow
Issue #4 (4h)
Tomorrow:
Issue #4 (4h)
Tomorrow:
issue #4 (2h)
Documentation (2h)
Tomorrow
Issue #4 and Issue #5 (4h)
1/25 & 1/26 Issue #4 (8h)
Tomorrow
1/27 Issue #4 (4h)
Issue #4 (4h)
Things tried:
1/31 & 2/1 Issue #4 (8h)
Things tried:
- Tried to import the modules from other folders but failed; temporarily putting all the files in the same folder worked for testing, and I will solve the path problem tomorrow
Things done:
- Fixed the generate report function
- Wrote the script to get the voting information
- Combined the axe-scraped data with the original data
Tomorrow:
- Reinstall the repo to make the paths work (see the sketch below)
- Finish analyzing the Attorney General race and start on another one
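The import failure above is the kind of path problem that is often fixed either by installing the repo as an editable package or by putting the repo root on sys.path. A hedged sketch of both options follows; the actual campaign-access-eval package layout isn't verified here:

```python
# Option 1 (usually preferable): from the repo root, install the package in editable
# mode so `access_eval` imports resolve no matter which folder a script runs from:
#     pip install -e .

# Option 2: for a one-off script kept outside the package, put the repo root on sys.path.
import sys
from pathlib import Path

repo_root = Path(__file__).resolve().parents[1]  # adjust the depth to your folder layout
sys.path.insert(0, str(repo_root))

# from access_eval.analysis import ...  # imports should now resolve from the repo root
```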
2/2 Issue #4 (6h)
Things done:
- Recreated an environment to make the relative paths work
- Regenerated the data frame to add the word matrix
- Debugged the analysis script up to line 765; most of the bugs are caused by empty values from candidates who don't have a campaign page
Tomorrow:
- Planning on creating two datasets: one containing all the candidates who don't have a campaign page, and one containing only the candidates who do, for the convenience of analysis (see the sketch below)
- Finish analyzing the Attorney General race and the House elections
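A minimal sketch of the planned two-dataset split, assuming the combined frame has a column holding the campaign page URL (here called `campaign_url`, a guess at the real column name) with missing values for candidates who have no page:

```python
import pandas as pd

# Toy stand-in for the combined candidate frame; the real dataset has many more columns.
df = pd.DataFrame({
    "candidate": ["A", "B", "C"],
    "campaign_url": ["https://a.example", None, "https://c.example"],
    "vote_share": [0.62, 0.41, 0.55],
})

# Candidates without a campaign page produce empty/NaN values downstream,
# so keep them in a separate frame instead of letting them break the analysis.
has_page = df["campaign_url"].notna() & (df["campaign_url"].str.strip() != "")
with_page = df[has_page].reset_index(drop=True)
without_page = df[~has_page].reset_index(drop=True)

print(len(with_page), "candidates with a campaign page")
print(len(without_page), "candidates without one")
```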
2/3 Issue #4 (7h)
Things done:
- Created the two datasets
- Got the analysis for the attorney general data frame and fixed all the nonsensical data, like NaN values and unnecessary plots
Tomorrow:
- Ask about the next step
- Replicate the analysis for the other races
2.6 and 2.7 Issue #4 (11h)
Tomorrow: Plan to work at least 7 hours
2.8 Issue #4 (4h + 4h running code)
- Wrote a first draft of getting the voting data with the logic of getting one race and then moving on to the next, then realized the special case of retention elections (asked in the issue) and figured out the city election data is not as well formatted as the other datasets, so I might need to handle each candidate individually
- Reduced the unique races for city elections from 70 to 50; aiming to reduce more after hearing back on the issue
- Generated the study data for Governors, which took 4.5 hours

Things I need help with @nniiicc:
- Combining the axe data with the original data for Governor took more than 4 hours, and this is not the biggest data frame. While the terminal is running, my computer gets really slow, so I can't multi-task on anything else. The only solution I can think of is to divide the largest data frame into 5 pieces and then concatenate all the parts together (roughly as in the sketch below). I was wondering if there are any other suggestions.

Tomorrow:
- Finish the analysis for Governor
- Prepare the reports generated for the House races for analysis
- Clean the City Elections and Municipal Elections race column
- Get voting results for City Elections and Municipal Elections
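Along the lines of the five-piece idea above, one option is to merge the larger frame in slices and concatenate the results, so the whole join never sits in memory at once. This is only a sketch with made-up column names (`url` as the join key), not the actual logic in generate_access_eval_2022_dataset.py:

```python
import pandas as pd


def chunked_merge(big: pd.DataFrame, small: pd.DataFrame, on: str,
                  n_chunks: int = 5) -> pd.DataFrame:
    """Merge `big` with `small` one slice at a time to keep peak memory lower."""
    chunk_size = -(-len(big) // n_chunks)  # ceiling division
    pieces = []
    for start in range(0, len(big), chunk_size):
        chunk = big.iloc[start:start + chunk_size]
        pieces.append(chunk.merge(small, on=on, how="left"))
    return pd.concat(pieces, ignore_index=True)


# Toy stand-ins for the axe-core results and the original candidate table.
axe = pd.DataFrame({"url": [f"https://site{i}.example" for i in range(10)],
                    "violations": range(10)})
candidates = pd.DataFrame({"url": [f"https://site{i}.example" for i in range(10)],
                           "candidate": [f"cand{i}" for i in range(10)]})

combined = chunked_merge(axe, candidates, on="url")
print(combined.shape)
```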
@peiwenf

> Combining axe data with the original data for Governor took more than 4 hours, and this is not the biggest data frame.

How are you combining the data? Have you tried using Google Colab, where you get an extra GPU?
@nniiicc I combined the data by locally running this script https://github.com/peiwenf/campaign-access-eval/blob/2022dev/access_eval/bin/generate_access_eval_2022_dataset.py. I will look up Google Colab!
2/12 Issue #4 (5h)
Things I need help with: @nniiicc When I try to push the House reports to GitHub, the system warns me that the file is larger than the 100 MB limit (it's around 286 MB), so I tried to solve it by setting up Git Large File Storage (LFS). When I push with LFS, I receive the following message. I was unsure if I should purchase a data plan.
(bits) CicideMacBook-Pro:campaign-access-eval fpw$ git push
batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
Plan for today
@peiwenf - why not just split the reports up into three batches - so that they can be run with the action, and then recombine them after the action is complete?
I have run the action locally; I was just unsure if we want a copy of this data in GitHub.
Got it - we should duplicate the storage somewhere - for now you can just split it into 99 MB chunks (for example house-1 and house-2).
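For reference, one way to do that batching in Python, sketched with placeholder paths and output names (house-1.zip, house-2.zip, ...) rather than the repo's actual layout; it starts a new archive whenever the running uncompressed size would pass ~99 MB:

```python
import zipfile
from pathlib import Path

LIMIT = 99 * 1024 * 1024  # stay just under GitHub's 100 MB per-file limit


def batch_reports(src_dir: str, out_prefix: str = "house") -> None:
    """Pack the files under src_dir into house-1.zip, house-2.zip, ...,
    starting a new archive whenever adding a file would exceed LIMIT."""
    root = Path(src_dir)
    batch, batch_size, idx = [], 0, 1
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        size = path.stat().st_size
        if batch and batch_size + size > LIMIT:
            write_zip(f"{out_prefix}-{idx}.zip", batch, root)
            batch, batch_size, idx = [], 0, idx + 1
        batch.append(path)
        batch_size += size
    if batch:
        write_zip(f"{out_prefix}-{idx}.zip", batch, root)


def write_zip(name: str, files: list, root: Path) -> None:
    with zipfile.ZipFile(name, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            zf.write(f, arcname=f.relative_to(root))  # keep paths relative to the report dir


# e.g. batch_reports("access_eval/analysis/reports_2022/House")
```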
2/13 & 2/14 (8h) Issue #4
Tomorrow
2/15 & 2/16 (8h) Issue #4
Today
Issue #4 2/17 (6h) & 2/20 (6h)
Today
Issue #4 2/20 (6h)
Things I need help with:
I tried `git reset --soft HEAD~1` and `git reset HEAD access_eval/analysis/reports_2022/House.zip`, but got `remote: error: File access_eval/analysis/reports_2022/House.zip is 237.46 MB; this exceeds GitHub's file size limit of 100.00 MB` in return.
Today
issue #4 2/22, 2/23, 2/24 (12h)
Tomorrow:
issue #4 2/28, 3/1, 3/2 (12h)
Issue #3 3/3 (5h)
Tomorrow:
Question @nniiicc :
Issue #3 (4.5h)
Tomorrow:
Issue #3 (16h) 3.7, 3.8, 3.9, 3.10
Today
Issue #3 (4.5h)
Tomorrow:
Issue #7 (15h) 3.14, 3.15, 3.16
Tomorrow:
Issue #7 3.20, 3.21 (12h)
Tomorrow:
3.22, 3.23, 3.24 (17h)
Plan for the week of 4.3 (Not in town for the next week)
4.3 (2h)
4.5, 4.6, 4.7 (14h)
4.10, 4.11, 4.12 (12h)
Reprocessed the data by doing the following (see the sketch after this list):
Removed the top and bottom 5 percent of the data based on the value of the vote share
Added Z score for the axe-core dataset
Added a column called competitiveness according to the formula:
abs(axe_score['vote_share'] - 0.5)
Recreated all the plots and added distribution plots for the new data
Added the ease of reading to the analysis
Fixed the ease of reading data by removing the values above 121.22
Categorized the ease of reading data into 5 categories based on the corresponding school levels. Posted the analysis and plots in Slack
Created summary table for the paper
Calculated correlations between competitiveness and the count of different trackers, put the results in the outline of the paper
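A condensed sketch of these reprocessing steps in pandas; the frame is named `axe_score` to match the formula above, but the other column names (`error_count`, `ease_of_reading`, `tracker_count`) are guesses at the dataset's schema, and the five reading-level bins are shown as equal-width cuts rather than the exact school-level cutoffs that were used:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the axe-core dataframe; real column names may differ.
rng = np.random.default_rng(0)
axe_score = pd.DataFrame({
    "vote_share": rng.uniform(0, 1, 200),
    "error_count": rng.poisson(30, 200),
    "ease_of_reading": rng.uniform(0, 130, 200),
    "tracker_count": rng.poisson(5, 200),
})

# 1. Trim the top and bottom 5% of rows by vote share.
lo, hi = axe_score["vote_share"].quantile([0.05, 0.95])
axe_score = axe_score[axe_score["vote_share"].between(lo, hi)].copy()

# 2. Z-score the axe-core error counts.
col = axe_score["error_count"]
axe_score["error_z"] = (col - col.mean()) / col.std()

# 3. Competitiveness: distance of the vote share from a 50/50 split.
axe_score["competitiveness"] = abs(axe_score["vote_share"] - 0.5)

# 4. Drop ease-of-reading values above 121.22, then bin the rest into five levels.
axe_score = axe_score[axe_score["ease_of_reading"] <= 121.22].copy()
axe_score["reading_level"] = pd.cut(
    axe_score["ease_of_reading"], bins=5,
    labels=["college", "high school", "middle school", "elementary", "early elementary"],
)

# 5. Correlation between competitiveness and the tracker count.
print(axe_score["competitiveness"].corr(axe_score["tracker_count"]))
```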
Tomorrow:
4.13 - 4.19 (20h)
This week:
4.20 - 4.27 Finished the analysis on the dataframe:
Today:
- Find the total number of Google trackers
- Update the blacklight repo
- Update the plots in Dropbox
- Do a literature review of the common errors
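For the Google-tracker count, one illustrative approach (the real Blacklight output may be structured differently and might need flattening first) is to match tracker domains against a list of Google-owned domains:

```python
import pandas as pd

# Toy stand-in: one row per (site, tracker domain) pair derived from Blacklight results.
trackers = pd.DataFrame({
    "site": ["a.example", "a.example", "b.example", "c.example"],
    "tracker_domain": ["doubleclick.net", "google-analytics.com",
                       "facebook.net", "googletagmanager.com"],
})

# Domains treated as Google-owned for this illustration (not an exhaustive list).
GOOGLE_DOMAINS = ["google-analytics.com", "googletagmanager.com",
                  "doubleclick.net", "googlesyndication.com"]

is_google = trackers["tracker_domain"].isin(GOOGLE_DOMAINS)
print("total Google trackers:", int(is_google.sum()))
print(trackers[is_google].groupby("site").size())  # per-site counts
```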
4.28 - 5.2 (12h) Finished all the analysis parts and organized all the scripts
Transferred blacklight repo
Merged the campaign-access-eval repo
5.3 - 5.9 (16h)
Modified the GitHub Action to make it work on the library webpages
Tomorrow:
5.10-5.12 (16h)
5.15-5.18 (12h)
something to note
Tomorrow
Please leave a comment here with what issue you worked on, and your progress. Record the number of hours you spent on the issue. Also include what you plan to work on / accomplish tomorrow.