nniiicc opened 1 year ago
Today (Total of 4h)
Worked on Issue #4:
- Set up the machine and tested the "make report" function by making reports locally (1.5h)
- Asked Eva about basic information on the project structure (0.5h)
- Read through the files in GitHub (1.5h)
Worked on Issue #3:
- Read through the methodology of The Markup (0.5h)
Tomorrow:
- Try to schedule a time with Eva to go over the project; I am still confused after reading the files.
- Start generating reports (might need to change the script to fit our goal before doing this)
Sorry, I forgot to track my work yesterday; here is a tracker for both 1.10 and 1.11:

1.10 (Total of 4 hours)
Issue #4
- Met with Eva to learn about last year's project (1h)
- Ran the generate report function in a loop locally (2.5h). I figured out that my computer can't handle the work locally, so I will try to run the GitHub Actions locally.
Issue #3
- Tested the project with the GUI, studied the blacklight-collector repo, learned about npm (0.5h)

1.11 (Total of 2 hours, will make up tomorrow!)
Issue #4
- Set up the GitHub CLI on my computer, wrote the script for running the GitHub Action locally, studied the download function (1h)
Issue #3
- Did more research on npm, ran blacklight-collector locally, and tested it with a website (1h)

Plan for tomorrow:
- Get admin access to the repo
- Run the GitHub Action locally
- Get the reports for attorney.csv done
Issue #4 (2h)
- Ran the GitHub Actions locally and got the reports for the majority of the candidates in the Attorney General races

Tomorrow
- Figure out how to clean the data frames (depends on the answer to the question about Issue #4)
- Get the analysis done for Attorney Generals
issue #4 (4h)
Tomorrow
Issue #4 (4h)
Tomorrow:
Issue #4 (4h)
Tomorrow:
issue #4 (2h)
Documentation (2h)
Tomorrow
Issue #4 and Issue #5 (4h)
1/25 & 1/26 Issue #4 (8h)
Tomorrow
1/27 Issue #4 (4h)
Issue #4 (4h)
Things tried:
1/31 & 2/1 Issue #4 (8h)
Things tried:
- Tried to import the modules from other folders but failed; temporarily putting all the files in the same folder worked for testing, and I will solve the path problem tomorrow
Things done:
- Fixed the generate report function
- Wrote the script to get the voting information
- Combined the axe-scraped data with the original data
Tomorrow:
- Reinstall the repo to make the paths work (see the sketch below)
- Finish analyzing the Attorney General race and start on another one
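The import failure above is the kind of path problem that is often fixed either by installing the repo as an editable package or by putting the repo root on sys.path. A hedged sketch of both options follows; the actual campaign-access-eval package layout isn't verified here:

```python
# Option 1 (usually preferable): from the repo root, install the package in editable
# mode so `access_eval` imports resolve no matter which folder a script runs from:
#     pip install -e .

# Option 2: for a one-off script kept outside the package, put the repo root on sys.path.
import sys
from pathlib import Path

repo_root = Path(__file__).resolve().parents[1]  # adjust the depth to your folder layout
sys.path.insert(0, str(repo_root))

# from access_eval.analysis import ...  # imports should now resolve from the repo root
```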
2/2 Issue #4 (6h)
Things done:
- Recreated an environment to make the relative paths work
- Regenerated the data frame to add the word matrix
- Debugged the analysis script up to line 765; most of the bugs are caused by empty values from candidates who don't have a campaign page
Tomorrow:
- Planning on creating two datasets: one containing all the candidates who don't have a campaign page, and one containing only the candidates who do, for the convenience of analysis (see the sketch below)
- Finish analyzing the Attorney General race and the House elections
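A minimal sketch of the planned two-dataset split, assuming the combined frame has a column holding the campaign page URL (here called `campaign_url`, a guess at the real column name) with missing values for candidates who have no page:

```python
import pandas as pd

# Toy stand-in for the combined candidate frame; the real dataset has many more columns.
df = pd.DataFrame({
    "candidate": ["A", "B", "C"],
    "campaign_url": ["https://a.example", None, "https://c.example"],
    "vote_share": [0.62, 0.41, 0.55],
})

# Candidates without a campaign page produce empty/NaN values downstream,
# so keep them in a separate frame instead of letting them break the analysis.
has_page = df["campaign_url"].notna() & (df["campaign_url"].str.strip() != "")
with_page = df[has_page].reset_index(drop=True)
without_page = df[~has_page].reset_index(drop=True)

print(len(with_page), "candidates with a campaign page")
print(len(without_page), "candidates without one")
```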
2/3 Issue #4 (7h)
Things done:
- Created the two datasets
- Got the analysis for the attorney general data frame and fixed all the nonsensical data, like NaN values and unnecessary plots
Tomorrow:
- Ask about the next step
- Replicate the analysis for the other races
2.6 and 2.7 Issue #4 (11h)
Tomorrow: Plan to work at least 7 hours
2.8 Issue #4 (4h + 4h running code)
- Wrote a first draft of getting the voting data with the logic of getting one race and then moving on to the next, then realized the special case of retention elections (asked in the issue) and figured out the city election data is not as well formatted as the other datasets, so I might need to handle each candidate individually
- Reduced the unique races for city elections from 70 to 50; aiming to reduce more after hearing back on the issue
- Generated the study data for Governors, which took 4.5 hours

Things I need help with @nniiicc:
- Combining the axe data with the original data for Governor took more than 4 hours, and this is not the biggest data frame. While the terminal is running, my computer gets really slow, so I can't multi-task on anything else. The only solution I can think of is to divide the largest data frame into 5 pieces and then concatenate all the parts together (roughly as in the sketch below). I was wondering if there are any other suggestions.

Tomorrow:
- Finish the analysis for Governor
- Prepare the reports generated for the House races for analysis
- Clean the City Elections and Municipal Elections race column
- Get voting results for City Elections and Municipal Elections
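Along the lines of the five-piece idea above, one option is to merge the larger frame in slices and concatenate the results, so the whole join never sits in memory at once. This is only a sketch with made-up column names (`url` as the join key), not the actual logic in generate_access_eval_2022_dataset.py:

```python
import pandas as pd


def chunked_merge(big: pd.DataFrame, small: pd.DataFrame, on: str,
                  n_chunks: int = 5) -> pd.DataFrame:
    """Merge `big` with `small` one slice at a time to keep peak memory lower."""
    chunk_size = -(-len(big) // n_chunks)  # ceiling division
    pieces = []
    for start in range(0, len(big), chunk_size):
        chunk = big.iloc[start:start + chunk_size]
        pieces.append(chunk.merge(small, on=on, how="left"))
    return pd.concat(pieces, ignore_index=True)


# Toy stand-ins for the axe-core results and the original candidate table.
axe = pd.DataFrame({"url": [f"https://site{i}.example" for i in range(10)],
                    "violations": range(10)})
candidates = pd.DataFrame({"url": [f"https://site{i}.example" for i in range(10)],
                           "candidate": [f"cand{i}" for i in range(10)]})

combined = chunked_merge(axe, candidates, on="url")
print(combined.shape)
```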
@peiwenf

> Combining axe data with the original data for Governor took more than 4 hours, and this is not the biggest data frame.

How are you combining the data? Have you tried using Google Colab, where you get an extra GPU?
@nniiicc I combined the data by locally running this script https://github.com/peiwenf/campaign-access-eval/blob/2022dev/access_eval/bin/generate_access_eval_2022_dataset.py. I will look up Google Colab!
2/12 Issue #4 (5h)
Things I need help with: @nniiicc When I try to push the House reports to GitHub, the system warns me that the file is larger than the 100 MB limit (it's around 286 MB), so I tried to solve it by setting up Git Large File Storage (LFS). When I push with LFS, I receive the following message. I was unsure if I should purchase a data plan.
(bits) CicideMacBook-Pro:campaign-access-eval fpw$ git push
batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
Plan for today
@peiwenf - why not just split the reports up into three batches - so that they can be run with the action, and then recombine them after the action is complete?
I have run the action locally; I was just unsure if we want a copy of this data in GitHub.
Got it - we should duplicate the storage somewhere - for now you can just split it into 99 MB chunks (for example house-1 and house-2).
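For reference, one way to do that batching in Python, sketched with placeholder paths and output names (house-1.zip, house-2.zip, ...) rather than the repo's actual layout; it starts a new archive whenever the running uncompressed size would pass ~99 MB:

```python
import zipfile
from pathlib import Path

LIMIT = 99 * 1024 * 1024  # stay just under GitHub's 100 MB per-file limit


def batch_reports(src_dir: str, out_prefix: str = "house") -> None:
    """Pack the files under src_dir into house-1.zip, house-2.zip, ...,
    starting a new archive whenever adding a file would exceed LIMIT."""
    root = Path(src_dir)
    batch, batch_size, idx = [], 0, 1
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        size = path.stat().st_size
        if batch and batch_size + size > LIMIT:
            write_zip(f"{out_prefix}-{idx}.zip", batch, root)
            batch, batch_size, idx = [], 0, idx + 1
        batch.append(path)
        batch_size += size
    if batch:
        write_zip(f"{out_prefix}-{idx}.zip", batch, root)


def write_zip(name: str, files: list, root: Path) -> None:
    with zipfile.ZipFile(name, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            zf.write(f, arcname=f.relative_to(root))  # keep paths relative to the report dir


# e.g. batch_reports("access_eval/analysis/reports_2022/House")
```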
2/13 & 2/14 (8h) Issue #4
Tomorrow
2/15 & 2/16 (8h) Issue #4
Today
Issue #4 2/17 (6h) & 2/20 (6h)
Today
Issue #4 2/20 (6h)
Things I need help with:
I tried `git reset --soft HEAD~1` and `git reset HEAD access_eval/analysis/reports_2022/House.zip`, but got `remote: error: File access_eval/analysis/reports_2022/House.zip is 237.46 MB; this exceeds GitHub's file size limit of 100.00 MB` in return.
Today
issue #4 2/22, 2/23, 2/24 (12h)
Tomorrow:
issue #4 2/28, 3/1, 3/2 (12h)
Issue #3 3/3 (5h)
Tomorrow:
Question @nniiicc :
Issue #3 (4.5h)
Tomorrow:
Issue #3 (16h) 3.7, 3.8, 3.9, 3.10
Today
Issue #3 (4.5h)
Tomorrow:
Issue #7 (15h) 3.14, 3.15, 3.16
Tomorrow:
Issue #7 3.20, 3.21 (12h)
Tomorrow:
3.22, 3.23, 3.24 (17h)
Plan for the week of 4.3 (Not in town for the next week)
4.3 (2h)
4.5, 4.6, 4.7 (14h)
4.10, 4.11, 4.12 (12h)
Reprocessed the data by doing the following (see the sketch after this list):
Removed the top and bottom 5 percent of the data based on the value of the vote share
Added Z score for the axe-core dataset
Added a column called competitiveness according to the formula:
abs(axe_score['vote_share'] - 0.5)
Recreated all the plots and added distribution plots for the new data
Added the ease of reading to the analysis
Fixed the ease of reading data by removing the values above 121.22
Categorized the ease of reading data into 5 categories based on the corresponding school levels. Posted the analysis and plots in Slack
Created summary table for the paper
Calculated correlations between competitiveness and the count of different trackers, put the results in the outline of the paper
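A condensed sketch of these reprocessing steps in pandas; the frame is named `axe_score` to match the formula above, but the other column names (`error_count`, `ease_of_reading`, `tracker_count`) are guesses at the dataset's schema, and the five reading-level bins are shown as equal-width cuts rather than the exact school-level cutoffs that were used:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the axe-core dataframe; real column names may differ.
rng = np.random.default_rng(0)
axe_score = pd.DataFrame({
    "vote_share": rng.uniform(0, 1, 200),
    "error_count": rng.poisson(30, 200),
    "ease_of_reading": rng.uniform(0, 130, 200),
    "tracker_count": rng.poisson(5, 200),
})

# 1. Trim the top and bottom 5% of rows by vote share.
lo, hi = axe_score["vote_share"].quantile([0.05, 0.95])
axe_score = axe_score[axe_score["vote_share"].between(lo, hi)].copy()

# 2. Z-score the axe-core error counts.
col = axe_score["error_count"]
axe_score["error_z"] = (col - col.mean()) / col.std()

# 3. Competitiveness: distance of the vote share from a 50/50 split.
axe_score["competitiveness"] = abs(axe_score["vote_share"] - 0.5)

# 4. Drop ease-of-reading values above 121.22, then bin the rest into five levels.
axe_score = axe_score[axe_score["ease_of_reading"] <= 121.22].copy()
axe_score["reading_level"] = pd.cut(
    axe_score["ease_of_reading"], bins=5,
    labels=["college", "high school", "middle school", "elementary", "early elementary"],
)

# 5. Correlation between competitiveness and the tracker count.
print(axe_score["competitiveness"].corr(axe_score["tracker_count"]))
```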
Tomorrow:
4.13 - 4.19 (20h)
This week:
4.20 - 4.27 Finished the analysis on the dataframe:
Today:
- Find the total number of Google trackers
- Update the blacklight repo
- Update the plots in Dropbox
- Do a literature review of the common errors
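For the Google-tracker count, one illustrative approach (the real Blacklight output may be structured differently and might need flattening first) is to match tracker domains against a list of Google-owned domains:

```python
import pandas as pd

# Toy stand-in: one row per (site, tracker domain) pair derived from Blacklight results.
trackers = pd.DataFrame({
    "site": ["a.example", "a.example", "b.example", "c.example"],
    "tracker_domain": ["doubleclick.net", "google-analytics.com",
                       "facebook.net", "googletagmanager.com"],
})

# Domains treated as Google-owned for this illustration (not an exhaustive list).
GOOGLE_DOMAINS = ["google-analytics.com", "googletagmanager.com",
                  "doubleclick.net", "googlesyndication.com"]

is_google = trackers["tracker_domain"].isin(GOOGLE_DOMAINS)
print("total Google trackers:", int(is_google.sum()))
print(trackers[is_google].groupby("site").size())  # per-site counts
```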
4.28 - 5.2 (12h) Finished all the analysis parts and organized all the scripts
Transferred blacklight repo
Merged the campaign-access-eval repo
5.3 - 5.9 (16h)
Modified the GitHub Action to make it work on the library webpages
Tomorrow:
5.10-5.12 (16h)
5.15-5.18 (12h)
something to note
Tomorrow
Please leave a comment here with what issue you worked on, and your progress. Record the number of hours you spent on the issue. Also include what you plan to work on / accomplish tomorrow.