looked through a lot of viz from HathiTrust + Bookworm, the Google Ngram paper, and similar projects -- all of them use rolling averages instead of scatter plots for counts over time, so I made a pipeline to do just that
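the rolling-average step could be sketched like this (a minimal pandas example; the column names and window size are assumptions, not the actual pipeline):

```python
import pandas as pd

# hypothetical yearly counts -- column names are made up for illustration
counts = pd.DataFrame({
    "year": range(2000, 2010),
    "count": [12, 15, 9, 20, 22, 18, 25, 30, 28, 33],
})

# smooth the raw per-year counts with a centered rolling average,
# which is what the HathiTrust / Ngram-style plots show instead of scatter
counts["count_smoothed"] = (
    counts["count"].rolling(window=3, center=True, min_periods=1).mean()
)
```

`min_periods=1` keeps the endpoints instead of dropping them to NaN, which matters when the series is short.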
interestingly, we have reached the point at which we have enough data that I needed to optimize processing (better parallelism; I literally couldn't fit everything I wanted into memory)
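the memory fix can be sketched as chunked streaming rather than loading the whole dataset at once (hypothetical file layout, column, and function name -- not the actual code):

```python
import pandas as pd

def count_software_mentions(path: str, chunksize: int = 100_000) -> int:
    # stream the file in fixed-size chunks instead of reading it whole,
    # so peak memory stays flat no matter how large the dataset grows;
    # "is_software" is a hypothetical label column for illustration
    total = 0
    for chunk in pd.read_csv(path, chunksize=chunksize):
        total += int(chunk["is_software"].sum())
    return total
```

each chunk is an independent DataFrame, so the same loop can be fanned out across workers for the parallelism side of the fix.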
there are very real gaps in our data -- for the sake of better data quality, closing those gaps, growing the dataset, and providing better data to the undergrads, I propose backfilling a decent number of events once whisper is released. We can use the Open Collective funding.
applied the semantic-logit model across the dataset and reran the same plots -- there are differences in the year-over-year change in percent software. the semantic model has software production relatively stable over time, while the tf-idf logit model shows the drop in software production -- reminder: I attributed that drop to the growth of neural nets
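the year-over-year comparison the two models disagree on boils down to a diff over the per-year fractions, something like (toy numbers and hypothetical column names, not the actual model outputs):

```python
import pandas as pd

# hypothetical per-year fraction of events classified as software
pct = pd.DataFrame({
    "year": [2018, 2019, 2020, 2021],
    "percent_software": [0.40, 0.42, 0.38, 0.35],
})

# year-over-year change; a sustained negative run here is the "drop"
# the tf-idf model shows and the semantic model doesn't
pct["yoy_change"] = pct["percent_software"].diff()
```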
Teaching
Messaged Tina about CDP-SIG -- she said campaign access eval may be more her vibe
Weekly Updates for Week of 2023-01-23 to 2023-01-27
CDP
JASIST
General Exam
Soft Search / Eager
Teaching
Gig Plat
Other