looked through a lot of viz from HathiTrust + Bookworm, the Google Ngram paper, and similar projects -- all of them use rolling averages instead of scatter plots for counts over time, so I made a pipeline to do just that
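the rolling-average step could be sketched like this (a minimal pandas example; the column names and window size are assumptions, not the actual pipeline):

```python
import pandas as pd

# hypothetical yearly counts -- column names are made up for illustration
counts = pd.DataFrame({
    "year": range(2000, 2010),
    "count": [12, 15, 9, 20, 22, 18, 25, 30, 28, 33],
})

# smooth the raw per-year counts with a centered rolling average,
# which is what the HathiTrust / Ngram-style plots show instead of scatter
counts["count_smoothed"] = (
    counts["count"].rolling(window=3, center=True, min_periods=1).mean()
)
```

`min_periods=1` keeps the endpoints instead of dropping them to NaN, which matters when the series is short.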
interestingly, we have reached the point at which we have enough data that I needed to optimize processing (better parallelism; I literally couldn't fit everything I wanted into memory)
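the memory fix can be sketched as chunked streaming rather than loading the whole dataset at once (hypothetical file layout, column, and function name -- not the actual code):

```python
import pandas as pd

def count_software_mentions(path: str, chunksize: int = 100_000) -> int:
    # stream the file in fixed-size chunks instead of reading it whole,
    # so peak memory stays flat no matter how large the dataset grows;
    # "is_software" is a hypothetical label column for illustration
    total = 0
    for chunk in pd.read_csv(path, chunksize=chunksize):
        total += int(chunk["is_software"].sum())
    return total
```

each chunk is an independent DataFrame, so the same loop can be fanned out across workers for the parallelism side of the fix.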
there are very real gaps in our data -- for the sake of better data quality, closing those gaps, growing the dataset, and providing better data to the undergrads, I propose backfilling a decent number of events once whisper is released. We can use the Open Collective funding.
applied the semantic-logit model across the dataset and reran the same plots -- there are differences in the year-over-year change in percent software. the semantic model has software production relatively stable over time, while the tf-idf logit model shows the drop in software production -- reminder: I attributed that drop to the growth of neural nets
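the year-over-year comparison the two models disagree on boils down to a diff over the per-year fractions, something like (toy numbers and hypothetical column names, not the actual model outputs):

```python
import pandas as pd

# hypothetical per-year fraction of events classified as software
pct = pd.DataFrame({
    "year": [2018, 2019, 2020, 2021],
    "percent_software": [0.40, 0.42, 0.38, 0.35],
})

# year-over-year change; a sustained negative run here is the "drop"
# the tf-idf model shows and the semantic model doesn't
pct["yoy_change"] = pct["percent_software"].diff()
```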
Teaching
Messaged Tina about CDP-SIG -- she said campaign access eval may be more her vibe
Weekly Updates for Week of 2023-01-23 to 2023-01-27
CDP
JASIST
General Exam
Soft Search / Eager
Teaching
Gig Plat
Other