Closed: caldwellst closed this 12 hours ago
I'll just take this branch on. I may leave some comments on the code from Seth, mainly just as notes for myself.
@martinig94 -- I added a GHA here: https://github.com/OCHA-DAP/hdx-signals/actions/workflows/user_audience_analysis.yml. It writes the `csv` out on the dev blob. Once this is merged to main it should run every Monday at 8AM, and you can also trigger it manually with `workflow_dispatch`. It just runs (with `HS_LOCAL=FALSE`) the `src/email/mailchimp/audience_analysis.R` file.
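For reference, a minimal sketch of how the `HS_LOCAL` switch could work inside the script; the `write_to_dev_blob()` helper, the file name, and the placeholder data are assumptions for illustration, not the actual hdx-signals code.

```r
# Minimal sketch: read HS_LOCAL from the environment and either write the csv
# locally (development) or push it to the dev blob (scheduled GHA run).
hs_local <- as.logical(Sys.getenv("HS_LOCAL", unset = "TRUE"))

audience_df <- data.frame(email_id = "abc123", iso2 = "KE")  # placeholder data

if (isTRUE(hs_local)) {
  # local development run: keep the output on disk
  write.csv(audience_df, "user_audience_analysis.csv", row.names = FALSE)
} else {
  # GHA run with HS_LOCAL=FALSE: push to the dev blob (hypothetical helper)
  write_to_dev_blob(audience_df, "user_audience_analysis.csv")
}
```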
The dataset creation is now in `src/email/mailchimp/create_user_analytics_dataset.R`; the remaining code is mostly to make the plots. Maybe we could create an `adhoc` folder in root and put this script there, to avoid extra time spent updating these plots in the future. We can always restructure and delete the folder at a later point.

Hey @zackarno,
Apologies for the late reply, I missed your comment. Yes, I agree on creating an `adhoc` folder for analysis that needs to be run once in a while, without removing the file from the repo.
No worries, I've added the folder and changed the PR from draft to "ready for review".
I was thinking about how best to log changes to the database. I was going to just read in the new dataset, compare it to the old one, and append anything new. However, I noticed an example where the user changed `iso2`, and I wonder how this should be reflected. Since we don't really know how this data will be used, I think we should just write out a new file (csv) once per week, and from that we can always retroactively merge when we understand exactly what's needed?
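A minimal sketch of that option, assuming we just date-stamp each weekly dump (file and column names are illustrative):

```r
# Sketch of the "one csv per week" option: write the full weekly extract to a
# dated file so past states can be retroactively merged later if needed.
library(readr)

audience <- data.frame(email_id = "abc123", iso2 = "KE")  # stand-in for this week's extract

out_name <- paste0("user_analytics_", format(Sys.Date(), "%Y%m%d"), ".csv")
write_csv(audience, out_name)
```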
Hey Zack,
Yes, that's definitely an option. Otherwise, you could append to the data frame only the rows that changed from the previous version, adding an `extraction_date` column, so everything is already merged into one unique file and the dataset can be filtered as the need arises, without creating too many new rows at every iteration. You would have exactly the same rows if there were no changes between two weeks, and one row more if user X modified one of their interests, with the associated date being the date on which the script runs.
I think I would prefer this option to having a new file generated every week, but both options are fine by me!
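A quick dplyr sketch of that approach; the schema here (`email_id`, `iso2`) is illustrative, not the real Mailchimp fields:

```r
# Sketch of the "append only changed rows" option with an extraction_date column.
library(dplyr)

# accumulated log and this week's fresh extract (toy stand-ins)
log_df <- data.frame(
  email_id = c("abc123", "def456"),
  iso2 = c("KE", "SO"),
  extraction_date = as.Date("2024-07-01")
)
new_df <- data.frame(
  email_id = c("abc123", "def456"),
  iso2 = c("ET", "SO")  # abc123 changed their iso2
)

# latest known state per user, without the date column
latest <- log_df |>
  group_by(email_id) |>
  slice_max(extraction_date, n = 1) |>
  ungroup() |>
  select(-extraction_date)

# rows that are new or have changed since the last extraction
changed <- anti_join(new_df, latest, by = names(new_df))

# append them stamped with this run's date; weeks with no changes add no rows
log_df <- bind_rows(log_df, mutate(changed, extraction_date = Sys.Date()))
```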
Code used to do the audience analysis. I think you can just get rid of the plotting work (maybe just save it in the Google Drive for posterity). Then just understand what the generated data frames are and do, and save those to the Azure cloud in a new folder, maybe something like `audience`. Shouldn't be very hard, and then all you need is to set up a weekly analysis on the Monday or something.
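If useful, a hedged sketch of the Azure write using the AzureStor package; the endpoint, container, and secret names below are guesses, and any existing blob helpers in the repo would be the better route:

```r
# Sketch: write the generated data frame(s) to blob storage under an
# "audience/" prefix. Endpoint, SAS secret, and container names are assumptions.
library(AzureStor)

audience_df <- data.frame(email_id = "abc123", iso2 = "KE")  # placeholder output

endp <- blob_endpoint(
  Sys.getenv("AZURE_BLOB_ENDPOINT"),  # hypothetical env var / secret names
  sas = Sys.getenv("AZURE_BLOB_SAS")
)
cont <- storage_container(endp, "hdx-signals")  # container name is illustrative

storage_write_csv(audience_df, cont, "audience/user_analytics.csv")
```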