OCHA-DAP / hdx-signals

HDX Signals
https://un-ocha-centre-for-humanitarian.gitbook.io/hdx-signals/
GNU General Public License v3.0

Setup audience analysis #253

Closed by caldwellst 12 hours ago

caldwellst commented 2 weeks ago

Code used to do the audience analysis. I think you can just get rid of the plotting work (maybe just save it in the Google Drive for posterity). Then work out what the generated data frames are and contain, and save those to the Azure cloud in a new folder, maybe something like audience. Shouldn't be very hard, and then all you need is to set up a weekly analysis on Mondays or something.
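
For reference, a minimal sketch of what "save those to the Azure cloud in a new audience folder" could look like. The project itself is largely R, so this Python snippet is illustrative only; the container name, environment variable, and function name are assumptions, not the repo's actual configuration.

```python
# Hypothetical sketch: upload an audience data frame as CSV under an
# "audience/" prefix in Azure Blob Storage. Container name and env var
# are placeholders, not the project's real settings.
import io
import os

import pandas as pd
from azure.storage.blob import BlobServiceClient


def upload_audience_frame(df: pd.DataFrame, name: str) -> None:
    """Write `df` as CSV to audience/<name>.csv in blob storage."""
    service = BlobServiceClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"]  # assumed env var
    )
    blob = service.get_blob_client(
        container="hdx-signals",  # placeholder container name
        blob=f"audience/{name}.csv",
    )
    buffer = io.StringIO()
    df.to_csv(buffer, index=False)
    blob.upload_blob(buffer.getvalue(), overwrite=True)
```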

zackarno commented 2 weeks ago

I'll just take this branch on. I may leave some comments on the code from Seth, mainly just as notes for myself.

zackarno commented 2 weeks ago

@martinig94 -- I added a GHA here: https://github.com/OCHA-DAP/hdx-signals/actions/workflows/user_audience_analysis.yml.

Remaining decisions

martinig94 commented 1 week ago

Hey @zackarno, apologies for the late reply, I missed your comment. Yes, I agree with creating an ad hoc folder for analyses that need to be run once in a while, without removing the file from the repo.

zackarno commented 1 week ago

> Hey @zackarno, apologies for the late reply, I missed your comment. Yes, I agree with creating an ad hoc folder for analyses that need to be run once in a while, without removing the file from the repo.

No worries, I've added the folder and changed the PR from draft to "ready for review".

zackarno commented 1 week ago

I was thinking about how best to log changes to the database. I was going to just read in the new data set and compare it to the old one, appending anything in the new data that isn't in the old. However, I noticed an example where a user changed their iso2, and I'm not sure how that should be reflected. Since we don't really know how this data will be used, I think we should just write out a new file (CSV) once per week, and from those we can always retroactively merge once we understand exactly what's needed?
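
To make the comparison concrete, here is a rough sketch of that "read in the new data set and compare to old" idea in pandas; the file names and the subscriber_id and iso2 columns are assumptions about the audience export, not its actual schema.

```python
# Hypothetical sketch: diff the weekly audience snapshot against the previous one.
import pandas as pd

old = pd.read_csv("audience_old.csv")  # placeholder file names
new = pd.read_csv("audience_new.csv")

# New subscribers: ids present in the new export but not the old one.
added = new[~new["subscriber_id"].isin(old["subscriber_id"])]

# Changed subscribers (e.g. an updated iso2): a full-row anti-join, since rows
# identical in both exports drop out, leaving only rows whose values differ.
merged = new.merge(old, how="left", indicator=True)
changed = merged[
    (merged["_merge"] == "left_only")
    & merged["subscriber_id"].isin(old["subscriber_id"])
].drop(columns="_merge")
```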

martinig94 commented 1 week ago

> I was thinking about how best to log changes to the database. I was going to just read in the new data set and compare it to the old one, appending anything in the new data that isn't in the old. However, I noticed an example where a user changed their iso2, and I'm not sure how that should be reflected. Since we don't really know how this data will be used, I think we should just write out a new file (CSV) once per week, and from those we can always retroactively merge once we understand exactly what's needed?

Hey Zack, yes, that's definitely an option. Otherwise, you could append to the dataframe only the rows that changed from the previous version, adding an extraction_date column, so everything is already merged in one unique file and the dataset can be filtered as the need arises without creating too many new rows at every iteration. You would have exactly the same rows if there were no changes between two weeks, and one row more if user X modified one of their interests, with the associated date being the date the script ran. I think I would prefer this option to having a new file generated every week, but both options work for me!
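
A sketch of that single-file option, again with assumed file and column names: only rows that differ from the latest known state for a subscriber are appended, stamped with the date the script ran.

```python
# Hypothetical sketch: append only changed rows to one cumulative CSV,
# tagged with an extraction_date column.
from datetime import date

import pandas as pd

log = pd.read_csv("audience_log.csv")  # cumulative file, one row per change
latest = (
    log.sort_values("extraction_date")
    .drop_duplicates("subscriber_id", keep="last")
)
new = pd.read_csv("audience_new.csv")  # this week's export

# Compare on every column in the new export (extraction_date is not in it),
# so unchanged rows match and drop out; new or modified rows remain.
merged = new.merge(latest[list(new.columns)], how="left", indicator=True)
changed = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
changed["extraction_date"] = date.today().isoformat()

pd.concat([log, changed]).to_csv("audience_log.csv", index=False)
```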