TWJolly / fundraising_data_pull

Pulls data from the JustGiving API for a defined set of charities and highlights new pages

Long runtime; suggest another version limited to 'date after' #20

Closed by daaronr 4 years ago

daaronr commented 5 years ago

I ran 'Just_giving_data_pull.R' from the GitHub 'sponsorship_design_analysis' repo. It now seems to work fine, capturing the fundraisers for the correctly identified charities (see below; especially CARE, Oxfam, WaterAid, Unicef). [Note that I've edited it to capture only those charities that match the 'justgiving_id' values I have checked separately.]

However, the runtime is a lot longer than I recall. It seemed to need 8 hours or more (I'm not sure how to track this, but I had to leave it running overnight), with donation_data taking a particularly long time to process. If it takes this long every time, the experiment will be more work to run, since we have to set it running every night. I was wondering whether the program could be set up to download only fundraisers created after our start date. Perhaps this should be a separate version of the program, or we could add it as an argument ('earliest date'). For scoping and power analysis it's good that we are getting everything, but for the day-to-day running of the experiment a faster runtime would be better.
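On tracking the runtime: one option might be to wrap each stage of the script in a small timing helper, so the log shows which stage accounts for the hours. This is only a sketch in base R. `time_step` and `slow_step` are hypothetical names used for illustration, and `slow_step` merely stands in for whatever code actually builds donation_data.

```r
# Hypothetical stand-in for one slow stage of the script
# (for example, the code that builds donation_data).
slow_step <- function() {
  Sys.sleep(0.1)
  data.frame(id = 1:3)
}

# Run one stage, report how long it took, and return its result unchanged.
# `expr` is evaluated lazily, so the clock starts before the work does.
time_step <- function(label, expr) {
  start <- Sys.time()
  result <- force(expr)
  elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
  message(sprintf("%s took %.1f seconds", label, elapsed))
  result
}

donation_data <- time_step("donation_data", slow_step())
```

Wrapping each major block this way should make it clear whether the donation stage really dominates before anything gets restructured.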

Thoughts? Shall I go ahead and try to program this?

TWJolly commented 5 years ago

That sounds too long. I think you should be able to filter the fundraisers by date in just_giving_data_pull.R.

I can't remember exactly, but the fundraiser_search_data_2018 code block could be filtered further and the result then used in the subsequent code.
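A rough sketch of what such a filter could look like, not the actual code from the script. The function name `filter_fundraisers_by_date`, the `earliest_date` argument, and the column names `page_id` and `page_created_date` are all assumptions for illustration; the real search results may use different names and a different date format that would need parsing first.

```r
# Keep only fundraising pages created on or after `earliest_date`.
# `created_col` is an assumed column name; swap in the real one.
# Passing earliest_date = NULL keeps everything (useful for scoping runs).
filter_fundraisers_by_date <- function(fundraisers, earliest_date = NULL,
                                       created_col = "page_created_date") {
  if (is.null(earliest_date)) {
    return(fundraisers)
  }
  created <- as.Date(fundraisers[[created_col]])
  keep <- !is.na(created) & created >= as.Date(earliest_date)
  fundraisers[keep, , drop = FALSE]
}

# Toy data standing in for the fundraiser search results.
toy_fundraisers <- data.frame(
  page_id = c(101, 102, 103),
  page_created_date = c("2017-11-30", "2018-04-02", "2018-06-15"),
  stringsAsFactors = FALSE
)

recent_fundraisers <- filter_fundraisers_by_date(toy_fundraisers, "2018-04-01")

# Only these page ids would then go on to the slow per-page donation requests.
recent_fundraisers$page_id
```

Because the filter runs before the per-page donation requests, the nightly run would only pay for pages created since the start date, while a NULL date keeps the full pull available for power analysis.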

daaronr commented 4 years ago

This issue is no longer relevant, but I'm about to post a related one.