georgetown-cset / funder-finder

Retrieve GitHub repo funding information
Apache License 2.0
7 stars 3 forks

Numfocus funding data retriever #20

Closed jmelot closed 1 year ago

jmelot commented 1 year ago

Closes #13

github-actions[bot] commented 1 year ago

No need for rebasing :+1: behind_count is 0 ahead_count is 22

github-actions[bot] commented 1 year ago

☂️ Python Coverage

current status: ✅

Overall Coverage

| Lines | Covered | Coverage | Threshold | Status |
| --- | --- | --- | --- | --- |
| 248 | 170 | 69% | 0% | 🟢 |

New Files

| File | Coverage | Status |
| --- | --- | --- |
| funderfinder/sources/numfocus.py | 68% | 🟢 |
| funderfinder/utils/list_numfocus.py | 31% | 🟢 |
| tests/sources/test_numfocus.py | 100% | 🟢 |
| tests/utils/test_list_numfocus.py | 100% | 🟢 |
| TOTAL | 75% | 🟢 |

Modified Files

No covered modified files...

updated for commit: 2cb0cea by action🐍

jmelot commented 1 year ago

I liked your suggestions in #13 about how to approach this, and I've mostly implemented them. One deviation is that I did not scrape the small development grants. I believe the projects that receive these grants are all a subset of the affiliated or sponsored projects, so scraping the grants wouldn't give us any new projects. As I understand it, other funding also reaches the projects outside of these grants (although I could be wrong about that and should ask NumFOCUS), so the numbers on their own might be misleading. I was torn about this because the small development grants do give us cool time-series data...

In utils/list_numfocus.py, I retrieve several pieces of metadata for each NumFOCUS project that might give us a way to match a project passed to sources/numfocus.py. One of these is the GitHub repo or organization associated with the NumFOCUS project. I attempted to do this matching programmatically, but my method is not foolproof, so I manually went through and fixed the mapping in some cases in data/manual_repo_mapping.json.
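For anyone following along, here is a minimal sketch of the kind of matching described above: derive a GitHub owner or owner/repo slug from a project's links, then let hand-curated entries override the automatic guess. This is not the code in this PR. The function names, the `links` field, and the assumption that data/manual_repo_mapping.json maps a project name to a slug are all hypothetical and only for illustration.

```python
from urllib.parse import urlparse


def github_slug(url):
    """Return 'owner' or 'owner/repo' for a GitHub URL, else None (hypothetical helper)."""
    parsed = urlparse(url.strip())
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if not parts:
        return None
    # Keep at most owner and repo; drop deeper paths such as /tree/main.
    slug = "/".join(parts[:2])
    if slug.endswith(".git"):
        slug = slug[: -len(".git")]
    return slug.lower()


def build_repo_mapping(projects, manual_overrides):
    """Map project name -> slug, using the first GitHub link found per project.

    `projects` is assumed to be a list of dicts with 'name' and 'links' keys.
    `manual_overrides` stands in for the contents of a hand-edited JSON file
    (assumed shape: {project name: lowercase slug}).
    """
    mapping = {}
    for project in projects:
        slug = None
        for link in project.get("links", []):
            slug = github_slug(link)
            if slug:
                break
        mapping[project["name"]] = slug
    # Hand-curated entries win over anything detected automatically.
    mapping.update(manual_overrides)
    return mapping


def find_project(repo_slug, mapping):
    """Return the project whose slug matches the repo or its owning organization."""
    repo_slug = repo_slug.lower()
    owner = repo_slug.split("/")[0]
    for name, slug in mapping.items():
        if slug in (repo_slug, owner):
            return name
    return None


# Illustrative input only; not real scraped metadata.
projects = [
    {"name": "NumPy", "links": ["https://numpy.org", "https://github.com/numpy/numpy.git"]},
    {"name": "pandas", "links": ["https://pandas.pydata.org"]},
]
overrides = {"pandas": "pandas-dev"}
mapping = build_repo_mapping(projects, overrides)
```

With this input, `find_project("pandas-dev/pandas", mapping)` resolves through the manual override, since no GitHub link was detected automatically for that project. Matching on the owner as well as the full slug is a design choice here so that an organization-level entry covers every repo under it.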

I did automate the pipeline, which should now run every Sunday. I tested that it runs successfully by temporarily triggering it on pushes to this branch, and it does seem to work.

I feel like this is a bit awkward. Maybe we can come back to it once we have a script that looks for funding information from any of the sources and aggregates the results.

Anyway, see what you think. I'm happy to make changes to make this easier to follow, more useful, etc.

jmelot commented 1 year ago

Thanks for your review, good points! I think there's just one unresolved issue, so please merge if you are OK with my response.

jspeed-meyers commented 1 year ago

MERGING. Nice!