georgetown-cset / funder-finder

Retrieve GitHub repo funding information
Apache License 2.0

Define schema for metadata to be retrieved from each funding source #8

Closed jmelot closed 1 year ago

jmelot commented 1 year ago

I think it would be nice to define what metadata we'd like to collect from each funding source, so we can standardize our scripts' output. Right now the Open Collective script is just retrieving number of contributors and total funding, but we probably want to collect more metadata where it is available.

(I'll come back with more concrete thoughts about what I'd like to see here later, but happy to add in whatever else you think would be useful as well)

jspeed-meyers commented 1 year ago

Number of contributors and total funding are a good start. I think having a type variable would be nice too. Imagine "GitHub Sponsors" vs. "Open Collective". It's essentially the funding source type.

jmelot commented 1 year ago

Agreed, that's a good idea (I'll open a PR for it in a bit if you don't get to it before then!).

I feel like it would be nice to have the number of contributors and funding amounts over time, as well as the totals, when available (we could compare them to other metrics like commit frequency, number of active contributors, etc.), but I don't think they will be available in most cases.

jspeed-meyers commented 1 year ago

Perhaps this issue is easiest to address synchronously? We could do a Google Meet early in the new year or pair program at a coffee shop.

jmelot commented 1 year ago

Either option sounds good. Maybe we can discuss the related issue https://github.com/georgetown-cset/funder-finder/issues/24 as well. Let me know when you're free. Thursdays and Saturdays are very difficult for me to schedule right now, but I generally have more availability on other days, especially in the afternoons.

jspeed-meyers commented 1 year ago

Hmm, let's do a Friday then. January 6th looks good. I can send you a Google Meet invite, if that works.

jmelot commented 1 year ago

Sounds good! I'm currently available anytime outside of 10-11 and 1-1:30 on the 6th. Thanks for setting it up!

jspeed-meyers commented 1 year ago

Invite sent!

jspeed-meyers commented 1 year ago

Jennifer and I talked and this is a starting schema:

{
    "type": string,
    "funding_type": string,
    "is_funded": bool,
    "num_contributors": int,
    "total_funding_usd": int,
    "contributors": [{
        "contributor_name": string,
        "amount_received_usd": int,
        "is_affiliated": bool
    }],
    "contributions": [{
        "date_contribution_made": YYYY-MM-DD,
        "amount_received_usd": int,
        "contributor_name": string
    }],
    "date_of_data_collection": YYYY-MM-DD
}
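
For anyone scripting against this, here is one way the schema above might be expressed in Python. This is only a sketch, not what the repo actually does. The class names (`Contributor`, `Contribution`, `FundingRecord`) are hypothetical, the example values are made up, and dates are kept as plain YYYY-MM-DD strings. The field names follow the schema, with `amount_received_usd` spelled the same way in both nested records.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Contributor:
    contributor_name: str
    amount_received_usd: int
    is_affiliated: bool


@dataclass
class Contribution:
    date_contribution_made: str  # YYYY-MM-DD
    amount_received_usd: int
    contributor_name: str


@dataclass
class FundingRecord:
    # Fields are declared in the same order as the schema.
    type: str
    funding_type: str
    is_funded: bool
    num_contributors: int
    total_funding_usd: int
    contributors: List[Contributor]
    contributions: List[Contribution]
    date_of_data_collection: str  # YYYY-MM-DD


# Hypothetical placeholder values, not real funding data.
record = FundingRecord(
    type="example-type",
    funding_type="example-funding-type",
    is_funded=True,
    num_contributors=1,
    total_funding_usd=150,
    contributors=[Contributor("example-org", 150, False)],
    contributions=[
        Contribution("2022-11-01", 100, "example-org"),
        Contribution("2022-12-01", 50, "example-org"),
    ],
    date_of_data_collection="2023-01-06",
)

# asdict() recurses into nested dataclasses, so the result can be dumped as JSON.
print(json.dumps(asdict(record), indent=2))
```

Having every funding-source script return the same record type would be one way to keep the outputs standardized, but a plain dict with these keys would work just as well.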

Open thread:

jspeed-meyers commented 1 year ago

@jmelot, should we close this issue? We could put this schema in a secondary README only intended for project developers rather than project users. wdyt?

jmelot commented 1 year ago

I think you're right - actually this is already documented here so I think I'll just close it. Thanks for the push to clean up the issue list!