creativecommons / quantifying

quantify the size and diversity of the commons--the collection of works that are openly licensed or in the public domain
MIT License

[Feature] Post-GSoC ’24: Flesh Out Data and Report Details #126

Open naishasinha opened 3 months ago

naishasinha commented 3 months ago

Context

Automating Quantifying the Commons was a Google Summer of Code 2024 project that successfully delivered baseline automation software for data gathering, processing, and analysis. However, given the time and resource constraints we had to work within, there is still room to improve this codebase over the upcoming quarters and years. This is the third of five issues raised specifically for post-GSoC contributions.

Problem

The current API fetching capacity limits the level of detail that can be incorporated into the data analysis, restricting the insights that can be drawn. As an example, the Google Custom Search data source only fetches data by license, country, and language, given the API restrictions.
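To make the restriction concrete, here is a minimal sketch of how a single Google Custom Search request might be parameterized by license, country, and language. It is an illustration, not the project's actual fetch script. The function names (`build_gcs_params`, `build_gcs_url`), the placeholder credentials, and the default `query` value are assumptions made for this example. The `linkSite`, `cr`, and `lr` parameters and the endpoint come from the Custom Search JSON API.

```python
from urllib.parse import urlencode

# Public endpoint of the Google Custom Search JSON API.
GCS_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_gcs_params(license_url, country=None, language=None,
                     api_key="YOUR_API_KEY", cx="YOUR_SEARCH_ENGINE_ID",
                     query="_"):
    """Return query parameters for one license/country/language slice.

    Hypothetical helper: the name, defaults, and placeholder query
    string are assumptions for this sketch, not project code.
    """
    params = {
        "key": api_key,          # API key (placeholder)
        "cx": cx,                # Programmable Search Engine ID (placeholder)
        "q": query,              # the API requires a query string
        "linkSite": license_url, # only pages that link to this license URL
    }
    if country is not None:
        params["cr"] = country   # country restrict, e.g. "countryCA"
    if language is not None:
        params["lr"] = language  # language restrict, e.g. "lang_fr"
    return params


def build_gcs_url(params):
    """Encode the parameters into a full request URL (no request is sent)."""
    return f"{GCS_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    example = build_gcs_params(
        "creativecommons.org/licenses/by/4.0",
        country="countryCA",
        language="lang_fr",
    )
    print(build_gcs_url(example))
```

Only these three dimensions vary per request in this sketch, which is why the resulting counts can only be broken down by license, country, and language.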

Description

Once more data is collected, this work will involve expanding the set of data fields so that the reports can support more detailed analysis. The approach should draw on the pre-automation work from the 2022 Data Discovery Program.

NOTE: Because contributing to this issue is limited by access to API data fetching, and because the solution is long-term, this issue is set up as a discussion where all open-source developers can pitch ideas for eventual implementation by the developer(s) working on the codebase.

Implementation