For each submission, SoftwareX forks the software author's original repository. This prevents fetching the latest SBOM and hides all traces of the software's production. The attached dataset provides the link between each SoftwareX repository and its parent (the original repository of the software's creator). There are ~850 repositories in this dataframe.
Future work mining the acknowledgement statements might reveal direct connections to funding sources, collaborators who are mentioned but are not authors, etc.
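The fork-to-parent link described above can be looked up per repository through the GitHub REST API, which reports a `parent` object on forked repositories. The following is a minimal sketch of that lookup, not the attached script itself. The helper names, the CSV column names, and the organization name passed in the usage note are assumptions made for illustration.

```python
import csv
import json
import urllib.request

API = "https://api.github.com"


def get_json(url, token=None):
    """GET a GitHub REST API URL and decode the JSON body."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def fork_origin(repo):
    """Return (fork_full_name, parent_full_name), or None if not a fork."""
    parent = repo.get("parent")
    if not repo.get("fork") or parent is None:
        return None
    return repo["full_name"], parent["full_name"]


def fetch_origins(org, token=None):
    """Yield (fork, parent) pairs for every forked repository of an organization."""
    page = 1
    while True:
        listing = get_json(f"{API}/orgs/{org}/repos?per_page=100&page={page}", token)
        if not listing:
            break
        for item in listing:
            if item.get("fork"):
                # The list endpoint omits `parent`; the single-repository
                # endpoint includes it, so each fork needs its own request.
                detail = get_json(f"{API}/repos/{item['full_name']}", token)
                pair = fork_origin(detail)
                if pair:
                    yield pair
        page += 1


def write_csv(rows, path):
    """Write (fork, parent) pairs to a CSV file; column names are hypothetical."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["softwarex_repo", "parent_repo"])
        writer.writerows(rows)
```

A call such as `write_csv(fetch_origins("ElsevierSoftwareX", token), "origins.csv")` would run the whole lookup, assuming that is the organization's name on GitHub. Unauthenticated requests are limited to 60 per hour, so a personal access token is effectively required for ~850 repositories.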
Includes:
imp2.py
- script to fetch SoftwareX repositories and their parent repositories via the GitHub API v3. Returns a CSV.
forked_repos-origins.csv
- dataframe with each SoftwareX repository and the repository it was forked from
TLDR: