commercetest / nlnet

Analysis of the open-source codebases of NLnet-sponsored projects.
MIT License

Improve the initial data preparation and reporting #53

Closed julianharty closed 2 months ago

julianharty commented 2 months ago

Context

Currently we have three scripts connected by intermediate data files. Some processing is duplicated between the code that queries repos remotely on github.com and the code that clones repos before analysing them locally.

We also don't provide much reporting of entries that lack data or that are no longer available at the specified URL.
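As a rough illustration of the kind of reporting meant here, the sketch below labels each entry as lacking a URL, unavailable, or OK, and counts the results. The field name `repo_url`, the status labels, and the injected `is_available` check are all assumptions made for this example, not names taken from the existing scripts. The availability check is passed in as a function so that the same classification could be shared by the remote-query and local-clone code paths.

```python
from collections import Counter
from typing import Callable, Iterable


def classify_entry(entry: dict, is_available: Callable[[str], bool]) -> str:
    """Return a coarse status label for one project entry.

    `repo_url` is a hypothetical field name; adjust to the real data files.
    """
    url = (entry.get("repo_url") or "").strip()
    if not url:
        return "missing_url"
    if not is_available(url):
        return "unavailable"
    return "ok"


def summarise(entries: Iterable[dict], is_available: Callable[[str], bool]) -> Counter:
    """Count how many entries fall into each status."""
    return Counter(classify_entry(entry, is_available) for entry in entries)


if __name__ == "__main__":
    # Illustrative data only; a real check would query the URL instead.
    known_good = {"https://github.com/example/alive"}
    sample = [
        {"project": "a", "repo_url": "https://github.com/example/alive"},
        {"project": "b", "repo_url": ""},
        {"project": "c", "repo_url": "https://github.com/example/gone"},
    ]
    for status, count in summarise(sample, lambda url: url in known_good).items():
        print(f"{status}: {count}")
```
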

Code such as:

julianharty commented 2 months ago

In terms of reporting, I'd like to experiment with both data reporting and visual reporting.

Data reporting would include RDF files, probably in Turtle format, and formats that are easy to process further, e.g. as DataFrames and/or in online services such as BigQuery.
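To make the idea concrete, here is a minimal sketch that turns per-repo result rows into Turtle text using only the standard library. The `ex:` prefix URI, the row keys (`repo_url`, `status`, `test_file_count`) and the function names are placeholders invented for this example. It assumes the URLs are already valid IRIs and the keys are valid Turtle local names; a real implementation would more likely use an RDF library such as rdflib.

```python
def turtle_literal(value) -> str:
    """Render a Python value as a Turtle literal (bool, int, or string)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)  # bare integers are xsd:integer in Turtle
    escaped = (
        str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"'


def to_turtle(rows, prefix: str = "https://example.org/nlnet#") -> str:
    """Serialise rows (dicts keyed by a hypothetical `repo_url`) as Turtle."""
    lines = [f"@prefix ex: <{prefix}> .", ""]
    for row in rows:
        # Skip the subject key and any properties with no data.
        props = [(k, v) for k, v in row.items() if k != "repo_url" and v is not None]
        if not props:
            continue
        body = " ;\n    ".join(f"ex:{key} {turtle_literal(val)}" for key, val in props)
        lines.append(f"<{row['repo_url']}> {body} .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    rows = [
        {"repo_url": "https://github.com/example/repo", "status": "ok", "test_file_count": 12},
    ]
    print(to_turtle(rows))
```

Because the rows are plain dicts, the same list could also be passed to `pandas.DataFrame(rows)` for further processing.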

Visual reporting could include graphs, plots, and especially Sankey diagrams (https://python-graph-gallery.com/sankey-diagram/), which may help to make data quality issues easy to spot. For repos that are successfully queried, it could also show groupings of results such as the count of test files, tests, etc.
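A Sankey diagram needs a list of node labels plus links given as source index, target index, and value. The sketch below builds that structure from one (stage, outcome) pair per entry; the stage names are made up for illustration and `sankey_data` is a hypothetical helper, not existing project code.

```python
from collections import Counter


def sankey_data(flows):
    """Build Sankey node labels and links from (source, target) label pairs.

    Each pair represents one entry moving from one stage to the next;
    identical pairs are merged and their count becomes the link value.
    """
    counts = Counter(flows)
    labels = []
    index = {}

    def node(label):
        # Assign each distinct label a stable integer index.
        if label not in index:
            index[label] = len(labels)
            labels.append(label)
        return index[label]

    link = {"source": [], "target": [], "value": []}
    for (src, dst), count in counts.items():
        link["source"].append(node(src))
        link["target"].append(node(dst))
        link["value"].append(count)
    return labels, link


if __name__ == "__main__":
    flows = [
        ("all entries", "has URL"),
        ("all entries", "has URL"),
        ("all entries", "missing URL"),
        ("has URL", "queried OK"),
        ("has URL", "unavailable"),
    ]
    labels, link = sankey_data(flows)
    print(labels)
    print(link)
```

The output matches the shape that plotly expects, so, assuming plotly is installed, it could be drawn with `go.Figure(go.Sankey(node={"label": labels}, link=link))` after `import plotly.graph_objects as go`.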

tnzmnjm commented 2 months ago

Progress Update