commercetest / nlnet

Analysis of the opensource codebases of NLnet sponsored projects.
MIT License

Create sankey diagram starting from loading the list of NLnet repos until we get as many test counts as practical #56

Closed julianharty closed 2 months ago

julianharty commented 2 months ago

Context

This extends the work in #53, and is the first of our visual reports.

To implement this we may want or need to revise how and where the various interim results are written, e.g. for duplicate repos and for those that have incomplete URLs.
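As a rough sketch of how those interim results could be captured (the column name `repourl`, the slash-count heuristic, and the file paths here are assumptions for illustration, not the project's actual ones):

```python
import pandas as pd

# Assumed input: the list of NLnet repo URLs gathered earlier in the pipeline.
df = pd.read_csv("data/nlnet_repos.csv")

# Crude illustrative heuristic: a full GitHub URL such as
# https://github.com/owner/repo contains four '/' characters,
# so fewer than four suggests the owner or repo name is missing.
incomplete = df[df["repourl"].str.count("/") < 4]

# Repos listed more than once.
duplicates = df[df.duplicated(subset="repourl", keep=False)]

# Writing these to dedicated interim files would let later stages (and the
# Sankey diagram) count how many repos drop out at each step.
incomplete.to_csv("data/interim/incomplete_urls.csv", index=False)
duplicates.to_csv("data/interim/duplicate_repos.csv", index=False)
```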

Location of the report

Let's generate and save the Sankey figures in https://github.com/commercetest/nlnet/tree/main/reports/graphs/sankey-diagram-of-analysis. For now, we can overwrite any existing report in that location (which is what the pytest report does) and commit updates to the repo, as git will preserve the history of the updates. We may consider alternatives once the reports are working.
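A minimal sketch of how such a figure might be generated and saved to that location, assuming plotly is used; the node labels, counts, and output filename below are placeholders, not real results:

```python
from pathlib import Path
import plotly.graph_objects as go

# Illustrative labels and counts only; real values would come from the pipeline.
labels = [
    "NLnet repos",
    "Incomplete URLs",
    "Duplicate repos",
    "Cloned",
    "Filenames containing 'test'",
]
fig = go.Figure(go.Sankey(
    node=dict(label=labels),
    link=dict(
        source=[0, 0, 0, 3],  # indices into labels
        target=[1, 2, 3, 4],
        value=[10, 5, 300, 150],
    ),
))

# Overwrite any previous report in the agreed location; git keeps the history.
out_dir = Path("reports/graphs/sankey-diagram-of-analysis")
out_dir.mkdir(parents=True, exist_ok=True)
fig.write_html(str(out_dir / "sankey.html"), include_plotlyjs="cdn")
```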

tnzmnjm commented 2 months ago
julianharty commented 2 months ago

[Image: sketch of the proposed column structure for the Sankey diagram]

Source: https://excalidraw.com/#json=2Tm8zQNYoMsjyprmWu54A,1xL5i112-bvprQ5RMm4WCw or https://excalidraw.com/?#json=WOhT64qVRYw4u3OXHKKIV,lKwcdN0p7xq6-DDW5fmnMg

The above figure provides some ideas on how we might structure the columns of the Sankey diagram. It illustrates how the test pass rate might be obtained for each project. It doesn't go into any detail about how we'd actually make this work, as that's a significant and distinct topic in itself.

So far we've managed to gather enough of the data to provide counts of filenames that include 'test' somewhere in the filename (level 2a in this figure). I believe we can also detect test runner scripts fairly easily (level 2b in this figure). There are likely to be gaps in what we detect automatically, and we may want to invest time in reducing those gaps to a practical minimum.
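A minimal sketch of the level 2a count, assuming each repo has been cloned locally; the function name and example path are hypothetical:

```python
from pathlib import Path

def count_test_filenames(repo_dir: str) -> int:
    """Count files in a cloned repo whose filename contains 'test' (level 2a)."""
    return sum(
        1
        for p in Path(repo_dir).rglob("*")
        if p.is_file() and "test" in p.name.lower()
    )

# Example usage against a hypothetical local clone:
# print(count_test_filenames("cloned_repos/some-nlnet-project"))
```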

julianharty commented 2 months ago

Some additional notes on Sankey diagrams that might be helpful if/when we refine the diagram:

tnzmnjm commented 2 months ago

Working on section 2b of the diagram --> detecting test runners

Detecting Test Runners in a Repository:
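One possible starting point, again assuming a local clone of each repo. The mapping of marker files to test runners below is an illustrative assumption and is far from exhaustive; several entries (e.g. `package.json`, `Makefile`) would need further inspection of their contents to confirm a test runner is actually configured:

```python
from pathlib import Path

# Assumed marker files and the test runners they suggest (illustrative only).
TEST_RUNNER_MARKERS = {
    "pytest.ini": "pytest",
    "tox.ini": "tox",
    "noxfile.py": "nox",
    "package.json": "npm/yarn test script (needs further inspection)",
    "Makefile": "make test target (needs further inspection)",
    "pom.xml": "Maven Surefire",
    "build.gradle": "Gradle test task",
    "Cargo.toml": "cargo test",
    "go.mod": "go test",
}

def detect_test_runners(repo_dir: str) -> set[str]:
    """Return the set of test runners suggested by marker files in a cloned repo."""
    found = set()
    for path in Path(repo_dir).rglob("*"):
        if path.is_file() and path.name in TEST_RUNNER_MARKERS:
            found.add(TEST_RUNNER_MARKERS[path.name])
    return found
```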