evilnick closed this 1 month ago
@SecondSkoll could you take a quick look at this one? Especially the part about storing results in a /metrics/ directory. If you don't see any problem with that approach, I'll go ahead and use the same thing for storing spelling check and Vale results.
I haven't reviewed this because while it seems perfectly functional, I disagree with the fundamental concept of putting test results into the repository.
I haven't seen the spec outlining the testing, and I have no context for testing performed by competitors. I don't have full context for what this intends to be - but I believe it is meant to track these metrics over time. I don't think we will get anything valuable out of that, because of how significantly documentation changes over time. Each cycle, I would expect every documentation set to change significantly (content, dashboard commitments, etc.). Those changes would make raw data like this unusable for any sort of trend analysis with more than two data points. I would personally prefer just an action that runs on a PR and on main, so you could manually compare results over time (you can set workflow artefacts to last 400 days - you could do that for a project and then go in and document things and provide the contextual information that would be needed to actually use this data).
I don't think this is appropriate for spelling tests at all, and I have similar concerns for Vale based style guide tests.
I don't really understand what the testing in this PR aims to do, or what problem it is trying to address (other than maybe the readability score).
Storing output in the repo isn't ideal, but we currently have nowhere else to put it, and we're somewhat limited by the commandment that everything has to be able to be run from the makefile :shrug:. Ideally these jobs would run from GH and store results elsewhere, but that isn't going to happen any time soon.
There is the option of running jobs as GitHub Actions and then storing their results as GitHub artifacts (I have used that option in the other Vale-related PR that is currently open). The artifacts are stored for 90 days by default and can be downloaded anytime before that, using the link that GH provides within the action's results. Will that help?
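If it helps, something like this would pull a previous run's results down for manual comparison (a sketch only; the workflow file name `metrics.yml` and the artifact name `metrics` are placeholders, not names from this PR):

```bash
# List recent runs of the workflow to find one to compare against
# (workflow file name is a placeholder).
gh run list --workflow metrics.yml --limit 5

# Download the "metrics" artifact from a chosen run into a local directory
# (artifact name and run ID are placeholders).
gh run download <run-id> -n metrics -D previous-metrics/
```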
I think a larger discussion needs to be had about the value of these metrics and how they can/should be used. I understand you need to gather data to be able to process it and synthesise something valuable from it - but from what I can see right now it's likely the data we're gathering will inform the outcome rather than the desired outcome informing the data gathering.
I do think that storing these tests as an artefact would be better than storing them in the repo (and you could do integration tests by comparing to the previous artefacts - and you could also make the pipelines trigger on a schedule so you never lose the most recent ones). But I would also keep in mind that you can go back to any commit and download all the files to run these tests against if you want to.
I don't want to block this, but I feel strongly about the documentation metrics being purposeful - and the purpose of these tests hasn't been explained to me (again, other than readability - which I also think shouldn't be Flesch-Kincaid). I also feel strongly that storing these artefacts in the repository is a bad idea.
Thanks for all the input everyone. I will modify the script to not persist data in the repo.
There is definitely room in the future for some discussion about which metrics might be preferred and how they might be analysed.
Update:
Thanks @SecondSkoll . I appreciate your time on this and your thoughtful remarks, which will need to be considered in future iterations.
There's nothing stopping us from swapping in ARI later on, or considering multiple readability metrics. I personally don't have strong views on that and would be interested to hear about the relative merits. ARI seems to rely only on counts of words, sentences and characters, so it would be easy to implement in bash without invoking Vale at all (if that's what we wanted).
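For what it's worth, a minimal sketch of what that could look like (the counting heuristics and the file argument here are my own approximations, not anything from this PR):

```bash
#!/usr/bin/env bash
# Rough ARI sketch in plain bash/awk (illustrative only).
file="$1"

chars=$(tr -d '[:space:]' < "$file" | wc -c)
words=$(wc -w < "$file")
# Very rough sentence count: occurrences of terminal punctuation.
sentences=$(grep -o '[.!?]' "$file" | wc -l)

# ARI = 4.71*(characters/words) + 0.5*(words/sentences) - 21.43
awk -v c="$chars" -v w="$words" -v s="$sentences" 'BEGIN {
  if (w == 0 || s == 0) { print "ARI: n/a"; exit }
  printf "ARI: %.2f\n", 4.71 * (c / w) + 0.5 * (w / s) - 21.43
}'
```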
On the Vale readability styles: we were aware of those, but because they included several outputs we weren't interested in, and also didn't include some things that we wanted, we opted to use the metrics Vale outputs natively (via ls-metrics). I would have preferred not to use Vale at all because it requires a venv and slows the script down. It was the last metric to be implemented and the rest didn't require Vale.
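For context, the kind of call involved is roughly this (an illustration, not the actual script; the file path and jq filter are placeholders, and the exact key names may vary by Vale version):

```bash
# Illustration only: query Vale's native metrics output for one file and
# keep a subset of the counts. Assumes Vale and jq are installed; the path
# and key names are assumptions.
vale ls-metrics docs/index.md | jq '{words, sentences, characters}'
```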
In future we should consider which approach offers us the flexibility to adjust the metrics while also being the easiest to maintain.
I'm going to merge this in, as I would like to have a couple of days of code-freeze to update the extension version with all the functionality in main, so we can start using that as a default.
thanks. sorry for the delay in checking in on this, have been ill :'(
- A `/metrics/` directory for the metrics scripts and `metrics.yaml`
- Covers source (`.md`, `.rst`) and build (`.html`) files
- An `allmetrics` make target that outputs metrics on the terminal and updates `metrics.yaml`
DOCPR-880 DOCPR-881 DOCPR-882 DOCPR-883 DOCPR-885