Closed by proycon 1 year ago
(By the way, this relates to the wider vocabulary discussion in #32, but I wanted to tackle this separately)
Another thing that came to mind, which I want to add for completeness' sake, is that when speaking of the development and deployment process, which especially makes sense for software services, it is customary to have the following phases:
Quoting Wikipedia:
- The program or component is developed on a Development system. This development environment might have no testing capabilities.
- Once the software developer thinks it is ready, the product is copied to a Test environment, to verify it works as expected. This test environment is supposedly standardized and in close alignment with the target environment.
- If the test is successful, the product is copied to an Acceptance test environment. During the Acceptance test, the customer will test the product in this environment to verify whether it meets their expectations.
- If the customer accepts the product, it is deployed to a Production environment, making it available to all users of the system.
This is typically a cycle that repeats at every release and is surely good practice for us all to adhere to, but I don't see a reason for expressing this explicitly in the software metadata and see more value in the aforementioned repostatus vocabulary. Expressing that something is in a testing/acceptance state can be done by tagging releases as "release candidates" (a suffix to the version number).
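To illustrate the release-candidate convention, here is a minimal sketch of how a harvester could recognise such a version. The accepted suffix forms (`-rc1`, `rc2`, `-rc.3`) and the function name are assumptions for the example, not something we have agreed on.

```python
import re

# Matches a release-candidate suffix at the end of a version string,
# e.g. "2.1.0-rc1", "2.1.0rc2" or "2.1.0-rc.3" (assumed conventions).
RC_SUFFIX = re.compile(r"[-.]?rc[.-]?(\d+)$", re.IGNORECASE)


def is_release_candidate(version: str) -> bool:
    """Return True if the version string ends in a release-candidate suffix."""
    return RC_SUFFIX.search(version.strip()) is not None
```

With this, a version such as `2.1.0-rc1` would signal the testing/acceptance state without any extra metadata field.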
In my view these are three different perspectives on assessing the status of software: (i) repostatus is more for developers, (ii) TRL is for users, and (iii) OTAP is for "moving to production" cycles. As most of the CLARIAH work is "in development", repostatus is a convenient scheme. However, having a TRL level is important from a user perspective. Would it be feasible to have TRL + repostatus? E.g., "late prototype -- unsupported". OTAP is more about internal organisation than about communication to the end user; what I do like, however, is the "acceptance" bit: it would be ideal if we could add this somewhere, e.g., "late prototype -- accepted -- unsupported". However, as we are still far from having a project workflow that can also test the acceptance criteria with end users (as would be the ideal scenario in an agile, co-development approach), this is a stretch. What we could do, however, is allocate some space in the schema for information on evaluation, or for a (research) paper that uses the software, so that we can provide some sort of validation information and link development with usage.
I agree that these are indeed different perspectives on assessing the status.
> Would it be feasible to have TRL + repostatus?
We can automatically map repostatus to TRL for presentational purposes towards the user, although we lose some granularity there. But how crucial is the high granularity TRL offers? I'm rather reluctant to ask developers to provide two metrics that largely overlap, as one of the basic principles we're following is to avoid any repetition or redundancy. If we do add a metric, I'd prefer to add one that doesn't correlate with the existing one (repostatus). So rather than adding TRL as such, we could add a field for the 'validation/evaluation/acceptance' dimension (e.g. "proven in the lab/in space"), which I think is in line with what you suggested?
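A rough sketch of what such an automatic mapping could look like, and of where the granularity is lost. The TRL ranges below are placeholder values I made up for illustration, not an agreed mapping; only the repostatus term names come from the vocabulary itself.

```python
from typing import Optional, Tuple

# HYPOTHETICAL mapping from repostatus terms to a (min, max) TRL range.
# The numbers are illustrative placeholders, not an agreed CLARIAH mapping.
# Terms that describe maintenance rather than readiness (suspended,
# abandoned, inactive, unsupported, moved) are deliberately left unmapped.
REPOSTATUS_TO_TRL = {
    "concept": (1, 2),
    "wip": (3, 6),
    "active": (7, 9),
}


def trl_range(repostatus: str) -> Optional[Tuple[int, int]]:
    """Map a repostatus term or URI to a TRL range, or None if unmapped."""
    term = repostatus.rsplit("#", 1)[-1].strip().lower()
    return REPOSTATUS_TO_TRL.get(term)
```

Because one repostatus term covers several TRL levels, the result can only ever be a range, and terms such as `unsupported` say nothing about readiness at all.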
I wrote down a formal definition of the technology readiness levels in CLARIAH/tool-discovery#16; please comment there on the contents.
As to the relation between this and the repostatus vocabulary: @roelandordelman argued in this issue that these TRL levels are more from the perspective of the user, whereas the repostatus vocabulary is more from a developer and software maintenance perspective. Though there is a partial overlap, the two do address different dimensions, and we feel it's worth specifically asking developers to provide both metadata properties.
I'm considering this closed after the discussions in the Technical Committee a few weeks ago; I didn't get further feedback on https://github.com/CLARIAH/tool-discovery/pull/16, though.
I want to open a discussion regarding the terms we use for indicating the development status of software, and make a proposal.
The tool discovery track aims to describe metadata at the source (i.e. alongside the source code). Developers themselves provide the metadata, which will then be harvested and published. We align with the existing codemeta initiative, which in turn collaborates with schema.org. For indicating the development status of a software source code repository, codemeta uses/recommends repostatus. As we build upon all those existing initiatives, we inherit this choice, and I think it is a good one.
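As a minimal sketch of what this looks like in practice: codemeta has a `developmentStatus` property that can hold a repostatus URI. The example record and the helper function name below are made up for illustration; this is not our actual harvester code.

```python
import json
from typing import Optional

# Illustrative codemeta.json content; the tool name is a made-up example.
EXAMPLE_CODEMETA = """
{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "name": "example-tool",
  "developmentStatus": "https://www.repostatus.org/#active"
}
"""


def development_status(codemeta_text: str) -> Optional[str]:
    """Return the developmentStatus value of a codemeta record, if present."""
    data = json.loads(codemeta_text)
    value = data.get("developmentStatus")
    return value if isinstance(value, str) else None
```

Calling `development_status(EXAMPLE_CODEMETA)` would give back the repostatus URI that the developer put alongside the source code.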
The repostatus vocabulary defines the following terms to describe software projects (aimed at the source code repository):
They nicely explain the relation between these on this page.
Because this is an existing initiative with some established popularity in the developer community, and codemeta also adopts this, this is the vocabulary I want to actively prescribe for CLARIAH as well. That is, I propose all our tool providers will be asked to provide their development status (i.e. software project status) using these terms.
A fair number of projects (over 3000 on GitHub) are already using this vocabulary and have 'badges' in their READMEs to express their development status. Our tool harvesting pipeline can already interpret these, as there is existing software for that. The main point I want to convey here is that I propose to align with existing initiatives and not reinvent the wheel.
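To give an idea of how little is needed to read such a badge, here is a simplified sketch. It is not the existing software mentioned above; it assumes the badge links to `repostatus.org/#<term>`, as the standard badge markup does, and the function name is hypothetical.

```python
import re
from typing import Optional

# The eight terms defined by the repostatus vocabulary.
KNOWN_TERMS = {
    "concept", "wip", "suspended", "abandoned",
    "active", "inactive", "unsupported", "moved",
}

# Assumption: the badge in the README links to repostatus.org/#<term>.
BADGE_LINK = re.compile(r"repostatus\.org/#([a-z]+)", re.IGNORECASE)


def repostatus_from_readme(text: str) -> Optional[str]:
    """Return the first recognised repostatus term linked in a README."""
    for match in BADGE_LINK.finditer(text):
        term = match.group(1).lower()
        if term in KNOWN_TERMS:
            return term
    return None
```

A README without a badge, or with an unrecognised term, simply yields `None`, so the harvester can fall back to other metadata.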
However, within CLARIAH we have also spoken about "Technology Readiness Levels", and even come to a definition of the levels (and 'stages' to group them). They are in turn derived from some EU vocabulary and are also a powerful concept:
There is a fair degree of overlap between these vocabularies so parts can be easily mapped, but they also differ in some regards:
For describing software projects, at least from the perspective of the tool developers who actually provide the metadata, I propose we adhere to the repostatus vocabulary:
Please share your views! I'm giving @roelandordelman an extra poke as we may want this to be officially 'stamped' by the CTO or a board decision. Also poking @Seb-CLARIAH as these decisions affect what data Ineo will get. I don't object to having the vocabulary mapped to something like TRL for representational purposes if that makes more sense for a particular (scholarly) audience.