CLARIAH / clariah-plus

This is the project planning repository for the CLARIAH-PLUS project. It groups all technical documents and discussions pertaining to CLARIAH-PLUS in a central place and should facilitate findability, transparency and project planning, for the project as a whole.

Development status vocabulary #98

Closed proycon closed 1 year ago

proycon commented 2 years ago

I want to open a discussion regarding the terms we use for indicating the development status of software, and make a proposal.

The tool discovery track aims to describe metadata at the source (i.e. alongside the source code). Developers themselves provide the metadata, which is then harvested and published. We align with the existing codemeta initiative, which in turn collaborates with schema.org. For indicating the development status of a software source code repository, codemeta uses/recommends repostatus. As we build upon all those existing initiatives, we inherit this choice, and I think it is a good one.

The repostatus vocabulary defines the following terms to describe software projects (aimed at the source code repository):

  • Concept – Minimal or no implementation has been done yet, or the repository is only intended to be a limited example, demo, or proof-of-concept.
  • WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.
  • Suspended – Initial development has started, but there has not yet been a stable, usable release; work has been stopped for the time being but the author(s) intend on resuming work.
  • Abandoned – Initial development has started, but there has not yet been a stable, usable release; the project has been abandoned and the author(s) do not intend on continuing development.
  • Active – The project has reached a stable, usable state and is being actively developed.
  • Inactive – The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.
  • Unsupported – The project has reached a stable, usable state but the author(s) have ceased all work on it. A new maintainer may be desired.
  • Moved – The project has been moved to a new location, and the version at that location should be considered authoritative. This status should be accompanied by a new URL.

They nicely explain the relation between these on this page.

Because this is an existing initiative with some established popularity in the developer community, and codemeta also adopts this, this is the vocabulary I want to actively prescribe for CLARIAH as well. That is, I propose all our tool providers will be asked to provide their development status (i.e. software project status) using these terms.

A fair number of projects (over 3000 on GitHub) are already using this vocabulary and have 'badges' in their READMEs to express their development status. Our tool harvesting pipeline can already interpret these as there is existing software for that. The main point I want to convey here is that I propose to align with existing initiatives and not reinvent the wheel.
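To illustrate how such badges can be read automatically, here is a minimal sketch in Python. It is not the code our harvesting pipeline uses; the function name `find_repostatus` and the regular expression are my own assumptions, based on the badges linking to URLs of the form `https://www.repostatus.org/#active` or `.../badges/latest/active.svg`.

```python
import re

# The eight repostatus terms listed above, lowercased as they appear in
# repostatus.org URLs.
REPOSTATUS_TERMS = {
    "concept", "wip", "suspended", "abandoned",
    "active", "inactive", "unsupported", "moved",
}

# Assumed URL shapes: https://www.repostatus.org/#active
#                     https://www.repostatus.org/badges/latest/active.svg
_BADGE_RE = re.compile(
    r"repostatus\.org/(?:#|badges/[^/\s)]+/)([a-z]+)(?:\.svg)?",
    re.IGNORECASE,
)


def find_repostatus(readme_text):
    """Return the first repostatus term referenced in a README, or None."""
    for match in _BADGE_RE.finditer(readme_text):
        term = match.group(1).lower()
        if term in REPOSTATUS_TERMS:
            return term
    return None
```

A README without a recognisable badge simply yields `None`, so a harvester could fall back to explicitly provided metadata in that case.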

However, within CLARIAH we have also spoken about "Technology Readiness Levels", and have even arrived at a definition of the levels (and 'stages' to group them). They are in turn derived from some EU vocabulary and are also a powerful concept:

  • Planning stage (pre-alpha):
    • 0 - Idea - Unproven, untested and largely unformulated concept
    • 1 - Initial Research - Basic (scholarly) needs observed and reported
    • 2 - Concept Formulated - Initial technology/application has been concept formulated
  • Proof of Concept stage (alpha):
    • 3 - Proof of Concept - Initial Proof-of-concept of key functionality. Concept presented for initial feedback from scholarly users. Not yet validated and not suitable for end-users yet.
    • 4 - Validated Proof of Concept - Validated Proof-of-concept of key functionality. Technology validated in its own experimental setting (e.g. the lab). Not mature enough for end-users yet.
  • Experimental stage (beta):
    • 5 - Early Prototype - Technology validated in target setting (e.g. with potential end-users)
    • 6 - Late Prototype - Technology demonstrated in target setting, end-users adopt it for testing purposes.
    • 7 - Release Candidate - Technology ready enough and in initial use by end-users in intended scholarly environments. Further validation in progress.
  • Production stage (stable):
    • 8 - Complete - Technology complete and qualified, released for all end-users in scholarly environments.
    • 9 - Proven - Technology complete and proven in practice by real users.
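The levels and stages above could be represented as a small lookup table. The following is only a sketch to make the grouping concrete; the names `ReadinessLevel` and `readiness_level` are hypothetical and not part of any existing CLARIAH software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadinessLevel:
    level: int
    name: str
    stage: str


# Level names in order, index == level number (0..9), as listed above.
_NAMES = [
    "Idea", "Initial Research", "Concept Formulated",
    "Proof of Concept", "Validated Proof of Concept",
    "Early Prototype", "Late Prototype", "Release Candidate",
    "Complete", "Proven",
]

# Each stage groups a contiguous range of levels.
_STAGES = [
    ("Planning stage (pre-alpha)", range(0, 3)),
    ("Proof of Concept stage (alpha)", range(3, 5)),
    ("Experimental stage (beta)", range(5, 8)),
    ("Production stage (stable)", range(8, 10)),
]


def readiness_level(level):
    """Look up the name and stage for a numeric level between 0 and 9."""
    if not 0 <= level <= 9:
        raise ValueError(f"level must be between 0 and 9, got {level}")
    for stage, levels in _STAGES:
        if level in levels:
            return ReadinessLevel(level, _NAMES[level], stage)
```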

There is a fair degree of overlap between these vocabularies so parts can be easily mapped, but they also differ in some regards:

For describing software projects, at least from the perspective of the tool developers who actually provide the metadata, I propose we adhere to the repostatus vocabulary:

Please share your views! I'm giving @roelandordelman an extra poke as we may want this to be officially 'stamped' by the CTO or a board decision. Also poking @Seb-CLARIAH as these decisions affect what data Ineo will get. I don't object to having the vocabulary mapped to something like TRL for representational purposes if that makes more sense for a particular (scholarly) audience.

proycon commented 2 years ago

(By the way, this relates to the wider vocabulary discussion in #32, but I wanted to tackle this separately)

proycon commented 2 years ago

Another thing that came to mind and which I want to add for completeness' sake is that when speaking of the development and deployment process, which especially makes sense for software services, it is customary to have the following phases:

Quoting Wikipedia:

  1. The program or component is developed on a Development system. This development environment might have no testing capabilities.
  2. Once the software developer thinks it is ready, the product is copied to a Test environment, to verify it works as expected. This test environment is supposedly standardized and in close alignment with the target environment.
  3. If the test is successful, the product is copied to an Acceptance test environment. During the Acceptance test, the customer will test the product in this environment to verify whether it meets their expectations.
  4. If the customer accepts the product, it is deployed to a Production environment, making it available to all users of the system.

This is typically a cycle that repeats at every release and is surely good practice for us all to adhere to, but I don't see a reason for expressing this explicitly in the software metadata and see more value in the aforementioned repostatus vocabulary. Expressing that something is in a testing/acceptance state can be done by tagging releases as "release candidates" (a suffix to the version number).
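As a rough illustration of the last point, a harvester could recognise release candidates from the version string alone. This sketch assumes the common `rc` suffix conventions (e.g. `1.2.0-rc1`, `2.0rc2`, `v3.1.0-rc.2`); the function name and the accepted formats are my own assumptions, not a prescribed scheme.

```python
import re

# Numeric version, optional "-" or "." separator, then "rc" and an
# optional candidate number (assumed conventions, see above).
_RC_RE = re.compile(r"^\d+(?:\.\d+)*[-.]?rc[.-]?\d*$", re.IGNORECASE)


def is_release_candidate(version):
    """Return True if the version string carries a release-candidate suffix."""
    cleaned = version.strip()
    if cleaned[:1] in ("v", "V"):
        cleaned = cleaned[1:]  # tolerate tags such as "v1.2.0-rc1"
    return _RC_RE.match(cleaned) is not None
```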

roelandordelman commented 2 years ago

In my view these are three different perspectives on assessing the status of software. (i) Repostatus is more for developers, (ii) TRL for users, (iii) OTAP for "moving to production" cycles. As most of the CLARIAH work is "in development", repostatus is a convenient scheme. However, having a TRL level is important from a user perspective. Would it be feasible to have TRL + repostatus? E.g., "late prototype -- unsupported". OTAP is more about internal organisation than about communication to end users; however, what I like is the "acceptance" bit: it would be ideal if we could add this one somewhere, e.g., "late prototype -- accepted -- unsupported". However, as we are still far from having a project workflow that is able to also test the acceptance criteria with end users (as would be the ideal scenario in an agile, co-development approach), this is a stretch. What we could do, however, is allocate some space in the schema to add information on evaluation, or a (research) paper that uses the software, so that we can provide some sort of validation information and link development with usage.

proycon commented 2 years ago

I agree that these are indeed different perspectives on assessing the status.

Would it be feasible to have TRL + repostatus?

We can automatically map repostatus to TRL for presentational purposes towards the user, although we lose some granularity there. But how crucial is the high granularity TRL offers? I'm rather reluctant to ask developers to provide two metrics that overlap largely, as one of the basic principles we're following is to avoid any repetition/redundancy. If we do add a metric, I'd prefer to add one that doesn't correlate with the existing one (repostatus). So rather than add TRL as such we could add a field for the 'validation/evaluation/acceptance' dimension (e.g. "proven in the lab/in space"), which is in line with what you suggested I think?
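To make the loss of granularity concrete, such an automatic mapping could look roughly like the sketch below. The mapping itself is purely an illustrative guess on my part and has not been agreed upon anywhere; it only maps to the coarse TRL stages because repostatus does not carry enough information to pick an individual level.

```python
# ILLUSTRATIVE ASSUMPTION: this mapping is a guess for demonstration
# purposes only, not an agreed CLARIAH mapping.
STAGE_FOR_REPOSTATUS = {
    "concept": "Proof of Concept stage (alpha)",
    "wip": "Experimental stage (beta)",
    "suspended": "Experimental stage (beta)",
    "abandoned": "Experimental stage (beta)",
    "active": "Production stage (stable)",
    "inactive": "Production stage (stable)",
    "unsupported": "Production stage (stable)",
    "moved": None,  # says nothing about readiness, only about location
}


def presentational_stage(repostatus):
    """Map a repostatus term to a coarse TRL stage label, or None."""
    key = repostatus.strip().lower()
    if key not in STAGE_FOR_REPOSTATUS:
        raise ValueError(f"unknown repostatus term: {repostatus!r}")
    return STAGE_FOR_REPOSTATUS[key]
```

Note that three repostatus terms collapse into a single stage here, and that the distinction between, say, levels 8 and 9 cannot be recovered at all.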

proycon commented 2 years ago

I wrote down a formal definition of the technology readiness levels in CLARIAH/tool-discovery#16, please comment there on the contents.

As to the relation between this and the repostatus vocabulary: @roelandordelman argued in this issue that the TRL levels are more from the perspective of the user, whereas the repostatus vocabulary is more from a developer and software maintenance perspective. Though there is a partial overlap, both address somewhat different dimensions and we feel it is worthwhile to specifically ask developers to provide both metadata properties.
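As a sketch of what checking both properties in harvested metadata might look like: codemeta's `developmentStatus` property is real, but the key `technologyReadinessLevel`, the choice of a plain integer for it, and the function name below are hypothetical placeholders. The actual definition is the one under discussion in CLARIAH/tool-discovery#16.

```python
REPOSTATUS_PREFIX = "https://www.repostatus.org/#"
REPOSTATUS_TERMS = {
    "concept", "wip", "suspended", "abandoned",
    "active", "inactive", "unsupported", "moved",
}
# HYPOTHETICAL key name, used here only for illustration.
TRL_KEY = "technologyReadinessLevel"


def check_status_metadata(record):
    """Return a list of problems found in a metadata record (a dict)."""
    problems = []

    # Assumes the status is given as a repostatus URI.
    status = record.get("developmentStatus")
    if (
        not isinstance(status, str)
        or not status.startswith(REPOSTATUS_PREFIX)
        or status[len(REPOSTATUS_PREFIX):] not in REPOSTATUS_TERMS
    ):
        problems.append("developmentStatus should be a repostatus URI")

    # bool is a subclass of int in Python, so exclude it explicitly.
    trl = record.get(TRL_KEY)
    if isinstance(trl, bool) or not isinstance(trl, int) or not 0 <= trl <= 9:
        problems.append(f"{TRL_KEY} should be an integer from 0 to 9")

    return problems
```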

proycon commented 1 year ago

Considering this closed after discussions in the Technical Committee a few weeks ago. I didn't get further feedback on https://github.com/CLARIAH/tool-discovery/pull/16 though.