bio-tools / biotoolsRegistry

biotoolsregistry : discovery portal for bioinformatics
GNU General Public License v3.0

Bio.Tools metrics: quality, quantity, progress, and contribution indicators #113

Open matuskalas opened 8 years ago

matuskalas commented 8 years ago

As a preamble to this topic, a great example: https://bio.tools/tool/Galaxy/version/none :-1:

Here comes a list of metrics/indicators that are EXTREMELY EASY TO IMPLEMENT, while at the same time excellent indicators of quality, quantity, contribution, and progress.

Notes: The SIMPLEST and most relevant indicators are in bold. The rest are additional ones that are similarly SIMPLE and relevant, but less general (more specific). All of the following has been mentioned and discussed regularly since the EMBRACE Registry times, repeatedly in various meetings in Amsterdam and Lyngby, including Kristoffer, @ekry, @joncison, @hmenager, Łukasz, me, Manchester folks, and Gert.

A. Basic (=) quantity metrics

1. # of attributes (nodes or leaves in the JSON/XML tree; summed over all entries)
2. # of operations (with at least one EDAM data concept ≠ 0006, as input or output, and at least one EDAM operation concept ≠ 0004; summed over all entries)
3. # of entries

These 3 should certainly also be shown at the top of the Bio.Tools "home page", and then also at the top of each list/table of search results (there, of course, computed over the found entries).
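To make the quantity metrics concrete, here is a minimal sketch of how 1.–3. could be computed. The entry layout (a `function` list with `operation`/`input`/`output` objects carrying EDAM `uri` fields) and the EDAM root URIs are assumptions for illustration, not verified against the live bio.tools API; "nodes or leaves" is read here as keys plus leaf values of the JSON tree.

```python
# Hypothetical sketch of metrics 1-3 over a list of bio.tools-style entries.
# Field names and EDAM root URIs below are assumptions, not the real schema.

EDAM_OPERATION_ROOT = "http://edamontology.org/operation_0004"
EDAM_DATA_ROOT = "http://edamontology.org/data_0006"

def count_attributes(node):
    """Metric 1: count keys and leaf values in one entry's JSON tree."""
    if isinstance(node, dict):
        # each key counts as a node, plus whatever its subtree contains
        return sum(1 + count_attributes(v) for v in node.values())
    if isinstance(node, list):
        return sum(count_attributes(v) for v in node)
    return 1  # a leaf value counts as one attribute

def is_meaningful(operation):
    """Metric 2 filter: at least one non-root EDAM operation concept AND
    at least one non-root EDAM data concept as input or output."""
    ops = [o.get("uri") for o in operation.get("operation", [])]
    io = operation.get("input", []) + operation.get("output", [])
    data = [d.get("data", {}).get("uri") for d in io]
    return (any(u and u != EDAM_OPERATION_ROOT for u in ops)
            and any(u and u != EDAM_DATA_ROOT for u in data))

def registry_metrics(entries):
    return {
        "attributes": sum(count_attributes(e) for e in entries),      # 1.
        "operations": sum(sum(1 for f in e.get("function", [])
                              if is_meaningful(f)) for e in entries), # 2.
        "entries": len(entries),                                      # 3.
    }
```

The same dictionary could then be rendered on the home page and recomputed per search-result set.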

B. Community (=) contribution metrics

4. # of updates of an entry (summed over all entries)
5. # of individual registrants/curators (especially nice after the anonymous registrant groups a.k.a. "affiliations" are split into real users)

6. # of registrant/curator institutions
7. # of authors/developers/contributors in Credits
8. # of institutions in Credits
9. # of publications (with distinct DOIs)
10. # of public repositories (GitHub etc.), and similar useful & non-mandatory attributes

C. Quality metrics (for the whole registry)

**11. 1. ÷ 3. -- i.e. # of attributes per # of entries
12. 2. ÷ 3. -- i.e. # of operations per # of entries
13. 4. ÷ 3. -- i.e. # of all entry updates per # of entries
14. 5. ÷ 3. -- i.e. # of registrants/curators per # of entries**
15. 7. ÷ 3. -- i.e. # of authors/developers/contributors per # of entries (possibly etc. with 6., 8. - 10.)

D. Progress visualisation

• The growth of ALL THE METRICS ABOVE over time (especially 1. - 5. and 11. - 14.; with per-day resolution)
• Note: A separate report should be published where the above growth curves are plotted on the time-line together with hackathons' and workshops' dates marked.

Note:

All the indicators A. - D. can also be internally (within ELIXIR-EXCELERATE WP1) reported PER PARTNER plus per "the rest of the contributors (i.e. non-EL-EX-WP1)". The only required dependency is to first manually split all registrants into the "outreach and support spheres" per EL-EX-WP1 partner. Some registrants can fall under multiple partners, e.g. all de.NBI ones are supported by DK+NO+FR.

E. Quality metrics (for one entry)

matuskalas commented 8 years ago

One more note:

"2. # of operations (with at least one EDAM data concept ≠ 0006, as input or output, and at least one EDAM operation concept ≠ 0004; summed over all entries)"

means: # of DISTINCT operations within a Bio.Tools entry, where each can have multiple functions i.e. EDAM operation concepts ≠ 0004.

That leads to another SIMPLE and relevant metric 2.5:

2.5. # of functions (i.e. # of EDAM operation concepts ≠ 0004, in operations with at least one EDAM data concept ≠ 0006 as input or output. That means that useless operations with neither inputs nor outputs are ignored, just as in 2.)

Notably, both 2. and 2.5 are relevant and SIMPLE, each important and motivating for good annotations in its own way: 2. for well-annotated tools with multiple operations (e.g. toolkits), and 2.5 for well-annotated tools with integrated functionality (e.g. workflows).
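The toolkit-vs-workflow distinction can be illustrated with a small sketch: metric 2 counts qualifying operation blocks, metric 2.5 counts the non-root EDAM operation concepts inside them. The entry layout and EDAM root URIs are again assumptions for illustration only.

```python
# Sketch of metric 2 (# of operations) vs proposed metric 2.5 (# of functions)
# for one entry's function list. Field names are assumed, not the real schema.

OP_ROOT = "http://edamontology.org/operation_0004"
DATA_ROOT = "http://edamontology.org/data_0006"

def has_real_io(op):
    """True if the operation has at least one non-root input or output."""
    io = op.get("input", []) + op.get("output", [])
    return any(d.get("data", {}).get("uri") not in (None, DATA_ROOT) for d in io)

def entry_counts(functions):
    """Return (metric 2, metric 2.5) for one entry."""
    def qualifies(f):  # metric 2's filter: real I/O and a non-root concept
        return has_real_io(f) and any(
            c.get("uri") not in (None, OP_ROOT) for c in f.get("operation", []))
    useful = [f for f in functions if qualifies(f)]
    n_operations = len(useful)                                       # metric 2
    n_functions = sum(                                               # metric 2.5
        sum(1 for c in f.get("operation", []) if c.get("uri") != OP_ROOT)
        for f in useful)
    return n_operations, n_functions

# A "workflow-like" tool: ONE operation annotated with THREE EDAM concepts
wf = [{"operation": [{"uri": "op_A"}, {"uri": "op_B"}, {"uri": "op_C"}],
      "input": [{"data": {"uri": "some_data"}}]}]
```

Here `entry_counts(wf)` yields 1 for metric 2 but 3 for metric 2.5, which is exactly why the two reward different annotation styles.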

A corresponding quality metric (C.) should be added: 12.5. 2.5. ÷ 3. -- i.e. # of functions per # of entries, as well as a corresponding progress metric (D.), and a corresponding per-entry quality metric (E.).

joncison commented 8 years ago

Very useful - thanks a million for this proposal Matus. Enhanced content reporting is in the roadmap (http://biotools.readthedocs.io/en/latest/changelog_roadmap.html) for Dec 16 and could include much of this.

ps. that more-or-less empty entry you pointed out was intentional: the BioExcel partners will be adding details in due course. We just needed to add them to bio.tools to give them a means to make edits. Really they should be in the "staging area" / marked as "beta", and this is in the roadmap for 2017 Q1.

joncison commented 7 years ago

@matuskalas - we should def. pick up on this later in the year once other higher-priority things are out of the way. I've labelled it as "complex" because, while each individual thing is easy enough to do, there are lots of them.

joncison commented 7 years ago

From https://github.com/bio-tools/biotoolsregistry/issues/25:

Also stats for each annotation:

• Publications (PMID, PMCID and DOI total)
• Contacts (i.e. number of emails and/or URLs)
• Documentation links
• Download links
• License
• Operating system
• Language
• Maturity

those are the key ones right now (given current content), I think?
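These per-attribute stats reduce to "how many entries have at least one value for each field". A minimal sketch, assuming biotoolsSchema-style key names (the exact field names here are guesses, not verified against the schema):

```python
# Hypothetical attribute-coverage stats: for each optional attribute,
# count how many entries have at least one (non-empty) value.
FIELDS = ["publication", "contact", "documentation", "download",
          "license", "operatingSystem", "language", "maturity"]

def attribute_coverage(entries):
    """Map each field name to the number of entries with a truthy value."""
    return {f: sum(1 for e in entries if e.get(f)) for f in FIELDS}

# Toy data standing in for real bio.tools entries
entries = [
    {"license": "MIT", "publication": [{"doi": "10.1000/xyz"}]},
    {"license": "GPL-3.0", "language": ["Python"]},
]
```

With the toy data above, `license` is covered by 2 entries, `publication` by 1, and `contact` by 0, which is the shape of table a stats page could render directly.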

joncison commented 7 years ago

Countries / top-level domains

A plot of top-level domains would be nice, something like this, or even better, having these data mapped onto a world map would be super-cool.

Institutes
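The TLD tally itself is a one-pass count over entry homepage URLs. A hedged sketch (the `homepage` field name is an assumption, and the naive last-label split does not handle multi-part suffixes like `ac.uk`, for which a public-suffix list would be needed):

```python
# Sketch: tally naive top-level domains from entry homepage URLs.
from collections import Counter
from urllib.parse import urlparse

def tld_counts(entries):
    """Count the last dot-separated hostname label per entry homepage."""
    tlds = Counter()
    for e in entries:
        host = urlparse(e.get("homepage", "")).hostname or ""
        if "." in host:
            tlds[host.rsplit(".", 1)[-1]] += 1
    return tlds

# Toy data; real input would come from the bio.tools entry dump/API
entries = [{"homepage": "https://usegalaxy.org"},
           {"homepage": "http://www.ebi.ac.uk/Tools/"},
           {"homepage": "https://bio.tools"}]
```

`Counter.most_common()` then gives the ranking to plot, and mapping country-code TLDs to countries would feed the world-map idea.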

joncison commented 7 years ago

@matuskalas - on Monday @ekry and I will finalise which of the above ideas will make it into the next revision of bio.tools/stats : do you have any more ideas to add? Thanks!

joncison commented 7 years ago

I'd like to hear some suggestions about what metrics we could get in light of the groupings in the information standard (https://github.com/bio-tools/biotoolsSchemaDocs/blob/master/information_requirement.rst), see https://github.com/bio-tools/biotoolsSchema/issues/77

Specifically, aggregated metrics to capture things like project maturity & community, as evidenced by things like repos, documentation, mailing lists, etc.

joncison commented 6 years ago

Additional such metrics could potentially be calculated by biotoolsLint (https://github.com/bio-tools/biotoolsLint).

scapella commented 6 years ago

Please keep us (OpenEBench) in the loop, as we have implemented, are implementing, and/or are looking for mechanisms to compute those metrics by consulting several entries.

Cheers,

Salva

On Tue, Oct 25, 2016 at 1:18 PM Matúš Kalaš notifications@github.com wrote:

E. Quality metrics (for one entry)

  • The same as 1. - 2. and 4. - 9., BUT FOR THE GIVEN ENTRY
  • The same as above, BUT PER CURRENT AVERAGES (i.e. per 10. - 14.)
  • Note: Both of these can be beautifully visualised with some tiny icons in the entry cards/rows, and even in the future taken into account when sorting search results.


joncison commented 6 years ago

will do @scapella ... any metrics we calculate internally will be strictly in scope of the data we have in bio.tools. The stuff above is obviously only a small slice of all the different metrics we (== ELIXIR) have been considering.