bio-tools / biotoolsRegistry

biotoolsregistry : discovery portal for bioinformatics
GNU General Public License v3.0

Bio.Tools metrics: quality, quantity, progress, and contribution indicators #113

Open matuskalas opened 8 years ago

matuskalas commented 8 years ago

As a preamble to this topic, a great example: https://bio.tools/tool/Galaxy/version/none :-1:

Here comes a list of metrics/indicators that are EXTREMELY EASY TO IMPLEMENT, while at the same time excellent indicators of quality, quantity, contribution, and progress.

Notes: The SIMPLEST and most relevant indicators are in bold. The rest are additional ones that are similarly SIMPLE and relevant, but less general (more specific). All of the following has been mentioned and discussed regularly since the EMBRACE Registry times, repeatedly in various meetings in Amsterdam and Lyngby, including Kristoffer, @ekry, @joncison, @hmenager, Łukasz, me, Manchester folks, and Gert.

A. Basic (=) quantity metrics

1. # of attributes (nodes or leaves in the JSON/XML tree; summed over all entries)
2. # of operations (with at least one EDAM data concept ≠ 0006, as input or output, and at least one EDAM operation concept ≠ 0004; summed over all entries)
3. # of entries

These 3 should certainly also be shown at the top of the Bio.Tools "home page", and then also at the top of each list/table of search results (there, of course, computed over the found entries).
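To make the quantity metrics concrete, here is a minimal sketch of how 1.–3. could be computed. The entry layout (a `function` list with `operation`/`input`/`output` objects carrying EDAM `uri` fields) and the EDAM root URIs are assumptions for illustration, not verified against the live bio.tools API; "nodes or leaves" is read here as keys plus leaf values of the JSON tree.

```python
# Hypothetical sketch of metrics 1-3 over a list of bio.tools-style entries.
# Field names and EDAM root URIs below are assumptions, not the real schema.

EDAM_OPERATION_ROOT = "http://edamontology.org/operation_0004"
EDAM_DATA_ROOT = "http://edamontology.org/data_0006"

def count_attributes(node):
    """Metric 1: count keys and leaf values in one entry's JSON tree."""
    if isinstance(node, dict):
        # each key counts as a node, plus whatever its subtree contains
        return sum(1 + count_attributes(v) for v in node.values())
    if isinstance(node, list):
        return sum(count_attributes(v) for v in node)
    return 1  # a leaf value counts as one attribute

def is_meaningful(operation):
    """Metric 2 filter: at least one non-root EDAM operation concept AND
    at least one non-root EDAM data concept as input or output."""
    ops = [o.get("uri") for o in operation.get("operation", [])]
    io = operation.get("input", []) + operation.get("output", [])
    data = [d.get("data", {}).get("uri") for d in io]
    return (any(u and u != EDAM_OPERATION_ROOT for u in ops)
            and any(u and u != EDAM_DATA_ROOT for u in data))

def registry_metrics(entries):
    return {
        "attributes": sum(count_attributes(e) for e in entries),      # 1.
        "operations": sum(sum(1 for f in e.get("function", [])
                              if is_meaningful(f)) for e in entries), # 2.
        "entries": len(entries),                                      # 3.
    }
```

The same dictionary could then be rendered on the home page and recomputed per search-result set.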

B. Community (=) contribution metrics

4. # of updates of an entry (summed over all entries)
5. # of individual registrants/curators (especially nice after the anonymous registrant groups a.k.a. "affiliations" are split into real users)

6. # of registrant/curator institutions
7. # of authors/developers/contributors in Credits
8. # of institutions in Credits
9. # of publications (with distinct DOIs)
10. # of public repositories (GitHub etc.), and similar useful & non-mandatory attributes

C. Quality metrics (for the whole registry)

**11. 1. ÷ 3. -- i.e. # of attributes per # of entries
12. 2. ÷ 3. -- i.e. # of operations per # of entries
13. 4. ÷ 3. -- i.e. # of all entry updates per # of entries
14. 5. ÷ 3. -- i.e. # of registrants/curators per # of entries**
15. 7. ÷ 3. -- i.e. # of authors/developers/contributors per # of entries (possibly etc. with 6., 8. - 10.)

D. Progress visualisation

• The growth of ALL THE METRICS ABOVE over time (especially 1. - 5. and 11. - 14.; with per-day resolution)
• Note: A separate report should be published where the above growth curves are plotted on the time-line together with hackathons' and workshops' dates marked.

Note:

All the indicators A. - D. can also be internally (within ELIXIR-EXCELERATE WP1) reported PER PARTNER plus per "the rest of the contributors (i.e. non-EL-EX-WP1)". The only required dependency is to first manually split all registrants into the "outreach and support spheres" per EL-EX-WP1 partner. Some registrants can fall under multiple partners, e.g. all de.NBI ones are supported by DK+NO+FR.

E. Quality metrics (for one entry)

matuskalas commented 8 years ago

One more note:

"2. # of operations (with at least one EDAM data concept ≠ 0006, as input or output, and at least one EDAM operation concept ≠ 0004; summed over all entries)"

means: # of DISTINCT operations within a Bio.Tools entry, where each can have multiple functions i.e. EDAM operation concepts ≠ 0004.

That leads to another SIMPLE and relevant metric 2.5:

2.5. # of functions (i.e. # of EDAM operation concepts ≠ 0004, in operations with at least one EDAM data concept ≠ 0006 as input or output. That means that useless operations with neither inputs nor outputs are ignored, just as in 2.)

Notably, both 2. and 2.5 are relevant and SIMPLE, each important and motivating for good annotations in its own way: 2. for well-annotated tools with multiple operations (e.g. toolkits), and 2.5 for well-annotated tools with integrated functionality (e.g. workflows).
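The toolkit-vs-workflow distinction can be illustrated with a small sketch: metric 2 counts qualifying operation blocks, metric 2.5 counts the non-root EDAM operation concepts inside them. The entry layout and EDAM root URIs are again assumptions for illustration only.

```python
# Sketch of metric 2 (# of operations) vs proposed metric 2.5 (# of functions)
# for one entry's function list. Field names are assumed, not the real schema.

OP_ROOT = "http://edamontology.org/operation_0004"
DATA_ROOT = "http://edamontology.org/data_0006"

def has_real_io(op):
    """True if the operation has at least one non-root input or output."""
    io = op.get("input", []) + op.get("output", [])
    return any(d.get("data", {}).get("uri") not in (None, DATA_ROOT) for d in io)

def entry_counts(functions):
    """Return (metric 2, metric 2.5) for one entry."""
    def qualifies(f):  # metric 2's filter: real I/O and a non-root concept
        return has_real_io(f) and any(
            c.get("uri") not in (None, OP_ROOT) for c in f.get("operation", []))
    useful = [f for f in functions if qualifies(f)]
    n_operations = len(useful)                                       # metric 2
    n_functions = sum(                                               # metric 2.5
        sum(1 for c in f.get("operation", []) if c.get("uri") != OP_ROOT)
        for f in useful)
    return n_operations, n_functions

# A "workflow-like" tool: ONE operation annotated with THREE EDAM concepts
wf = [{"operation": [{"uri": "op_A"}, {"uri": "op_B"}, {"uri": "op_C"}],
      "input": [{"data": {"uri": "some_data"}}]}]
```

Here `entry_counts(wf)` yields 1 for metric 2 but 3 for metric 2.5, which is exactly why the two reward different annotation styles.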

A corresponding quality metric (C.) should be added: 12.5. 2.5. ÷ 3. -- i.e. # of functions per # of entries, as well as a corresponding progress metric (D.), and a corresponding per-entry quality metric (E.).

joncison commented 8 years ago

Very useful - thanks a million for this proposal Matus. Enhanced content reporting is in the roadmap (http://biotools.readthedocs.io/en/latest/changelog_roadmap.html) for Dec 16 and could include much of this.

ps. that more-or-less empty entry you pointed out was intentional: the BioExcel partners will be adding details in due course. We just needed to add them to bio.tools to give them a means to make edits. Really they should be in the "staging area" / marked as "beta", and this is in the roadmap for 2017 Q1.

joncison commented 7 years ago

@matuskalas - we should def. pick up on this later in the year once other higher-priority things are out of the way. I've labelled it as "complex" because, while each individual thing is easy enough to do, there are lots of them.

joncison commented 7 years ago

From https://github.com/bio-tools/biotoolsregistry/issues/25:

Also stats for each annotation:

• Publications (PMID, PMCID and DOI total)
• Contacts (i.e. number of emails and/or URLs)
• Documentation links
• Download links
• License
• Operating system
• Language
• Maturity

those are the key ones right now (given current content), I think?
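These per-attribute stats reduce to "how many entries have at least one value for each field". A minimal sketch, assuming biotoolsSchema-style key names (the exact field names here are guesses, not verified against the schema):

```python
# Hypothetical attribute-coverage stats: for each optional attribute,
# count how many entries have at least one (non-empty) value.
FIELDS = ["publication", "contact", "documentation", "download",
          "license", "operatingSystem", "language", "maturity"]

def attribute_coverage(entries):
    """Map each field name to the number of entries with a truthy value."""
    return {f: sum(1 for e in entries if e.get(f)) for f in FIELDS}

# Toy data standing in for real bio.tools entries
entries = [
    {"license": "MIT", "publication": [{"doi": "10.1000/xyz"}]},
    {"license": "GPL-3.0", "language": ["Python"]},
]
```

With the toy data above, `license` is covered by 2 entries, `publication` by 1, and `contact` by 0, which is the shape of table a stats page could render directly.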

joncison commented 7 years ago

Countries / top-level domains

A plot of top-level domains would be nice, something like this, or even better, having these data mapped onto a world map would be super-cool.

Institutes
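The TLD tally itself is a one-pass count over entry homepage URLs. A hedged sketch (the `homepage` field name is an assumption, and the naive last-label split does not handle multi-part suffixes like `ac.uk`, for which a public-suffix list would be needed):

```python
# Sketch: tally naive top-level domains from entry homepage URLs.
from collections import Counter
from urllib.parse import urlparse

def tld_counts(entries):
    """Count the last dot-separated hostname label per entry homepage."""
    tlds = Counter()
    for e in entries:
        host = urlparse(e.get("homepage", "")).hostname or ""
        if "." in host:
            tlds[host.rsplit(".", 1)[-1]] += 1
    return tlds

# Toy data; real input would come from the bio.tools entry dump/API
entries = [{"homepage": "https://usegalaxy.org"},
           {"homepage": "http://www.ebi.ac.uk/Tools/"},
           {"homepage": "https://bio.tools"}]
```

`Counter.most_common()` then gives the ranking to plot, and mapping country-code TLDs to countries would feed the world-map idea.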

joncison commented 7 years ago

@matuskalas - on Monday @ekry and I will finalise which of the above ideas will make it into the next revision of bio.tools/stats : do you have any more ideas to add? Thanks!

joncison commented 7 years ago

I'd like to hear some suggestions about what metrics we could get in light of the groupings in the information standard (https://github.com/bio-tools/biotoolsSchemaDocs/blob/master/information_requirement.rst), see https://github.com/bio-tools/biotoolsSchema/issues/77

Specifically, aggregated metrics to capture things like project maturity & community, as evidenced by things like repos, documentation, mailing lists, etc.

joncison commented 6 years ago

Additional such metrics could potentially be calculated by biotoolsLint (https://github.com/bio-tools/biotoolsLint).

scapella commented 6 years ago

Please keep us (OpenEBench) in the loop, as we have implemented, are implementing, and/or are looking for mechanisms to compute those metrics by consulting several entries.

Cheers,

Salva

On Tue, Oct 25, 2016 at 1:18 PM Matúš Kalaš notifications@github.com wrote:

E. Quality metrics (for one entry)

  • The same as 1. - 2. and 4. - 9., BUT FOR THE GIVEN ENTRY
  • The same as above, BUT PER CURRENT AVERAGES (i.e. per 10. - 14.)
  • Note: Both of these can be beautifully visualised with some tiny icons in the entry cards/rows, and even in the future taken into account when sorting search results.


joncison commented 6 years ago

will do @scapella ... any metrics we calculate internally will be strictly in scope of the data we have in bio.tools. The stuff above is obviously only a small slice of all the different metrics we (== ELIXIR) have been considering.