Define acceptance process for listing plugins on if registry

Green-Software-Foundation / if

Impact Framework

https://if.greensoftware.foundation/

MIT License

143 stars 40 forks source link

Define acceptance process for listing plugins on if registry #617

Closed jmcook1186 closed 4 months ago

jmcook1186 commented 5 months ago

Sub of: #633

User story

As a plugin builder, I want my plugin to be discoverable by other IF users in some central location, while keeping control over the plugin in my own registry.

Rationale

The IF has an ethos of decentralising the plugin ecosystem, but we also want people to be able to browse plugins that are not in our direct control. For this, we can build a central registry where we list and link out to community plugins. This ticket is only to create a github repository to store the source code and set the repo permissions and config. The IF team need to find the right balance of responsibility for the plugins listed and the permissionlessness of the ecosystem (i.e. we don't want to list junk or unsafe plugins, but we also don't want to gatekeep). This ticket is for defining the right processes for deciding what to list and how to organize the registry.

Implementation details

[x] research similar challenges in other projects
[x] create proposal or set of proposals to discuss
[x] refine ideas through async comms and spike calls
[x] agree process
[x] implement once registry is launched
[x] document process publicly (probably in markdown doc on registry repo)

Priority

5/5

Size

What does "done" look like?

Process is implemented and documented

Deadline

tbc

pazbardanl commented 5 months ago

hey @jmcook1186 @jawache @narekhovhannisyan @MariamKhalatova @manushak I thought of some approaches we can take to tackle this. It's written in details here: https://hackmd.io/@pazbarda/BkfkezEgA (please let me know if you have issues accessing the doc)

In summary: i can think of 3 approaches, not mutually exclusive, to balance the need for some oversight with our limited resources:

Random Code Reviews / QA Testing by Core Team: Premise: While we lack resources to review every plugin, we aim to maintain standards by conducting periodic random audits. Strategy: Regularly select a plugin from the registry for review, focusing on critical issues and providing actionable feedback. Pros: Demonstrates active auditing, ensures community trust, and contributes to maintaining quality and security. Cons: Relies on random selection, may not cover all plugins, and requires consistent (although limited) effort from the core team.

Registry Interview / Questionnaire: Premise: Utilize a conversation or questionnaire with contributors to assess plugin quality and security. Strategy: Engage contributors in a dialogue, asking targeted questions to identify potential issues. Pros: Efficient and collaborative, encourages communication, and provides an opportunity for self-assessment. Cons: Relies on contributor honesty, may miss complex issues, and requires skilled facilitation.

For the Community, by the Community: Premise: Leverage the expertise of the community to assist in maintaining quality and security. Strategy: Encourage community members to contribute by reviewing plugins, reporting issues, and engaging with maintainers. Pros: Harnesses community resources, fosters collaboration, and distributes responsibility. Cons: Relies on voluntary contributions, may lack consistency, and requires active management.

jmcook1186 commented 5 months ago

Thanks for looking into this @pazbardanl

Personally like the idea of an entry questionnaire for the registry that covers some basic checklist - ideally we can verify the responses very rapidly (maybe stackblitz can help with this) or we trust that the responses are true and let further investigations happen in a decentralized way post-acceptance. Maybe some mechanism for people to upvote/downvote plugins might be nice.

One thing that jumps out is the need for a delisting process - probably just a template for people to raise a complaint about a specific plugin so that we can consider removing it from the site if the questionnaire responses turn out to be dishonest or the plugin breaks.

pazbardanl commented 5 months ago

hi @jmcook1186 . First I just want to be clear the suggestions so far are my ideas, not so much a result of research of common practices for similar cases in other open source projects. I'm using some contacts i have to get anecdotal input on what's mostly done to approach this issue in other projects.

Personally I think the questionnaire approach in combination with the random auditing / reviewing of registered projects is a good combination: they are not mutually exclusive but they ARE complementary in a way: The initial interview is used to filter out projects (or more forgivingly - get them to make improvements before we register them) and the random review could be a sort of an audit that validates initial responses are true. And if they're not - this is where the delisting comes in play. Unfortunate, but we might need it.

All in all I think the realistic approach here is to treat the registry as a "quality stamp", which is not bullet proof (we just don't have the resources) but DOES represent a plugin meeting our standards.

pazbardanl commented 5 months ago

@jmcook1186 @jawache I've spoken to 2 connections I have that are pretty experienced in the open source domain. Presented with our registry idea and challenges, they both said similar things: It's a classic open-source problem. It's the contention between having (or - aspiring to have) a vast and open community where everybody can contribute, and needing some level of standards of quality everyone must adhere to. Bottom line is that I got no practical advice on this, and i left both conversations feeling that this is a balance we'll just need to maintain, and accept that there is an inherent risk of missing out from time to time.

Having said that, I i realize the word risk here is crucial: It's s risk management problem. We vouch for a plugin, while not being able to validate it 100% for having met our standards. Therefore we run the risk of vouching for a bad plugin.

As such, this risk needs to be:

Evaluated - what is the probability of something bad happening? what's the impact (worst case)?
Mitigated - minimize the probability of something bad happening. We do that by answering "what CAN we do? what/ how much CAN we validate?" -- I think the outcome of this ticket directly answers this one with a plan.
Contained - once something bad happens: minimize the impact, learn from what happened, implement the learnings, move on.

jawache commented 5 months ago

Thanks @pazbardanl

This is great, I laughed when you wrote "Bottom line is that I got no practical advice on this" :) so sounds like we just need to decide as you say what the risk / reward is.

I don't think we can audit, we're not qualified to judge the universe of environmental impact models :/ we can perhaps do some basic auditing of security features, have a linter, existence of tests etc... but should be something automated mostly. I like the idea of an interview which asks them to give some evidence with a basic checklist of items they have to make sure is clicked before we list. The rest can be done via the community, star ratings, report this plugin link.

Another approach is to mostly have the bar low but then tier the plugins, so you get a gold badge if you meet much stronger criteria (evidence of test cases, rich support, 10 reviews, lots of docs, citations, things like that). It's kinda like our process for deciding if a project in the GSF is incubation or graduated, you need to show evidence of a higher bar to be graduated and there is a process to get graduated, but it doesn't stop projects from launching and experimenting. I kinda like the idea of this being a melting pot of everything (being very inclusive) but surfacing up the good ones to the top (having high standards), it matches how we function internally as well.

I think there is a very far future where certain plugins will get approved by 3rd party auditors, which signals that those plugins are ok to be used for calculating say regulatory reporting numbers. We can eventually badge them differently.

Evaluated

In terms of the worst that can happen, a plugin can make claims that are incorrect (intentionally or accidentally) and people can make decisions based on those claims, e.g. incorrectly measure carbon emissions.

Mitigated

One thing we have is that we are not qualified or resourced to judge the claims of a plugin, if one measures an environmental impact we should not be the ones to say their approach is correct or incorrect. We are here to surface the options so I think we can insist they back up any claims with references/citations, show evidence (interview?) like what we asked for in the hackathon. So we can't judge if the claim is right, but we can have a minimum bar of transparency and disclosure so the end users can decide themselves.
Big fat disclaimer.

Contained

A process to delist a plugin that has enough evidence that it is incorrect (should be crowdsourced, voting, review by a committee, not one of the IF team deciding)
Perhaps IF CLI gives a warning to the end user that a plugin they are using is not one of the ones in the official registry.

pazbardanl commented 5 months ago

hey @jawache .

Understood. We can't judge what's a good environmental impact model (who can? let's get them on our side), but we can still:

(quoting you on the parts that best capture it, to my understanding: ) "insist they back up any claims with references/citations, show evidence... have a minimum bar of transparency and disclosure so the end users can decide themselves". This is key: here we are acting as non-expert gate keepers, making sure plugin developers can back up their claims with reasonable facts / evidence, which are more for the community to review/judge and less for us.
Examine the plugin as a piece of software: as developers ourselves, we CAN apply different levels of technical examinations to make sure the plugin is of sufficient software quality. i.e random (or not so random) code reviews and manual testing. Once in awhile just pick a plugin and spend an hour looking at the code and playing around with it. For this part we can also include sanity-check questions in the interview: what tests did you implement? have you tested this or that corner case? what kind of manual validations have you done? what's your code review process? and so on..

The melting pot model is a good one. It creates a healthy, meritocratic workflow. I think special attention is needed here to make sure the merits we give (gold badge etc) are indeed perceived as something worth working towards. In other words: "as a plugin developer - what's in it for me?" does having their plugin registered make a desirable enough goal on its own? You mentioned GSF is already working in this manner with other projects. is it documented somewhere? how can I find out more?

Crowdsourcing part of the feedback loop will be most efficient, no doubt. I understand there will be a website for the registry, but the backend will be Github, right? if so then we're set on the technical aspect, just need a process around it: who tracks the stars and reviews, what do we do when detecting a good (or bad) outlier etc. I consider coming up with this process in scope of this issue (i.e ill come up with one).

Decision making on delisting - if non IF, then who?, in my mind i see a GSF appointed committee, but I honestly have 0 clue if this is even realistic or not.

I'm ok with big fat disclaimers and CLI warnings. Who wouldn't be? :)

pazbardanl commented 5 months ago

@jawache @jmcook1186 I've added a section to the HackMD doc: https://hackmd.io/@pazbarda/BkfkezEgA and invited you both with write permissions. The section describes a draft for a "badges model" with criteria for each badge, based on recent comments above. I've also detailed how the "bronze" and "silver" badge are going to be handed out. My plan is to have some kind of a model for merits and an agreed upon process of getting these merits (badges is one model, we can go any other you have in mind). Once we have an agreement on merits and how they are granted, I can star working with whoever builds the registry website on how it's going to support this model. Does that makes sense?

pazbardanl commented 4 months ago

@jmcook1186 @jawache

Registry Form outline:

Please check the relevant boxes below

[ ] My plugin executes a demo manifest yml with no errors or crashes.
[ ] My plugin has README file
[ ] My plugin's README file contains a sample manifest code.
[ ] My plugin's code contain unit tests that cover 100% of the plugins code.
[ ] All unit tests that test my plugin are passing.
[ ] All other unit tests in the if-models repo are passing after my code is integrated into it.
[ ] My plugin's code have no security vulnerabilities.

Define, in your own words, what is(are) the Requirement(s) of your plugin. I.E what is the basic functionality it must demonstrate.

Please provide citations, links and references that support the validity of the implementation of your plugin. I.E provide reference to sources explaining how this implementation meets the requirement described above.

Please describe the test cases covered by your plugin's dedicated unit tests

Please provide a demo manifest

jmcook1186 commented 4 months ago

Thanks @pazbardanl this seems like a good start. We can probably drop the question about whether the README contains sample manifest code and instead insist that a working manifest is submitted as a standalone yaml file. Maybe insisting on 100% test coverage is a bit strict?

I think we can just implement this as an issue-form on the registry Github repository. Then it can be a prerequisite for a PR, and link to the submitted form from the plugin's card on the registry.

I think the right next step is to raise a PR to that repository to add the issue-form and we can tweak the content together directly on the PR.