OAI / tools.openapis.org

A collection of open-source and commercial tools for creating your APIs with OpenAPI. Sourced from and published for the community.
https://tools.openapis.org/
96 stars 26 forks

Allow topics to override primary category #103

Open mnahkies opened 9 months ago

mnahkies commented 9 months ago

User Story

As a tool developer, I'd like to be able to override the category classification given to my tool. Specifically, I'd like https://github.com/mnahkies/openapi-code-generator to be labelled as a "Code Generator" rather than a "Parser".

Context

Currently, the category is assigned using https://www.npmjs.com/package/bayes, which essentially compares the frequency of tokens in a tool's text against the frequency of tokens in already classified text to assign a class.
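For readers unfamiliar with the approach, here is a minimal multinomial naive Bayes text classifier written from scratch. It is a sketch of the general technique, not the bayes package's implementation or API; the tokenizer, the add-one smoothing and the sample training texts are all assumptions made for illustration.

```typescript
// Minimal multinomial naive Bayes text classifier. Illustrative sketch only:
// this is not the bayes package's implementation or API.

// Assumed tokenizer: lowercase, split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

class NaiveBayes {
  private docCounts = new Map<string, number>(); // training documents per category
  private tokenCounts = new Map<string, Map<string, number>>(); // token frequencies per category
  private totalTokens = new Map<string, number>(); // total token count per category
  private vocabulary = new Set<string>(); // every token seen during training
  private totalDocs = 0;

  // Record one already-classified text.
  learn(text: string, category: string): void {
    this.totalDocs += 1;
    this.docCounts.set(category, (this.docCounts.get(category) ?? 0) + 1);
    const counts = this.tokenCounts.get(category) ?? new Map<string, number>();
    const tokens = tokenize(text);
    for (const token of tokens) {
      counts.set(token, (counts.get(token) ?? 0) + 1);
      this.vocabulary.add(token);
    }
    this.tokenCounts.set(category, counts);
    this.totalTokens.set(category, (this.totalTokens.get(category) ?? 0) + tokens.length);
  }

  // Pick the category maximising log P(c) + sum of log P(token | c),
  // using add-one (Laplace) smoothing for tokens unseen in a category.
  categorize(text: string): string | undefined {
    const tokens = tokenize(text);
    const vocabSize = this.vocabulary.size;
    let best: string | undefined;
    let bestScore = -Infinity;
    this.docCounts.forEach((docs, category) => {
      const counts = this.tokenCounts.get(category)!;
      const total = this.totalTokens.get(category)!;
      let score = Math.log(docs / this.totalDocs); // class prior
      for (const token of tokens) {
        score += Math.log(((counts.get(token) ?? 0) + 1) / (total + vocabSize));
      }
      if (score > bestScore) {
        bestScore = score;
        best = category;
      }
    });
    return best;
  }
}

// Hypothetical training data for two categories.
const nb = new NaiveBayes();
nb.learn("parse and validate openapi documents", "Parsers");
nb.learn("generate typescript client code from openapi", "Code Generator");
console.log(nb.categorize("generate client code")); // Code Generator
```

Note that the class prior `log P(c)` is added to every score, so the share of tools already in each category influences every new classification.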

However, because the current category distribution is very uneven (>30% of tools are assigned to "Parsers"), the classifier seems to have become overly biased towards "Parsers". For example, Redoc is assigned "User Interfaces" and "Parsers", but not "Documentation".
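To make the suspected bias concrete, the arithmetic below compares two naive Bayes scores. Only the roughly 30% share of "Parsers" comes from this issue; the 5% prior for "Code Generator" and all token likelihoods are invented numbers. Even when the tokens favour "Code Generator", a large enough prior gap can hand the result to "Parsers".

```typescript
// Illustrative arithmetic only: the numbers are invented, apart from the
// roughly 30% share of "Parsers" mentioned in the issue.

interface ClassEvidence {
  name: string;
  prior: number; // P(c): share of already-classified tools in this category
  tokenLikelihoods: number[]; // P(token | c) for each token of the new tool's text
}

// Naive Bayes score up to a shared constant: log P(c) + sum of log P(token | c).
function logScore(c: ClassEvidence): number {
  return c.tokenLikelihoods.reduce((sum, p) => sum + Math.log(p), Math.log(c.prior));
}

// Return the name of the class with the highest score.
function mostLikely(classes: ClassEvidence[]): string {
  let best = classes[0];
  for (const c of classes) {
    if (logScore(c) > logScore(best)) best = c;
  }
  return best.name;
}

// The tokens favour "Code Generator" (0.006 vs 0.004, twice: a 2.25x ratio),
// but the 6x prior ratio (0.30 vs 0.05) outweighs that evidence.
const parsers: ClassEvidence = { name: "Parsers", prior: 0.3, tokenLikelihoods: [0.02, 0.004, 0.004] };
const codeGenerator: ClassEvidence = { name: "Code Generator", prior: 0.05, tokenLikelihoods: [0.02, 0.006, 0.006] };

console.log(mostLikely([parsers, codeGenerator])); // Parsers
```

With equal priors the same token evidence would pick "Code Generator", which is why an uneven distribution can reinforce itself if new classifications are fed back in as training data.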

And these are all assigned to "Parsers" as well:

Rather than "Code Generator", "Mock", "Documentation" or "Testing Tools".

I'm not sure whether this is inherent to the classification approach or problem space (e.g. the written language used for different types of tools may lack enough distinguishing tokens to give a good signal) or a negative feedback loop from the existing classifications, but either way I think it would be good to have a way to override this behavior.

I'm hopeful that, over time, this would also improve the accuracy of the bayes classification, since the manually labelled entries would provide accurate training data.

Detailed Requirement

I propose adding a way to manually label a primary category for a tool. I see two main options:

I see the primary benefit of the first option as giving control of curation to the maintainers of this repository, whilst the second option allows tool writers to self-serve. It's possible that both might be desirable, especially to account for entries that aren't scraped from GitHub (though I guess their categories are essentially manually configured already).
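The issue does not specify how the two options would fit together. As one possible shape, assuming the maintainer option is a curated override field and the self-serve option uses GitHub topics chosen by the tool author (as the issue title suggests), primary-category resolution might look like the sketch below. The `ToolEntry` fields, the `openapi-tools-` topic prefix, the topic-to-category mapping and the precedence order are all hypothetical.

```typescript
// Hypothetical sketch of resolving a tool's primary category with manual overrides.
// Field names, the topic prefix and the precedence order are assumptions,
// not the repository's actual data model.

interface ToolEntry {
  name: string;
  categoryOverride?: string; // maintainer-curated label (hypothetical field)
  topics: string[]; // GitHub topics set by the tool author
  classifiedCategory: string; // category assigned by the bayes classifier
}

// Hypothetical convention: a topic such as "openapi-tools-code-generator"
// maps onto a known category.
const TOPIC_PREFIX = "openapi-tools-";
const TOPIC_CATEGORIES = new Map<string, string>([
  ["code-generator", "Code Generator"],
  ["documentation", "Documentation"],
  ["mock", "Mock"],
  ["parser", "Parsers"],
  ["testing", "Testing Tools"],
]);

function primaryCategory(tool: ToolEntry): string {
  // 1. Maintainer curation wins.
  if (tool.categoryOverride) return tool.categoryOverride;
  // 2. Then a self-served topic, if it maps to a known category.
  for (const topic of tool.topics) {
    if (topic.startsWith(TOPIC_PREFIX)) {
      const category = TOPIC_CATEGORIES.get(topic.slice(TOPIC_PREFIX.length));
      if (category) return category;
    }
  }
  // 3. Otherwise fall back to the classifier's answer.
  return tool.classifiedCategory;
}

const tool: ToolEntry = {
  name: "openapi-code-generator",
  topics: ["openapi", "openapi-tools-code-generator"],
  classifiedCategory: "Parsers",
};
console.log(primaryCategory(tool)); // Code Generator
```

Entries resolved through steps 1 or 2 could also be fed back to the classifier as labelled training data, in line with the hope that manual labels improve the bayes classification over time.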

I think some rationalization of the existing categories (e.g. Testing vs Testing Tools) may be useful as well, potentially along with a description of each category explaining what is in or out of scope for it.
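One way to capture both ideas is a single list of canonical categories, each with a scope description and the legacy names that fold into it. The sketch below is hypothetical: the structure, the `canonicalCategory` helper and the description wording are placeholders, and merging "Testing" into "Testing Tools" is only an example of the kind of rationalization suggested.

```typescript
// Hypothetical sketch of a rationalized category list: each canonical category
// carries a scope description and any legacy names that should fold into it.
// Descriptions and alias choices are placeholders for illustration.

interface CategoryDefinition {
  name: string;
  description: string; // what is in or out of scope for this category
  aliases: string[]; // legacy names merged into this category
}

const CATEGORIES: CategoryDefinition[] = [
  {
    name: "Testing Tools",
    description: "Placeholder: tools that test APIs against an OpenAPI description.",
    aliases: ["Testing"],
  },
  {
    name: "Documentation",
    description: "Placeholder: tools that render reference docs from an OpenAPI description.",
    aliases: [],
  },
];

// Map a legacy or canonical name to its canonical category, or undefined if unknown.
function canonicalCategory(name: string): string | undefined {
  const match = CATEGORIES.find((c) => c.name === name || c.aliases.includes(name));
  return match?.name;
}

console.log(canonicalCategory("Testing")); // Testing Tools
```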

mnahkies commented 1 month ago

@SensibleWood do you have any thoughts on this? I'm open to attempting an implementation, but would appreciate some feedback on whether it is likely to be accepted before investing the effort.

SensibleWood commented 1 week ago

@mnahkies thanks for raising this issue, and sorry for the delay in replying. Work on this website has been on hiatus, as there have been other priorities.

I am very open to agreeing an approach and an implementation. There is a need to uplift the repository for Arazzo (which already lives under #157), so now is a good time to rethink categorisation. The original categories and approach were spawned from other initiatives and sources and, whilst they got this site going, need refinement.

I would suggest we agree a time to talk on a voice call and take it from there. Thanks again for raising this.