Allow topics to override primary category

mnahkies commented 9 months ago

User Story

As a tool developer, I'd like to be able to override the category classification given to my tool. Specifically, I'd like https://github.com/mnahkies/openapi-code-generator to be labelled as a "Code Generator" rather than a "Parser".

Context

Currently the category is assigned using https://www.npmjs.com/package/bayes, a naive Bayes classifier that essentially compares the frequency of tokens in a provided text against the frequency of tokens in already classified text to assign a class.

However, because the current category/class distributions are pretty uneven (>30% are assigned to "Parsers"), the classifier seems to have ended up overly biased towards "Parsers". For example, Redoc is assigned "User Interfaces" and "Parsers", but not "Documentation".

And these are all assigned to "Parsers" as well:

OpenAPI Server Code Generator (oapi-codegen)
OpenAPI Mocker
docs
php-openapi-faker
...

Rather than "Code Generator" / "Mock" / "Documentation" / "Testing Tools"
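To illustrate the bias described above, here is a minimal sketch of a multinomial naive Bayes classifier with Laplace smoothing. It is written from scratch for illustration and is not the code of the `bayes` package; the training texts, the 4:1 class ratio, and all function names are assumptions made up for this example.

```typescript
// Illustrative naive Bayes classifier (NOT the implementation of the `bayes` npm package).
interface Model {
  docs: Map<string, number>;                 // documents seen per category
  tokens: Map<string, Map<string, number>>;  // token counts per category
  vocabulary: Set<string>;                   // all tokens seen in training
  totalDocs: number;
}

const createModel = (): Model => ({
  docs: new Map(),
  tokens: new Map(),
  vocabulary: new Set(),
  totalDocs: 0,
});

// Lowercase and split on anything that is not a letter or digit.
const tokenize = (text: string): string[] =>
  text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);

function learn(model: Model, text: string, category: string): void {
  model.totalDocs += 1;
  model.docs.set(category, (model.docs.get(category) ?? 0) + 1);
  const counts = model.tokens.get(category) ?? new Map<string, number>();
  model.tokens.set(category, counts);
  for (const token of tokenize(text)) {
    counts.set(token, (counts.get(token) ?? 0) + 1);
    model.vocabulary.add(token);
  }
}

function categorize(model: Model, text: string): string {
  const words = tokenize(text);
  let best = "";
  let bestScore = -Infinity;
  model.docs.forEach((docCount, category) => {
    const counts = model.tokens.get(category) ?? new Map<string, number>();
    let categoryTokens = 0;
    counts.forEach((n) => { categoryTokens += n; });
    // log prior: share of training documents already in this category
    let score = Math.log(docCount / model.totalDocs);
    for (const word of words) {
      // log likelihood with add-one (Laplace) smoothing
      const seen = counts.get(word) ?? 0;
      score += Math.log((seen + 1) / (categoryTokens + model.vocabulary.size));
    }
    if (score > bestScore) {
      bestScore = score;
      best = category;
    }
  });
  return best;
}

// Hypothetical, deliberately uneven training data: four "Parsers", one "Code Generator".
const model = createModel();
for (let i = 0; i < 4; i++) {
  learn(model, "parse and validate openapi documents", "Parsers");
}
learn(model, "generate client code from openapi documents", "Code Generator");

console.log(categorize(model, "openapi documents code"));
console.log(categorize(model, "generate code from openapi documents"));
```

With this toy data, the first description is assigned "Parsers" even though it mentions "code", because the larger class has both a higher prior and higher counts for the shared tokens. The second description carries enough distinguishing tokens to be assigned "Code Generator". Real tool descriptions are far noisier, so this only shows the mechanism, not the actual behaviour of the site.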

I'm not sure if this is inherent to the classification approach / problem space (eg: is the written language used for different types of tool lacking enough distinguishing tokens to give a good signal), or a self-reinforcing feedback loop from the existing classifications, but either way I think it would be good to have a way to override this behavior.

I'm hopeful that introducing this would over time improve the accuracy of the classification using bayes as a result of the accurate manually labelled data.

Detailed Requirement

Propose adding a way to manually label a primary category for a tool. I see two main options:

Field on the tools.yaml entries like manualCategoryOverride
Looking for new topics on the source entries like the existing openapi3 / openapi31 ones that indicate the primary category

I see the primary benefit of the first option being that it gives control of curation to the maintainers of this repository, whilst the second option allows tool writers to self serve. It's possible that both might be desirable, especially to account for entries that aren't scraped from GitHub (though I guess their categories are essentially manually configured already).
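As a rough sketch of how the two options could combine, the snippet below resolves a primary category by checking a maintainer override first, then a category topic, and only then falling back to the classifier. The precedence order, the topic names in `TOPIC_TO_CATEGORY`, and the `ToolEntry` shape are all assumptions for illustration; only the `manualCategoryOverride` field name comes from the proposal above.

```typescript
// Hypothetical shape of a tool entry; not the actual tools.yaml schema.
interface ToolEntry {
  name: string;
  topics?: string[];
  manualCategoryOverride?: string;
}

type CategorySource = "manual" | "topic" | "classifier";

// Hypothetical topic names; the real ones would need to be agreed on.
const TOPIC_TO_CATEGORY = new Map<string, string>([
  ["openapi-code-generator", "Code Generator"],
  ["openapi-mock", "Mock"],
  ["openapi-documentation", "Documentation"],
]);

function resolvePrimaryCategory(
  tool: ToolEntry,
  classify: (tool: ToolEntry) => string, // e.g. the existing bayes-based step
): { category: string; source: CategorySource } {
  // 1. Maintainer-curated override wins (assumed precedence).
  if (tool.manualCategoryOverride) {
    return { category: tool.manualCategoryOverride, source: "manual" };
  }
  // 2. Otherwise, the first recognised category topic set by the tool author.
  for (const topic of tool.topics ?? []) {
    const category = TOPIC_TO_CATEGORY.get(topic);
    if (category !== undefined) {
      return { category, source: "topic" };
    }
  }
  // 3. Fall back to automatic classification.
  return { category: classify(tool), source: "classifier" };
}

// Stand-in for the existing classifier, which currently favours "Parsers".
const alwaysParsers = (_tool: ToolEntry): string => "Parsers";

const viaTopic = resolvePrimaryCategory(
  { name: "openapi-code-generator", topics: ["openapi3", "openapi-code-generator"] },
  alwaysParsers,
);
const viaManual = resolvePrimaryCategory(
  { name: "Redoc", topics: ["openapi-mock"], manualCategoryOverride: "Documentation" },
  alwaysParsers,
);
const viaClassifier = resolvePrimaryCategory({ name: "docs" }, alwaysParsers);

console.log(viaTopic, viaManual, viaClassifier);
```

Returning the `source` alongside the category would make it possible to feed only the manually labelled entries back into training, which is one way the hoped-for accuracy improvement could be pursued.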

I think some amount of rationalization (eg: Testing vs Testing Tools) of the existing categories may be useful as well, and potentially adding a description of each category explaining what is in/out of scope for it.

mnahkies commented 1 month ago

@SensibleWood do you have any thoughts on this? I'm open to attempting an implementation, but would appreciate some feedback on whether it would be likely to be accepted before investing the effort.

SensibleWood commented 1 week ago

@mnahkies thanks for raising this issue and sorry for the delay in replying. Work on this website has taken a hiatus as there have been other priorities.

I am very open to agreeing an approach and an implementation. There is a need to uplift the repository for Arazzo (which already lives under #157) so now is a good time to rethink categorisation. The original categories and approach were spawned from other initiatives and sources and, whilst they got this site going, they need refinement.

I would suggest we agree a time to talk with voices and take it from there. Thanks again for raising this.

OAI / tools.openapis.org