anowell / are-we-learning-yet

How ready is Rust for Machine Learning?
http://arewelearningyet.com
Creative Commons Attribution 4.0 International

ML categories on crates.io and automatically collecting crates from registry #120

Open elpiel opened 2 years ago

elpiel commented 2 years ago

Last year I added a few categories to crates.io related to aerospace (https://github.com/rust-lang/crates.io/issues/4105). I think it's a feature that isn't used much, and you can see that there aren't a lot of categories added anyway.

In this line of thought, I can suggest having a whitelist & blacklist per category, where we can add additional categories or exclude given ones.

We're looking to implement this approach for https://areweinspaceyet.org, which will become the AeroRust website. The current WIP of the new website can be found at https://github.com/AeroRust/AeroRust.github.io/

anowell commented 2 years ago

What categories could we add to crates.io for ML? Maybe the ones that are defined on the website?

I'm not sure. I don't know if we even have the right set of categories currently (though it has been a couple of years since the last suggestion to tweak the categories). I wonder if the categorization will change a bit as the ecosystem matures. And would it even make sense for crates.io to create some of the more specific categories? Consider "data structures" vs "ML data structures" or "Science" vs "Scientific Computing", or might there be GPU programming crates that are specific to games vs ML vs other?

I'd suggest starting with categories that are clearly ML-specific and well-defined like Neural Networks and/or NLP.

Would you consider building the crates lists automatically using provided categories?

Yes. But first and foremost, I think AWLY should prioritize finding the best way to organize and surface crates in the ML ecosystem over aligning with crates.io categorization (the latter being good if not at the expense of the former).

To that end, if such crates.io categories were created, it'd be fairly easy to fetch crates from a category. For the implementation, I'd want to see:

I think initially some category map file could also blacklist crates, but if that list grows large or needs to be updated regularly, I think we'd want to consider some heuristic to filter on instead. Being an ML ecosystem, it would be awesome if we had a classifier that could help filter crates, but I'll refrain from over-engineering a solution to a problem we don't yet have.
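
For illustration only, here is a rough sketch of how the category map idea might look once the crates for a category have been fetched. Everything in it is an assumption rather than existing AWLY code: the `CategoryRules` type, its `allow` and `deny` fields, and the `resolve_category` function are hypothetical names, and the fetched list is passed in as plain crate names instead of coming from a real crates.io request.

```rust
use std::collections::BTreeSet;

/// Hypothetical per-category overrides, e.g. loaded from a category map file.
#[derive(Default)]
struct CategoryRules {
    /// Crates to include even if the registry does not list them in the category.
    allow: BTreeSet<String>,
    /// Crates to drop even if the registry lists them in the category.
    deny: BTreeSet<String>,
}

/// Merge the crate names fetched for one category with the manual overrides.
/// The deny list is applied last, so it wins over both the fetched list and
/// the allow list. The result is sorted and free of duplicates.
fn resolve_category(fetched: &[&str], rules: &CategoryRules) -> Vec<String> {
    // Start from whatever the registry returned for this category.
    let mut result: BTreeSet<String> = fetched.iter().map(|s| s.to_string()).collect();
    // Add manually whitelisted crates.
    result.extend(rules.allow.iter().cloned());
    // Remove manually blacklisted crates.
    result.retain(|name| !rules.deny.contains(name));
    result.into_iter().collect()
}
```

Letting the deny list win keeps the manual override predictable: a crate that ends up in both lists is excluded. If the deny list grows large, the `retain` step is the natural place to swap in a heuristic filter instead.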