anowell / are-we-learning-yet

How ready is Rust for Machine Learning?
http://arewelearningyet.com
Creative Commons Attribution 4.0 International

Sparse data category #3

Open maciejkula opened 8 years ago

maciejkula commented 8 years ago

I think it might be worth creating a category for packages that deal with sparse data well --- this is extremely important for all sorts of NLP and recommendation applications, where very large and very sparse matrices are commonplace.

Obviously, rustlearn supports this for all of its classifiers (including random forests!) :)
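To make the "very large and very sparse matrices" point concrete: the usual approach is to store only the non-zero entries. Below is a minimal sketch of compressed sparse row (CSR) storage in plain Rust with no dependencies. `CsrMatrix`, `from_triplets`, `nnz` and `mul_vec` are hypothetical names chosen for illustration; they are not the API of rustlearn or sprs.

```rust
/// Hypothetical compressed sparse row (CSR) matrix, for illustration only.
#[derive(Debug)]
struct CsrMatrix {
    rows: usize,
    cols: usize,
    indptr: Vec<usize>,  // row i's entries are at indices/data[indptr[i]..indptr[i + 1]]
    indices: Vec<usize>, // column index of each stored entry
    data: Vec<f64>,      // value of each stored entry
}

impl CsrMatrix {
    /// Build from (row, col, value) triplets in any order. Explicit zeros are
    /// skipped; duplicate (row, col) pairs are kept as separate entries, so
    /// `mul_vec` effectively sums them.
    fn from_triplets(rows: usize, cols: usize, triplets: &[(usize, usize, f64)]) -> Self {
        // Count the stored entries in each row.
        let mut indptr = vec![0usize; rows + 1];
        for &(r, c, v) in triplets {
            assert!(r < rows && c < cols, "triplet out of bounds");
            if v != 0.0 {
                indptr[r + 1] += 1;
            }
        }
        // A prefix sum turns per-row counts into row start offsets.
        for i in 0..rows {
            indptr[i + 1] += indptr[i];
        }
        let nnz = indptr[rows];
        let mut indices = vec![0usize; nnz];
        let mut data = vec![0.0f64; nnz];
        // Next free slot in each row while scattering the triplets.
        let mut next = indptr[..rows].to_vec();
        for &(r, c, v) in triplets {
            if v != 0.0 {
                let k = next[r];
                indices[k] = c;
                data[k] = v;
                next[r] += 1;
            }
        }
        CsrMatrix { rows, cols, indptr, indices, data }
    }

    /// Number of stored (non-zero) entries.
    fn nnz(&self) -> usize {
        self.data.len()
    }

    /// Computes y = A * x, touching only the stored entries.
    fn mul_vec(&self, x: &[f64]) -> Vec<f64> {
        assert_eq!(x.len(), self.cols, "dimension mismatch");
        (0..self.rows)
            .map(|i| {
                (self.indptr[i]..self.indptr[i + 1])
                    .map(|k| self.data[k] * x[self.indices[k]])
                    .sum::<f64>()
            })
            .collect()
    }
}

fn main() {
    // A 3 x 4 matrix with only 4 non-zeros out of 12 cells.
    let a = CsrMatrix::from_triplets(3, 4, &[(0, 1, 2.0), (2, 3, 5.0), (0, 0, 1.0), (1, 2, -1.0)]);
    let y = a.mul_vec(&[1.0, 1.0, 1.0, 1.0]);
    println!("nnz = {}, y = {:?}", a.nnz(), y);
}
```

With this layout, memory use and the cost of a matrix-vector product grow with the number of non-zeros rather than with rows × columns, which is what makes very large, very sparse NLP and recommendation matrices tractable.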

anowell commented 8 years ago

I'll be the first to admit that I don't think I got the categories exactly right (or that I could find any two resources that agreed on a way to categorize ML that is useful for describing a language ecosystem).

Would you be willing to draft up the initial category overview and point me to any other crates you think might be candidates for living there? (either in this issue or as a PR)

maciejkula commented 8 years ago

Yes, it's far from clear. I'll have a think and see if I can come up with something.

(Sent by email in reply to Anthony Nowell's comment of 23 Aug 2016 21:52.)

bluss commented 7 years ago

Possibly look at https://github.com/vbarrielle/sprs too.

xpe commented 5 years ago

I'll chime in. I agree that "sparse data support" is a thread that runs through many matrix, ML, and NLP libraries. That said, I'm not sure if it warrants its own category.

I currently lean towards saying "maybe not". Here are two reasons why.

  1. I tend to think of sparse data support as something that ML practitioners look for after they've chosen an approach. In other words, practitioners search for certain primary functionality or capabilities first, and only afterwards check for sparsity support. (I'm not sure how often a practitioner would say, "I'm only going to choose from ML approaches that already include sparse data support.")

  2. If we add "sparse data" as a new category, we might be getting into the weeds (i.e. an excessive level of detail). Would we be inviting an explosion of relatively minor categories? Just something to think about.