GopherML / bag

Bag of words as code
MIT License

Add more practical examples #23

Open iberflow opened 3 months ago

iberflow commented 3 months ago

Awesome project! Thank you for expanding the Go ecosystem in this area. As per our Reddit discussion, I'm putting this down here.

It would be amazing to have real-world/practical training sets/models that ideally would be pretty much plug-and-play or at the very least great starting points like the yes-no you already have.

My immediate use case is identifying whether a user's query is an action request or a discussion, which would allow me to reduce the number of queries to the LLMs and save some money.

But it feels like this project is a good match for various customer support, task management, accounting, automation, development or psychology related text analysis.

I would love to see some real training data for something that you've implemented using this lib and are happy to share. It could be anything, but it's important that it's at least partially practical and a good example to work on.

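Some examples:

GitHub issue analyser to pick label issues (bug, feature, ignore, etc)
Color shade analyzer to fit a color into a group (teal = green, pink = red or whatever)
Tone of voice, urgency identification (maybe this one is easier for LLMs)
Task priority rating (bug = 10, feature = 5, feature_with_guaranteed_revenue=5000)
Mood (chill, angry, etc)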

I'm fairly new to NLP/ML and the like, so maybe some of these things are better left to LLMs (you be the judge of that), but the more I can offload and have baked into my binary, the better :)

itsmontoya commented 1 month ago

It would be amazing to have real-world/practical training sets/models that ideally would be pretty much plug-and-play or at the very least great starting points like the yes-no you already have.

I think this is a fantastic idea. Do you have any ideas or requests for example sets that would be beneficial to you? I'll ask my network to see which ones are most interesting to them as well.

My immediate use case is identifying whether a user's query is an action request or a discussion, which would allow me to reduce the number of queries to the LLMs and save some money.

This is a perfect use case for this library. It would perform best with training data from your specific application, and it would be insanely fast, cheap (free), and scalable.
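To make the action-vs-discussion idea concrete, here is a minimal sketch of a bag-of-words classifier using multinomial naive Bayes with add-one smoothing. It is written from scratch for illustration and does not use or mirror the bag library's API; the `Model` type, its methods, the `exampleModel` helper, and the tiny training set are all hypothetical. Real results would depend on training data from the actual application.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
	"unicode"
)

// Model is a hypothetical multinomial naive Bayes classifier over word counts.
// It is an illustration only, not the bag library's API.
type Model struct {
	wordCounts map[string]map[string]int // label -> word -> count
	totalWords map[string]int            // label -> total words seen
	docCounts  map[string]int            // label -> number of training samples
	vocab      map[string]struct{}       // every distinct word seen in training
	totalDocs  int
}

func NewModel() *Model {
	return &Model{
		wordCounts: map[string]map[string]int{},
		totalWords: map[string]int{},
		docCounts:  map[string]int{},
		vocab:      map[string]struct{}{},
	}
}

// tokenize lowercases the text and splits it on anything that is not a letter or digit.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// Train adds one labelled sample to the model.
func (m *Model) Train(label, text string) {
	if m.wordCounts[label] == nil {
		m.wordCounts[label] = map[string]int{}
	}
	m.docCounts[label]++
	m.totalDocs++
	for _, w := range tokenize(text) {
		m.wordCounts[label][w]++
		m.totalWords[label]++
		m.vocab[w] = struct{}{}
	}
}

// Classify returns the label with the highest log-probability for the text.
func (m *Model) Classify(text string) string {
	labels := make([]string, 0, len(m.docCounts))
	for l := range m.docCounts {
		labels = append(labels, l)
	}
	sort.Strings(labels) // deterministic order so ties resolve the same way each run

	words := tokenize(text)
	best, bestScore := "", math.Inf(-1)
	for _, l := range labels {
		// Prior: share of training samples carrying this label.
		score := math.Log(float64(m.docCounts[l]) / float64(m.totalDocs))
		// Add-one smoothing keeps unseen words from zeroing out the score.
		denom := float64(m.totalWords[l] + len(m.vocab))
		for _, w := range words {
			score += math.Log(float64(m.wordCounts[l][w]+1) / denom)
		}
		if score > bestScore {
			best, bestScore = l, score
		}
	}
	return best
}

// exampleModel trains on a tiny made-up data set, for demonstration only.
func exampleModel() *Model {
	m := NewModel()
	m.Train("action", "please create a new invoice for acme")
	m.Train("action", "delete the old report")
	m.Train("action", "schedule a meeting for tomorrow")
	m.Train("action", "send the email to the team")
	m.Train("discussion", "what do you think about the new design")
	m.Train("discussion", "why is the sky blue")
	m.Train("discussion", "i wonder how invoices work")
	m.Train("discussion", "how do you feel about remote work")
	return m
}

func main() {
	m := exampleModel()
	fmt.Println(m.Classify("please delete the invoice"))        // action
	fmt.Println(m.Classify("what do you think about invoices")) // discussion
}
```

In a setup like this, only the queries classified as "action" would need to be forwarded to the LLM, which is where the cost saving would come from.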

But it feels like this project is a good match for various customer support, task management, accounting, automation, development or psychology related text analysis.

I think it would manage all these tasks fine. The biggest limiter is training data. A lot of this data would probably be internally sourced, if I'm understanding your use case properly.

GitHub issue analyser to pick label issues (bug, feature, ignore, etc)

Probably better for a neural network or LLM. It's definitely possible with bag, but you would have to get very creative with what you are serializing and how you are serializing it.

Color shade analyzer to fit a color into a group (teal = green, pink = red or whatever)

This is definitely doable!

Tone of voice, urgency identification (maybe this one is easier for LLMs)

This is perfect for bag! :)

Task priority rating (bug = 10, feature = 5, feature_with_guaranteed_revenue=5000)

I'd say a neural network or LLM would be better for this.

Mood (chill, angry, etc).

This is perfect for bag! :)

iberflow commented 1 month ago

Hi there!

A lot of this data would probably be internally sourced. If I'm properly understanding your use-case.

To me, the key part in all of this is being able to run a demo of a good starting point, even if it doesn't produce amazing results. For me (and I imagine many other devs getting into ML), it's pretty hard to understand what data works best, how to prepare the training data for specific tools, and how to tune the tools for various cases. If I had examples with, let's say, 1k-5k generic (but on-topic) inputs, it would be waaaay easier to build on top of them than to try to figure this out on my own. I agree that a lot of the data would come from internal sources, but we could have tooling (with the help of LLMs) to generalize/anonymize private data sets and publish them here, growing the real-world example foundation.
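As a rough idea of what the simplest form of such tooling could look like, here is a sketch that scrubs email addresses and numbers from a sample before it is shared. It uses plain regular expressions rather than an LLM, the `Scrub` function and the placeholder tokens are hypothetical, and real anonymization would need to cover far more (names, addresses, IDs in other formats).

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// emailRe is a deliberately loose pattern for email addresses (an assumption, not RFC-complete).
	emailRe = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)
	// numRe matches any run of digits, such as order numbers or amounts.
	numRe = regexp.MustCompile(`\d+`)
)

// Scrub replaces emails and digit runs with generic placeholder tokens.
// Emails are replaced first so digits inside an address are not handled separately.
func Scrub(text string) string {
	text = emailRe.ReplaceAllString(text, "<email>")
	return numRe.ReplaceAllString(text, "<num>")
}

func main() {
	fmt.Println(Scrub("Refund order 12345 for jane.doe@example.com"))
	// Refund order <num> for <email>
}
```

Replacing specifics with placeholder tokens also tends to suit word-count models, since the classifier then sees one shared token instead of thousands of unique values.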

Just food for thought :)