algovengers / MindMate

https://mind-mate-wellness.vercel.app/

Adding a custom knowledge base to gemini #44

Open VigneshGadhari opened 2 months ago

VigneshGadhari commented 2 months ago

I'd like to add a custom knowledge base to the chatbot, i.e. set up RAG over a few custom datasets, to help improve its responses. I would like to work on this issue under GSSoC'24. @afeefuddin @algovengers

subharthihazra commented 2 months ago

Go on. Assigned.

afeefuddin commented 2 months ago

Please share the datasets before you start embedding them. You will also need access to a vector DB, so build the index in your own temporary vector database first and let us know how well it performs. We will probably then create the embeddings in the production DB.

This can be a good issue for contributing to the project if it is done properly and in a well-planned manner.
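
To make the "temporary vector database, then report how well it performs" step above concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `embed` is a toy term-frequency embedding standing in for a real embedding model, `TempVectorStore` is an in-memory stand-in for whatever vector DB is actually used, and the documents and labelled queries are made-up placeholders, not rows from any dataset. It only shows the shape of a hit-rate@k check.

```python
import math
import re
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Toy embedding (assumption): L2-normalised term-frequency vector.

    A real setup would call an embedding model here instead.
    """
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


class TempVectorStore:
    """Hypothetical in-memory stand-in for a temporary vector DB."""

    def __init__(self) -> None:
        self.items: list[tuple[str, dict[str, float]]] = []

    def add(self, doc_id: str, text: str) -> None:
        self.items.append((doc_id, embed(text)))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        # Cosine similarity reduces to a dot product on normalised vectors.
        scored = [
            (sum(w * vec.get(tok, 0.0) for tok, w in q.items()), doc_id)
            for doc_id, vec in self.items
        ]
        scored.sort(reverse=True)
        return [doc_id for _, doc_id in scored[:k]]


def hit_rate_at_k(store: TempVectorStore,
                  labelled_queries: list[tuple[str, str]],
                  k: int = 3) -> float:
    """Fraction of queries whose expected document is in the top k results."""
    hits = sum(
        1 for query, expected in labelled_queries
        if expected in store.search(query, k)
    )
    return hits / len(labelled_queries)


# Placeholder documents and queries (illustrative only).
store = TempVectorStore()
store.add("sleep", "sleep hygiene tips for better sleep at night")
store.add("panic", "grounding and breathing steps during a panic attack")
store.add("therapy", "finding a therapist or counsellor near you")

labelled_queries = [
    ("how can I sleep better at night", "sleep"),
    ("what helps during a panic attack", "panic"),
    ("where do I find a therapist", "therapy"),
]

print(f"hit rate @1: {hit_rate_at_k(store, labelled_queries, k=1):.2f}")
```

A small hand-labelled query set like this, run against the temporary index, would give a number that can be shared in the thread before anything goes into the production DB.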

afeefuddin commented 2 months ago

Any updates @VigneshGadhari?

VigneshGadhari commented 2 months ago

Hey, I've been a bit occupied with university exams, but I found a dataset on Kaggle. I'll send it by tomorrow and will also share any better ones I come across. Should I send them here, or would you prefer we connect elsewhere?

VigneshGadhari commented 2 months ago

Can be integrated:
https://www.kaggle.com/datasets/elvis23/mental-health-conversational-data
https://www.kaggle.com/datasets/narendrageek/mental-health-faq-for-chatbot

This one is a bit too complex (highly unlikely that it can be integrated): https://huggingface.co/datasets/nbertagnolli/counsel-chat?row=3

Not sure about these:
Suicide Severity: dataset for labeling suicidality posts with longitudinal information, using the CSSRS questionnaire (public access): https://zenodo.org/record/4543776/export/csl
Primate2022: dataset for labeling depression-related posts using the PHQ-9 questionnaire: https://github.com/primate-mh/Primate2022

I'm not sure whether I can integrate all of these; I'll only be able to tell once I get started with the process.
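
For the two datasets listed as integrable above, a rough sketch of normalising them into one document format before embedding might look like this. The schemas are assumptions that would need checking against the actual files: an intents JSON with `tag`, `patterns` and `responses` fields for the conversational dataset, and a CSV with `Question_ID`, `Questions` and `Answers` columns for the FAQ dataset. The sample strings are made-up placeholders, not real rows, and the function names are hypothetical.

```python
import csv
import io
import json


def docs_from_intents(raw_json: str) -> list[dict]:
    """One document per response, with the intent's patterns as context.

    Assumes a {"intents": [{"tag", "patterns", "responses"}]} layout.
    """
    docs = []
    for intent in json.loads(raw_json)["intents"]:
        patterns = " ".join(intent["patterns"])
        for i, response in enumerate(intent["responses"]):
            docs.append({
                "id": f"intent-{intent['tag']}-{i}",
                "text": f"{patterns}\n{response}",
                "source": "conversational",
            })
    return docs


def docs_from_faq(raw_csv: str) -> list[dict]:
    """One document per FAQ row (question followed by its answer).

    Assumes Question_ID, Questions and Answers columns.
    """
    reader = csv.DictReader(io.StringIO(raw_csv))
    return [
        {
            "id": f"faq-{row['Question_ID']}",
            "text": f"{row['Questions']}\n{row['Answers']}",
            "source": "faq",
        }
        for row in reader
    ]


# Placeholder inputs (illustrative only, not taken from the datasets).
SAMPLE_INTENTS = json.dumps({
    "intents": [{
        "tag": "greeting",
        "patterns": ["Hi", "Hello"],
        "responses": ["Hello there. How are you feeling today?"],
    }]
})
SAMPLE_FAQ = (
    "Question_ID,Questions,Answers\n"
    "1,What is stress?,Stress is a response to pressure.\n"
)

all_docs = docs_from_intents(SAMPLE_INTENTS) + docs_from_faq(SAMPLE_FAQ)
for doc in all_docs:
    print(doc["id"], "|", doc["source"])
```

Keeping a `source` field on each document makes it easier to compare retrieval quality per dataset later and to drop one dataset if it turns out not to help.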

VigneshGadhari commented 1 month ago

Shall I go ahead with these datasets, @afeefuddin?