Open VigneshGadhari opened 2 months ago
Go on. Assigned.
Please share the datasets before training on them. You will also need access to a vector DB, so build the embeddings in your own temporary vector database and tell us how efficient it is. We will probably then create the embeddings in the production DB.
This can be a good issue for contributing to the project if it is done properly and in a well-planned manner.
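A minimal sketch of that workflow, in case it helps: index the dataset in a throwaway in-memory store, then report retrieval hit rate and average query latency before anything touches production. Everything here is an assumption for illustration: `TempVectorStore`, `embed`, and `evaluate` are hypothetical names, and the bag-of-words `embed` is only a stand-in for whatever embedding model and vector DB the project actually uses.

```python
import math
import re
import time
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (token counts); a stand-in for a real model."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TempVectorStore:
    """Hypothetical in-memory store used only for local experiments."""

    def __init__(self):
        self.items = []  # list of (doc_id, vector) pairs

    def add(self, doc_id, text):
        self.items.append((doc_id, embed(text)))

    def search(self, query, k=3):
        """Return the ids of the k most similar documents, best first."""
        q = embed(query)
        scored = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [doc_id for doc_id, _ in scored[:k]]


def evaluate(store, labelled_queries, k=3):
    """Return (hit rate at k, average seconds per query) for (query, expected_id) pairs."""
    hits = 0
    start = time.perf_counter()
    for query, expected_id in labelled_queries:
        if expected_id in store.search(query, k):
            hits += 1
    elapsed = time.perf_counter() - start
    return hits / len(labelled_queries), elapsed / len(labelled_queries)


if __name__ == "__main__":
    # Made-up sample rows, not taken from any of the real datasets.
    store = TempVectorStore()
    store.add("faq-1", "What is anxiety and how can I manage it")
    store.add("faq-2", "How do I find a therapist near me")
    queries = [("how to manage anxiety", "faq-1"), ("find a therapist", "faq-2")]
    hit_rate, latency = evaluate(store, queries, k=1)
    print(f"hit rate@1: {hit_rate:.2f}, avg latency: {latency:.6f}s")
```

Swapping `embed` and `TempVectorStore` for the real model and a real temporary DB would leave `evaluate` unchanged, so the same numbers can be compared across datasets.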
Any updates @VigneshGadhari?
Hey, I've been a bit occupied with university exams, but I found a dataset on Kaggle. I'll send it by tomorrow and will also share any better ones I come across. Should I send them here, or would you prefer we connect elsewhere?
Can be integrated: https://www.kaggle.com/datasets/elvis23/mental-health-conversational-data https://www.kaggle.com/datasets/narendrageek/mental-health-faq-for-chatbot
This one is a bit too complex (highly unlikely that it can be integrated): https://huggingface.co/datasets/nbertagnolli/counsel-chat?row=3
Not sure about these:
Suicide Severity: dataset for labeling suicidality posts with longitudinal information, using the CSSRS questionnaire. Public access: https://zenodo.org/record/4543776/export/csl
Primate2022: dataset for labeling depression-related posts using the PHQ-9 questionnaire. https://github.com/primate-mh/Primate2022
I'm not sure whether I can integrate all of these; I can only tell once I get started with the process.
Shall I go ahead with these datasets, @afeefuddin?
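One way to find out early which datasets fit is to map each one onto a single question/answer record shape before embedding anything. The sketch below is only an assumption about the inputs: it supposes one dataset is an intents-style JSON (`intents` with `tag`, `patterns`, `responses`) and another is a CSV with `Questions` and `Answers` columns. The function names and the record fields are hypothetical, and the real files should be checked before relying on these column names.

```python
import csv
import io
import json


def records_from_intents(raw_json, source):
    """Flatten an intents-style JSON string into question/answer records.

    Assumed layout: {"intents": [{"tag": ..., "patterns": [...], "responses": [...]}]}
    """
    records = []
    for intent in json.loads(raw_json).get("intents", []):
        responses = intent.get("responses", [])
        if not responses:
            continue  # nothing to answer with, so skip this intent
        for i, pattern in enumerate(intent.get("patterns", [])):
            records.append({
                "id": f"{source}:{intent.get('tag', '')}:{i}",
                "question": pattern.strip(),
                "answer": responses[0].strip(),  # keep only the first response
                "source": source,
            })
    return records


def records_from_faq_csv(raw_csv, source, question_col="Questions", answer_col="Answers"):
    """Turn FAQ-style CSV text into the same record shape (column names are assumed)."""
    records = []
    for i, row in enumerate(csv.DictReader(io.StringIO(raw_csv))):
        question = (row.get(question_col) or "").strip()
        answer = (row.get(answer_col) or "").strip()
        if question and answer:  # drop incomplete rows
            records.append({
                "id": f"{source}:{i}",
                "question": question,
                "answer": answer,
                "source": source,
            })
    return records


if __name__ == "__main__":
    # Made-up sample data in the assumed layouts.
    sample_json = json.dumps({"intents": [
        {"tag": "greeting", "patterns": ["Hi", "Hello"], "responses": ["Hello there."]}
    ]})
    sample_csv = "Questions,Answers\nWhat is stress?,A response to pressure.\n,Missing question\n"
    combined = records_from_intents(sample_json, "conv") + records_from_faq_csv(sample_csv, "faq")
    for record in combined:
        print(record)
```

A dataset that cannot be expressed in this shape without heavy preprocessing is probably one of the "too complex" ones.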
I'm interested in adding a custom knowledge base and training the RAG on a few custom datasets to help improve the chatbot's responses. I would like to work on this issue under GSSOC'24. @afeefuddin @algovengers