cBioPortal / GSoC

Documentation repository of Google Summer of Code (GSoC) project ideas for cBioPortal and related projects

Create Chat Bot Interface Trained On Documentation Site #102

Open inodb opened 1 year ago

inodb commented 1 year ago

Background:

Goal:

Approach:

Needed skills: familiarity with the command line and with using APIs

Possible mentors: @inodb @walleXD

Praashh commented 1 year ago

Hey @inodb, I think my skills match this project. Could you assign it to me?

priyanshiaroraaa commented 1 year ago

I know I am new to this, but I am currently building a personal virtual assistant in Python for my Minor Project 2 in college, and I also have a good command of Java. I am an AI/ML student, and that background will help me train the model for your chatbot. I am comfortable with Python, Java, machine learning, NLP, and AI algorithms. Since my Minor 2 project is still in progress, I am attaching my documentation and code so far for reference: MINOR.docx, synopsis presentation short.pptx, Software Requirements Specification.docx

kamranayesh commented 1 year ago

Hi! I'm Kamran Ayesh, a final-year CSE student at the Indian Institute of Information Technology Guwahati, India. I have written a detailed proposal for a chatbot interface trained on the documentation site and would welcome any feedback or questions. I am well suited to this project: during my internship I built a virtual assistant with a robust UI. As a developer, I expect this project to sharpen my skills and give me more exposure to open source.

Looking forward to contributing!

Thanks, Kamran Ayesh

Nisarg908 commented 1 year ago

I'm interested in helping to build this chatbot. I am Nisarg Patel, a second-year CSE university student. I am new to this, but I am ready to learn and help, and the work will help me improve.

Looking forward to your response!

Thanks, Nisarg Patel

JamesAlaric commented 1 year ago

Hello, I'm interested in helping to develop this chatbot. How can I apply as a GSoC contributor? Please let me know!

ViditJain123 commented 10 months ago

Hey, is this done or not? I have good experience building chatbots, and I am good with the MERN stack, so I can also integrate it with your website.

On Sat, 18 Nov 2023, 6:50 am, j4m3s 4l4r1c wrote:

Um... Sorry, but what are you talking about?

On Fri, Nov 17, 2023, 15:43, Vidit Jain wrote:

Hey, is this done? If not, I can still build it.


NehaAr commented 8 months ago

Hi, I am working on a similar use case for my pipeline, where I am building a chatbot that searches through the documents in the pipeline. I would really like to work on this issue.

NeuralFlux commented 7 months ago

Hi @inodb, I'm a CS grad at NYU with a solid grasp of ML, PyTorch, and the command line. I've worked on using LLMs for zero-shot classification of food ingredient data. I believe retrieval-augmented generation (RAG) with LLMs is highly applicable to your use case. How would you advise me to get started?

kartheekyakkala commented 7 months ago

Hello @inodb, I think we could use the Retrieval-Augmented Generation (RAG) technique instead of fine-tuning or training a model. Since the documentation (the knowledge base) is updated from time to time, repeatedly fine-tuning an LLM would be costly. RAG is also more reliable here, because answers are drawn from the current documentation at query time and so stay up to date. I'm a CS grad at UCM with a strong interest in LLMs and generative AI, and I would like to work on this issue. Could you give me some leads?
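To make the "no retraining" argument concrete: with RAG, a documentation update only means re-indexing the pages that changed. A minimal sketch, assuming the site is periodically crawled into a `{url: text}` dict; the `DocStore` class and its `sync` method are hypothetical names, not part of cBioPortal.

```python
import hashlib


class DocStore:
    """Hypothetical store holding the latest text of each documentation page, keyed by URL."""

    def __init__(self) -> None:
        self.pages: dict[str, str] = {}
        self.hashes: dict[str, str] = {}

    def sync(self, crawled: dict[str, str]) -> dict[str, list[str]]:
        """Compare a fresh crawl with the store and report which pages need re-indexing."""
        changed: list[str] = []
        removed: list[str] = []
        for url, text in crawled.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if self.hashes.get(url) != digest:  # new page, or an existing page was edited
                self.pages[url] = text
                self.hashes[url] = digest
                changed.append(url)
        for url in list(self.pages):
            if url not in crawled:  # page was deleted from the site
                del self.pages[url]
                del self.hashes[url]
                removed.append(url)
        return {"changed": changed, "removed": removed}
```

Only the URLs under `changed` need new embeddings and the ones under `removed` get dropped from the index; the language model itself is never touched.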

Steveolas commented 7 months ago

Hey all! I am Ilan, a Data Science grad from the Technion. I would love to contribute to this project.

@inodb As a first step, I wanted to ask whether you have already thought about how to structure the documentation as training data. If so, I would love to see an example; if not, I think that could be a good place to begin. I would also like to know whether you could share the documentation in an easy-to-work-with format that you might have on the backend. If not, I can scrape it directly from the web pages.

Anyway, I would love some suggestions on what the first steps should be to get familiar with the project.

Thanks, Ilan Meissonnier
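One possible way to structure the documentation as data: split each page into heading-delimited sections and keep the source URL on every chunk, so later answers can link back to where they came from. A minimal sketch, assuming the docs are available as Markdown source (an assumption; scraped HTML would need a different splitter). `Chunk` and `chunk_markdown` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    url: str      # page the text came from, kept so answers can link back to it
    heading: str  # nearest preceding Markdown heading ("" before the first one)
    text: str


# Naive: a "#" line inside a fenced code block would also be treated as a heading.
HEADING = re.compile(r"^#{1,6}\s+(.*)")


def chunk_markdown(url: str, markdown: str) -> list[Chunk]:
    """Split one Markdown page into one chunk per heading-delimited section."""
    chunks: list[Chunk] = []
    heading, body = "", []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:  # skip headings with no text under them
            chunks.append(Chunk(url, heading, text))

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            heading, body = match.group(1).strip(), []
        else:
            body.append(line)
    flush()
    return chunks
```

Section-level chunks are small enough to retrieve precisely but still carry a heading for context; long sections could be split further by paragraph.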

Steveolas commented 7 months ago

BTW, the Medium link given as an example blog is members-only :(. The following blog seems pretty similar (hard to tell, since I couldn't read the original, LOL). Hope this is helpful.

Ilan

Steveolas commented 6 months ago

Hey all! I have been thinking about this project a bit and I have some interesting thoughts I'd like to share...

If I were using a chatbot to help me navigate documentation, I would prefer that it provide a link to the documentation page it took the information from. That way I could fact-check the answer and/or read further into the problem I'm having. As we know, LLMs are not always accurate and can be quite confident even when wrong. While it may be possible to train the chatbot to return a link along with its answer (by structuring the training data that way), this part of the task might be solved more simply with traditional information retrieval techniques, i.e., retrieving the page that best matches a user query, as a search bar does (I have noticed that the search bar on the documentation site is not functional at the moment). This gets more complicated if we also want to include answers from the Google Group conversations, but the approach should definitely be considered. Another option is to combine both approaches in some way, although we would need to decide exactly how.

Would love to hear what everyone thinks about this, or if there might be something I'm missing. Would specifically love to hear your insights on this @inodb.

Sorry for the long post, Ilan Meissonnier
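The "traditional information retrieval" option can be as simple as ranking whole pages against the query with TF-IDF and cosine similarity, then returning the top URLs as links. A minimal stdlib-only sketch; the `TfidfIndex` class and the example URLs are hypothetical, and a real deployment might swap in BM25 or embeddings.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfIndex:
    """Ranks documentation pages against a query with TF-IDF weights and cosine similarity."""

    def __init__(self, pages: dict[str, str]) -> None:
        self.urls = list(pages)
        counts = [Counter(tokenize(pages[url])) for url in self.urls]
        n = len(counts)
        df = Counter(term for c in counts for term in c)  # number of pages containing each term
        # Smoothed IDF, so a term present on every page still gets a small positive weight.
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
        self.vectors = [self._vectorize(c) for c in counts]

    def _vectorize(self, counts: Counter) -> dict[str, float]:
        """Raw term frequency times IDF, L2-normalised; terms unseen in the pages get weight 0."""
        vec = {t: tf * self.idf.get(t, 0.0) for t, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {t: w / norm for t, w in vec.items()}

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        """Return the k best-matching page URLs with their cosine scores."""
        q = self._vectorize(Counter(tokenize(query)))
        scored = [
            (url, sum(w * vec.get(t, 0.0) for t, w in q.items()))
            for url, vec in zip(self.urls, self.vectors)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```

For example, `TfidfIndex(pages).search("how do I install with docker")` would rank an install page above an import page. The same ranked URLs could also be passed to an LLM as context, which is one way to combine both approaches.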

skhavindev commented 6 months ago

Hey!

I am Khavin, an Artificial Intelligence (AI) student currently pursuing a dual Bachelor of Science in data science at the Indian Institute of Technology Madras (IIT Madras) and Sathyabama Institute of Science and Technology. I have a strong interest in machine learning, artificial intelligence, and neuromorphic computing.

I have extensive experience with PyTorch and have built chatbots using Google AI Studio, which has given me some experience in training and building chatbots. I think this experience would be useful for this application, and the project would give me further experience with real-world applications of AI.

Looking forward to contributing to open source!

Regards, Khavin S

Steveolas commented 6 months ago

Hey all, I have made a prototype of a chatbot using RAG, which I think could be a pretty good approach for this project. I'm sharing it as a Kaggle notebook if you are interested; please leave any feedback you may have.

https://www.kaggle.com/code/ilanmeissonnier/rag-for-cbioportal-documentation-chatbot

Ilan Meissonnier
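The notebook's code isn't reproduced here, but the generation half of a RAG chatbot usually comes down to packing the retrieved passages into the prompt. A minimal sketch, assuming passages have already been retrieved as `(url, text)` pairs; `build_rag_prompt` is a hypothetical helper, and the actual LLM call is left out because the thread hasn't settled on a model or API. Keeping the URLs in the prompt also addresses the earlier point about showing users where an answer came from.

```python
def build_rag_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Assemble an LLM prompt from retrieved (url, text) passages.

    Passages are numbered so the model can cite them, and each keeps its source
    URL so the chatbot can show the user where an answer came from.
    """
    context = "\n\n".join(
        f"[{i}] Source: {url}\n{text.strip()}"
        for i, (url, text) in enumerate(passages, start=1)
    )
    instructions = (
        "Answer the question using only the numbered documentation excerpts below. "
        "Cite the excerpt numbers you relied on, e.g. [1]. "
        "If the excerpts do not contain the answer, say that you do not know."
    )
    return f"{instructions}\n\n{context}\n\nQuestion: {question.strip()}\nAnswer:"
```

Telling the model to refuse when the excerpts lack the answer is a cheap guard against the confident-but-wrong behaviour mentioned above, though it doesn't eliminate it.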

Steveolas commented 6 months ago

I have also come across a research paper, published a few days ago, that proposes a method called Retrieval-Augmented Fine-Tuning (RAFT). I haven't finished reading it yet, but it already looks like it could be a really good approach for this.

Steveolas commented 6 months ago

Link to the paper
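As I understand the RAFT recipe, each fine-tuning example pairs a question with a set of documents: some distractors, plus the "oracle" document that holds the answer in only a fraction of the examples, so the model learns both to use relevant context and to ignore irrelevant context. A minimal sketch of building such examples; `RaftExample`, `make_raft_examples`, and the default values for `num_distractors` and `p_oracle` are hypothetical placeholders, not taken from the paper, and generating the chain-of-thought answers is out of scope here.

```python
import random
from dataclasses import dataclass


@dataclass
class RaftExample:
    question: str
    documents: list[str]  # distractors, plus the oracle document when has_oracle is True
    answer: str           # in RAFT this would be a reasoning-style answer citing the oracle
    has_oracle: bool


def make_raft_examples(
    qa_pairs: list[tuple[str, str, str]],  # (question, oracle_document, answer)
    corpus: list[str],
    num_distractors: int = 3,  # placeholder value, tune per dataset
    p_oracle: float = 0.8,     # placeholder fraction of examples that keep the oracle
    seed: int = 0,
) -> list[RaftExample]:
    rng = random.Random(seed)
    examples = []
    for question, oracle, answer in qa_pairs:
        pool = [doc for doc in corpus if doc != oracle]
        docs = rng.sample(pool, min(num_distractors, len(pool)))
        # Keep the oracle with probability p_oracle; otherwise the example has distractors only.
        has_oracle = rng.random() < p_oracle
        if has_oracle:
            docs.append(oracle)
            rng.shuffle(docs)  # so the oracle's position gives nothing away
        examples.append(RaftExample(question, docs, answer, has_oracle))
    return examples
```

For this project, the oracle documents could be the documentation chunks each question was written from, and the distractors other chunks from the same site.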