SrijanShovit / HealthLearning

A repo comprising various Machine Learning and Deep Learning projects in the healthcare domain.

Simplifying Medicine: Developing an Accessible Health Information Platform #134

Closed Kshah002 closed 2 weeks ago

Kshah002 commented 1 month ago

Is your feature request related to a problem? Please describe.

Not everyone is familiar with all diseases and the medical terminology used by professionals. Therefore, an interface should be created to address this issue.

Describe the solution you'd like

Develop an LLM trained on a dataset covering various medical terms and diseases. The fundamental concept is for users to ask about specific diseases and receive relevant, accurate responses. I aim to leverage a pre-trained model, such as LLaMA 2, via Hugging Face.

Describe alternatives you've considered

No response

Additional context

No response

github-actions[bot] commented 1 month ago

Congratulations, @Kshah002! 🎉 Thank you for creating your issue. Your contribution is greatly appreciated and we look forward to working with you to resolve the issue. Keep up the great work!

We will promptly review your changes and offer feedback. Keep up the excellent work! Kindly remember to check our contributing guidelines

Kshah002 commented 1 month ago

dataset that can be used - https://huggingface.co/datasets/gamino/wiki_medical_terms
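To use that dataset for LLaMA 2 fine-tuning, each entry would need to be wrapped in LLaMA 2's instruction template. A minimal sketch of that conversion (the field names and the exact template are my assumption, not confirmed against the dataset's schema):

```python
# Hypothetical helper: wrap a question/answer pair in the LLaMA-2 [INST]
# chat template so it can serve as a fine-tuning example. The template
# shape shown here is an assumption about what the formatted dataset uses.
def to_llama2_prompt(question: str, answer: str) -> str:
    """Return a single training string in LLaMA-2 instruction format."""
    return f"<s>[INST] {question} [/INST] {answer} </s>"

# Example row (illustrative, not from the dataset):
example = to_llama2_prompt(
    "What is hypertension?",
    "Hypertension is persistently elevated arterial blood pressure.",
)
```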

Kshah002 commented 1 month ago

Hi @SrijanShovit, I have raised the issue and have worked on a similar project using LLaMA. I would request you to add the required labels and assign the task to me.

SrijanShovit commented 1 month ago

Hmm....that looks cool. What are your detailed steps?

Kshah002 commented 1 month ago

I wanted to try this, so I have already started working on the project. Here is a general idea of how I plan to fine-tune the LLaMA 2 model:

- First, get a dataset in the LLaMA 2 format; I found an already-formatted dataset on Hugging Face itself: https://huggingface.co/datasets/aboonaji/wiki_medical_terms_llam2_format
- Load the pre-trained LLaMA 2 model (https://huggingface.co/aboonaji/llama2finetune-v2) with 4-bit quantization: 4-bit weights, float16 for computation, and nf4 as the quantization type
- Load a tokenizer compatible with the LLaMA model and set the pad token
- Use PEFT for fine-tuning

I am using Google Colab, so the parameters are chosen with its limits in mind. I hope you get the gist of what I am trying to do.

Kshah002 commented 1 month ago

@SrijanShovit Hi there. Any updates?

SrijanShovit commented 4 weeks ago

Yes, looks good. Do write documentation along with your code at each minor step, and keep committing to a single PR.

github-actions[bot] commented 2 weeks ago

This issue has been automatically closed because it has been inactive for more than 7 days. If you believe this is still relevant, feel free to reopen it or create a new one. Thank you!

Kshah002 commented 2 weeks ago

Hey @SrijanShovit, can you help me reopen the issue? I am done with my task and was about to upload it when I saw the issue had been closed. Can you help me out here? Sorry for being a bit late.

Or should I raise a new issue ?

Kshah002 commented 2 weeks ago

Hi @SrijanShovit, any updates?