The-Data-Alchemists-Manipal / MindWave

MindWave is an open-source project designed for beginners to learn about data science, machine learning, deep learning, and reinforcement learning algorithms using Python. The project offers a platform for implementing relevant algorithms, with open-source tools and libraries.
MIT License

Duplicate Question Pairs - NLP #70

Closed: MohnishK7 closed this issue 1 year ago

MohnishK7 commented 1 year ago

💥 Proposal

Hello @khusheekapoor, I have a dataset of question pairs, and I need to classify whether the two questions in each pair are duplicates. Example: "Who is the president of India?" and "Who is the current president of India?" Here we have to detect that the two questions ask the same thing. The Quora dataset will be used to build this project.

I want to work on this issue. I am a GSSOC'23 contributor.

Jyotsna1304 commented 1 year ago

I am a GSSOC'23 contributor. Can you please assign this issue to me?

khusheekapoor commented 1 year ago

@MohnishK7 - which algorithms will you be using?

@Jyotsna1304 - since we are following a first-come, first-served policy, we will not be able to assign you this issue. However, you can create another issue along similar lines.

MohnishK7 commented 1 year ago

@khusheekapoor Yes. To start, I will use a bag-of-words (BoW) representation (a statistical language model that analyzes text and documents based on word counts) together with Random Forest and XGBoost to get a baseline accuracy, and then I will work on improving the accuracy produced by those algorithms.
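To make the bag-of-words step concrete, here is a minimal sketch using only the Python standard library and the example question pair from the proposal. It is an illustration under stated assumptions, not the contributor's implementation: the helper names (`tokenize`, `build_vocabulary`, `bow_vector`, `pair_features`) are hypothetical, and concatenating the two count vectors is just one possible way to turn a pair into a single feature row.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(questions):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for q in questions for tok in tokenize(q)})
    return {tok: i for i, tok in enumerate(tokens)}


def bow_vector(text, vocab):
    """Return the word-count vector of `text` over the vocabulary columns."""
    counts = Counter(tokenize(text))
    # Dicts keep insertion order, so iterating `vocab` follows the column order.
    return [counts.get(tok, 0) for tok in vocab]


def pair_features(q1, q2, vocab):
    """Assumed pair encoding: concatenate the two questions' count vectors."""
    return bow_vector(q1, vocab) + bow_vector(q2, vocab)


# Example pair taken from the proposal above.
q1 = "Who is the president of India?"
q2 = "Who is the current president of India?"

vocab = build_vocabulary([q1, q2])
features = pair_features(q1, q2, vocab)

print(list(vocab))  # column order of the vocabulary
print(features)     # one feature row for this question pair
```

Rows like `features`, paired with a duplicate/not-duplicate label, could then be passed to a classifier such as scikit-learn's `RandomForestClassifier` or xgboost's `XGBClassifier`. On a dataset the size of Quora's, a sparse representation would likely be needed instead of plain lists.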

khusheekapoor commented 1 year ago

@MohnishK7 - you can go ahead! We are assigning you 21 days for this project, after which it will be assigned to someone else if not completed. All the best! Name the file as: algorithm_dataset.ipynb and link it in the readme of the labeled directory as algorithm - dataset.

MohnishK7 commented 1 year ago

Thank you so much, @khusheekapoor. I'm ready to begin the project today.