Open abhisheks008 opened 6 months ago
- Full Name: Nihar Mahesh Jani
- GitHub Profile Link: https://github.com/NiharJani2002
- Email ID: nihar.j@ahduni.edu.in
- Participant ID: https://quine.sh/user/NiharJani2002
- Approach for this Project:
- Load and clean the data: address missing values or inconsistencies, and verify data quality and integrity.
- Understand the distribution of features: visualize word frequencies, token lengths, and other relevant metrics. Explore potential correlations or patterns.
- Identify potential biases or limitations: assess domain specificity or distributional biases. Consider ethical implications and responsible use.
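The cleaning and distribution checks above could be sketched as follows. This is a minimal, standard-library-only illustration: the sample `texts`, the `clean` and `tokenize` helpers, and the whitespace tokenizer are assumptions for demonstration, since the thread does not describe the dataset's columns.

```python
from collections import Counter
import statistics

# Hypothetical sample rows; replace with the text column of the real dataset.
texts = [
    "How does a convolution layer work?",
    "",
    "What is image segmentation?",
    "How does a convolution layer work?",
    None,
]

def clean(rows):
    """Drop missing or empty rows and exact duplicates, preserving order."""
    seen, out = set(), []
    for row in rows:
        if row is None:
            continue
        row = row.strip()
        if row and row not in seen:
            seen.add(row)
            out.append(row)
    return out

def tokenize(text):
    """Lowercase whitespace tokenization with surrounding punctuation stripped."""
    tokens = (tok.strip(".,!?;:\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]

cleaned = clean(texts)
token_lists = [tokenize(t) for t in cleaned]
lengths = [len(toks) for toks in token_lists]
word_freq = Counter(tok for toks in token_lists for tok in toks)

print("rows kept:", len(cleaned))
print("mean tokens per row:", statistics.mean(lengths))
print("most common words:", word_freq.most_common(3))
```

In practice the counts and lengths would feed a histogram or bar chart; plotting is left out here to keep the sketch dependency-free.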
- Choose 3-4 suitable algorithms:
  - Recurrent Neural Networks (RNNs): LSTM or GRU architectures for sequential modeling.
  - Transformer-based models: BERT, DistilBERT, or ALBERT for masked language modeling.
  - Static word embedding models: Word2Vec or GloVe for word-level representations.
  - Other potential options: attention-based RNNs, CNN-LSTM hybrids.
- Implement each model: leverage libraries like TensorFlow, PyTorch, or Hugging Face Transformers. Carefully consider hyperparameter tuning and optimization strategies.
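For readers unfamiliar with the masked language modeling objective mentioned above, here is a rough sketch of how training inputs are typically prepared. It is a simplification and an assumption on my part, not this project's code: the `mask_tokens` helper is hypothetical, and BERT's original recipe additionally replaces some selected tokens with random tokens or leaves them unchanged, which is omitted here.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels).

    labels holds the original token at masked positions and None elsewhere,
    so a model is only scored on the positions it has to reconstruct.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

tokens = "what is the difference between detection and segmentation".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=1)
print(masked)
print(labels)
```

Libraries such as Hugging Face Transformers ship their own data collators for this step; the sketch only shows the idea.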
- Split the dataset: a training set for model learning, a validation set for hyperparameter tuning and model selection, and a test set for final evaluation.
- Train each model: monitor training progress and adjust hyperparameters as needed.
- Evaluate performance: use accuracy scores as a primary metric. Consider additional metrics like precision, recall, F1-score, perplexity, or ROUGE scores based on specific analysis goals. Analyze error patterns to identify areas for improvement.
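The split and the basic metrics above can be sketched without any ML library. The 80/10/10 proportions, the fixed seed, and the helper names `split_dataset` and `binary_metrics` are illustrative assumptions, not choices stated in this thread.

```python
import random

def split_dataset(items, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle once, then slice into train, validation, and test subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_frac)
    n_val = int(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
print(binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

Keeping the test subset untouched until the very end is what makes the final accuracy comparison meaningful.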
- Compare accuracy scores and other relevant metrics across models.
- Consider model complexity, training time, and interpretability.
- Select the best-performing model based on a comprehensive assessment.
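One possible way to turn the comparison above into a rule: pick the highest validation accuracy, but prefer a faster model when it is within a small tolerance of the best. The `select_model` helper, the tolerance, and the numbers below are placeholders for illustration, not measured results from this project.

```python
def select_model(results, tolerance=0.01):
    """Return the fastest model whose accuracy is within `tolerance` of the best."""
    best_acc = max(r["val_accuracy"] for r in results.values())
    candidates = {
        name: r for name, r in results.items()
        if best_acc - r["val_accuracy"] <= tolerance
    }
    return min(candidates, key=lambda name: candidates[name]["train_seconds"])

# Placeholder values only; fill in from your own training runs.
placeholder_results = {
    "lstm": {"val_accuracy": 0.81, "train_seconds": 120},
    "distilbert": {"val_accuracy": 0.90, "train_seconds": 900},
    "bert": {"val_accuracy": 0.905, "train_seconds": 2400},
}

print(select_model(placeholder_results))
```

Setting `tolerance=0.0` reduces this to a plain "highest accuracy wins" rule, which matches the issue's stated criterion most literally.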
- Experiment with different hyperparameters and model architectures.
- Explore techniques like transfer learning or fine-tuning to leverage pre-trained models.
- Consider using attention mechanisms for better interpretability.
- Visualize model outputs and attention weights for qualitative insights.
- Incorporate error analysis and feedback loops to improve model performance iteratively.
- Document the process, findings, and conclusions rigorously for reproducibility and knowledge sharing.

What is your participant role? (Mention the Open Source program): SWOC S4
Issue assigned to you @NiharJani2002
When I clone the DL-Simplified repo to my local PC, there are many sub-folders with different project names. Which one do I have to work on, or do I have to start a new one?
@abhisheks008
You have to create your own project; you don't need to update the existing ones.
- Full name: Titiksha Agrawal
- GitHub Profile Link: https://www.github.com/AgrawalTitiksha/
- Email ID: agrawaltn2311@gmail.com
- Participant ID (if applicable):
- Approach for this Project: Conduct exploratory data analysis on a synthetic dataset of queries related to computer vision, preprocess the text data using NLP techniques and BERT, build and compare the performance of three DL algorithms, viz. LSTM, CNN, and Transformer, and select the best-fitted algorithm based on accuracy scores to enhance MLM model effectiveness.
- What is your participant role? (Mention the Open Source program): GSSOC'24 Contributor
Assigned to you @AgrawalTitiksha
Hello @abhisheks008, I am a GSSOC'24 contributor and I would like to work on this issue. Can you please assign it to me?
@khushi-igupta this issue is already assigned to a contributor.
Deep Learning Simplified Repository (Proposing new issue)
:red_circle: Project Title : MLM Analysis of ChatGPT using NLP

:red_circle: Aim : The aim of this project is to analyze the MLM model using NLP methods.

:red_circle: Dataset : https://www.kaggle.com/datasets/pe4eniks/chatgpt-for-mlm

:red_circle: Approach : Try to use 3-4 algorithms to implement the models, and compare all the algorithms to find the best-fitted algorithm for the model by checking the accuracy scores. Also, do not forget to do an exploratory data analysis before creating any model.
Follow the Guidelines to Contribute in the Project :
- `requirements.txt` - This file will contain the required packages/libraries to run the project on other machines.
- In the `Model` folder, the `README.md` file must be filled up properly, with proper visualizations and conclusions.

:red_circle::yellow_circle: Points to Note :
:white_check_mark: To be Mentioned while taking the issue :
Happy Contributing
All the best. Enjoy your open source journey ahead.