abhisheks008 / DL-Simplified

Deep Learning Simplified is an open-source repository containing beginner- to advanced-level deep learning projects for contributors who are willing to start their journey in deep learning. Devfolio URL: https://devfolio.co/projects/deep-learning-simplified-f013
https://quine.sh/repo/abhisheks008-DL-Simplified-499023976
MIT License
389 stars, 340 forks

Microsoft Malware Prediction #795

Open · somaiaahmed opened this issue 5 months ago

somaiaahmed commented 5 months ago

🔴 Project Title: Microsoft Malware Prediction Challenge

🔴 Aim: Develop predictive models using data science techniques to anticipate malware attacks on machines, thereby preventing potential damage to Microsoft's vast user base.

🔴 Dataset: Utilize the unprecedented malware dataset provided by Microsoft to facilitate open-source advancements in malware prediction techniques.

🔴 Approach: Perform exploratory data analysis (EDA) on the malware dataset to understand its structure and characteristics. Implement 3-4 machine learning algorithms, such as Random Forest, XGBoost, and neural networks. Compare these algorithms on performance metrics such as accuracy, precision, and recall to identify the most effective model for predicting malware occurrences.
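The comparison step in the Approach needs nothing beyond confusion-matrix counts. The following is a minimal Python sketch, assuming each model's predictions on a held-out split have already been thresholded to 1 (malware detected) or 0 (clean). The helper names `binary_metrics` and `rank_models` and the toy labels are hypothetical; they are not part of the issue or of any library.

```python
from typing import Dict, List, Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision and recall for labels where 1 = malware detected."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # infected, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean, not flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # clean, flagged (false alarm)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # infected, missed
    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def rank_models(
    y_true: Sequence[int],
    predictions: Dict[str, Sequence[int]],
    key: str = "recall",
) -> List[Tuple[str, Dict[str, float]]]:
    """Score every model and sort best-first by the chosen metric."""
    scored = [(name, binary_metrics(y_true, preds)) for name, preds in predictions.items()]
    return sorted(scored, key=lambda item: item[1][key], reverse=True)


if __name__ == "__main__":
    # Made-up labels and predictions, for illustration only.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    predictions = {
        "model_a": [1, 0, 1, 0, 0, 1, 1, 0],
        "model_b": [1, 1, 1, 1, 0, 1, 1, 0],
    }
    for name, metrics in rank_models(y_true, predictions):
        print(name, {k: round(v, 3) for k, v in metrics.items()})
```

The ranking key is a parameter because the issue does not say which metric should decide the winner. In the toy run both models reach an accuracy of 0.75, but `model_b` catches every infected machine (recall 1.0) at the cost of more false alarms (precision about 0.667), so ranking by recall puts `model_b` first while ranking by precision puts `model_a` first.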


📍 Follow the Guidelines to Contribute to the Project:


🔴🟡 Points to Note:


To be Mentioned while taking the issue:


Happy Contributing! 🚀

All the best. Enjoy your open source journey ahead. 😎

github-actions[bot] commented 5 months ago

Thank you for creating this issue! We'll look into it as soon as possible. Your contributions are highly appreciated! 😊

somaiaahmed commented 5 months ago

@abhisheks008, 👋 Hey bro, can you please assign this issue to me under GSSoC'24 with an appropriate level tag?

Nidhi-Satyapriya commented 5 months ago

@abhisheks008, kindly assign this issue to me with an appropriate level tag.

abhisheks008 commented 5 months ago

What models are you planning to use for this problem statement? Mention at least 3-4 models for this dataset.

somaiaahmed commented 5 months ago

@abhisheks008 I'm planning to use Gradient Boosting Machines (GBM).

For tabular data like the one in this malware prediction challenge, tree-based ensemble methods (XGBoost, LightGBM, CatBoost) are often the most effective. These methods can handle the complexity and variability in the data well.

abhisheks008 commented 5 months ago

Hi @somaiaahmed, thanks for the approach. However, this project repository requires deep learning models rather than classical machine learning models, so can you please update your approach and get back to this issue?

somaiaahmed commented 5 months ago

@abhisheks008 OK, I can build a CNN model. Please assign it to me.

abhisheks008 commented 5 months ago

Can you elaborate on the planned models? A CNN alone will not work here, as you need to implement at least 2-3 models for any project.

Basma2423 commented 5 months ago

@abhisheks008, I can start working on it once you have approved my solution for the Micromobility-Lane-Recognition issue.

Full name: Basma Mahmoud
GitHub Profile Link: Basma2423
Email ID: mayarbasma2423@gmail.com

Approach for this Project:

  1. Data Loading and Preprocessing
  2. EDA
  3. Models:
     3.1 Multiple deep learning approaches suitable for tabular data, e.g. FNN, TabNet, and Entity Embeddings for Categorical Variables.
     3.2 Possibly some pre-trained models or AutoML tools, e.g. Pretrained TabNet, PyCaret, and AutoGluon.
  4. Model Assessment.
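Item 3.1 mentions entity embeddings for categorical variables. The idea is that each categorical column gets its own lookup table: every category is mapped to an integer index, that index selects a dense vector, and the vectors from all columns are concatenated with the numeric features to form the input of a feed-forward network. The sketch below uses only the Python standard library to show that data flow. In a real model the tables would be trainable layers (for example PyTorch's `torch.nn.Embedding` or the Keras `Embedding` layer) learned jointly with the network; here they are only randomly initialized. The class and function names, the reserved index 0 for unseen or rare categories, the column names, and the "about half the table size, capped at 50" embedding width are all illustrative assumptions, not requirements.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence

UNKNOWN = 0  # row reserved for categories that are unseen or below min_count


def build_vocab(values: Sequence[str], min_count: int = 1) -> Dict[str, int]:
    """Map each category seen at least `min_count` times to an index >= 1."""
    counts = Counter(values)
    vocab: Dict[str, int] = {}
    for value, count in sorted(counts.items()):  # sorted for reproducible indices
        if count >= min_count:
            vocab[value] = len(vocab) + 1
    return vocab


def embedding_dim(num_rows: int, cap: int = 50) -> int:
    """Heuristic width: about half the number of rows, capped (an assumption)."""
    return min(cap, (num_rows + 1) // 2)


def init_table(num_rows: int, dim: int, seed: int) -> List[List[float]]:
    """Small random vectors; a real model would learn these by backpropagation."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(num_rows)]


class CategoricalEmbedder:
    """Hypothetical helper: one embedding table per categorical column."""

    def __init__(self, columns: Dict[str, Sequence[str]], min_count: int = 1, seed: int = 0):
        self.vocabs = {name: build_vocab(vals, min_count) for name, vals in columns.items()}
        self.tables: Dict[str, List[List[float]]] = {}
        for offset, (name, vocab) in enumerate(self.vocabs.items()):
            num_rows = len(vocab) + 1  # +1 for the UNKNOWN row
            self.tables[name] = init_table(num_rows, embedding_dim(num_rows), seed + offset)

    def encode(self, row: Dict[str, Optional[str]], numeric: Sequence[float]) -> List[float]:
        """Concatenate one vector per categorical column, then the numeric features."""
        features: List[float] = []
        for name, vocab in self.vocabs.items():
            index = vocab.get(row.get(name), UNKNOWN)  # missing or unseen -> row 0
            features.extend(self.tables[name][index])
        features.extend(numeric)
        return features


if __name__ == "__main__":
    # Toy training columns; the column names are placeholders, not the real schema.
    train = {
        "os_version": ["10.0", "10.0", "6.3", "10.0", "6.1"],
        "av_product": ["A", "B", "A", "A", "C"],
    }
    embedder = CategoricalEmbedder(train)
    x = embedder.encode({"os_version": "6.1", "av_product": "A"}, numeric=[0.5])
    print(len(x), x[-1])  # 2 + 2 embedding values plus 1 numeric feature
```

Reserving row 0 for unknown categories lets the encoder handle values at prediction time that never appeared in the training split, instead of failing on a missing key; raising `min_count` routes rare categories to that same shared row.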

What is your participant role? (Mention the Open Source program): GSSoC-2024 participant

Can you add the label for GSSoC, please? Thanks.

abhisheks008 commented 4 months ago

As this issue was raised by another contributor, I can't assign it to you.

Basma2423 commented 4 months ago

@abhisheks008 no probs.