Niketkumardheeryan / ML-CaPsule

ML-CaPsule is a project for beginners and experienced data science enthusiasts who lack a mentor or guidance and want to learn machine learning. Using our repo, they can learn ML, DL, and many related technologies through real-world projects and become interview-ready.
MIT License
406 stars, 337 forks

[FEATURE]: Consumer Complaint Dataset: Leveraging an NLP Pipeline to Analyze Financial Consumer Complaints #1125

Open ananas304 opened 5 days ago

ananas304 commented 5 days ago

Issue Description

This issue proposes a new folder for the project "Consumer Complaint Dataset: Leveraging an NLP Pipeline to Analyze Financial Consumer Complaints". The folder will contain the dataset, a Jupyter Notebook (.ipynb), and a README.md file. The README should explain the project’s purpose, the dataset preprocessing, and key features such as the consolidated categories.

Suggested Change

The following actions will be taken:

  1. Create a new folder named Consumer Complaint Dataset under the appropriate directory.
  2. Add the dataset file that was downloaded and preprocessed from the Consumer Financial Protection Bureau (CFPB) website.
  3. Include a Jupyter Notebook (.ipynb) that demonstrates the analysis, preprocessing, and any implemented NLP pipeline tasks such as classification or topic modeling.
  4. Add a README.md file that includes the following sections:
    • Project Title and Description: Explain the purpose of the dataset and the NLP pipeline tasks.
    • Dataset Overview: Describe the source and the preprocessing steps: filtering to records that include a "Consumer complaint narrative" and renaming that column to "narrative."
    • Category Consolidation: Document the merging of 18 original product categories into the product_5 variable with five main categories.
    • Running the Jupyter Notebook: Instructions on how to set up the environment, load the dataset, and run the code.
    • Visualizations: Include graphs showing the distribution of the original and consolidated categories.
    • Potential Uses: Outline possible NLP tasks (classification, sentiment analysis, topic modeling) using the dataset.
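
A minimal sketch of the preprocessing described above (narrative filtering, column renaming, and category consolidation) might look like the following. It uses only the Python standard library; the notebook itself may well use pandas instead. The "Product" column name, the sample rows, and every entry in PRODUCT_5_MAP are assumptions for illustration, since this issue does not list the 18 original categories or how they map to the five consolidated ones.

```python
import csv
import io
from collections import Counter

# Hypothetical mapping: the issue does not list the 18 original categories
# or how they are merged, so these entries are placeholders only.
PRODUCT_5_MAP = {
    "Credit reporting": "credit_reporting",
    "Debt collection": "debt_collection",
    "Mortgage": "mortgages_and_loans",
    "Student loan": "mortgages_and_loans",
    "Credit card": "credit_card",
    "Checking or savings account": "retail_banking",
}


def preprocess(rows):
    """Keep rows with a non-empty narrative, rename the column, add product_5."""
    cleaned = []
    for row in rows:
        text = (row.get("Consumer complaint narrative") or "").strip()
        if not text:
            continue  # drop complaints that have no narrative
        product = row.get("Product", "")
        if product not in PRODUCT_5_MAP:
            continue  # unmapped category; a complete mapping would cover all 18
        cleaned.append({
            "narrative": text,  # renamed from "Consumer complaint narrative"
            "product": product,
            "product_5": PRODUCT_5_MAP[product],
        })
    return cleaned


def category_counts(records, column):
    """Count records per category, e.g. as input for a distribution plot."""
    return Counter(r[column] for r in records)


# Made-up sample rows standing in for the CFPB export.
SAMPLE_CSV = """Product,Consumer complaint narrative
Mortgage,My escrow payment was misapplied.
Student loan,The servicer lost my paperwork.
Credit card,
Debt collection,I was called about a debt I do not owe.
"""

records = preprocess(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(category_counts(records, "product"))
print(category_counts(records, "product_5"))
```

Calling category_counts on "product" and on "product_5" gives the two distributions that the Visualizations section would plot side by side.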

Rationale

Adding this folder and its files keeps the project organized and ensures that contributors and users have all the required materials in one place. The README will document the dataset, the preprocessing steps, and how to use the .ipynb file for NLP analysis. This makes the project easier to use and understand, which in turn supports collaboration and further development.

github-actions[bot] commented 5 days ago

Thanks for creating the issue. Please read the pinned issues first, and the Readme.md, for each pull request you make. Keep learning...