GabrielEValenzuela / chatML

A web API exposing a neural network that detects duplicate entities in knowledge graphs. It uses API key authentication and rate-limits requests based on client tier (FREEMIUM, PREMIUM).
MIT License

Define and implement ML folder schema #6

Open GabrielEValenzuela opened 4 weeks ago

GabrielEValenzuela commented 4 weeks ago

Description

Create an organized layout for the machine learning (ML) portion of the project within the src directory. The structure should follow the Cookiecutter Data Science conventions and support modularity, scalability, and separation of concerns. It should include folders for data processing, modeling, and results, in line with common practice for ML workflows.

[!NOTE] Adapt this layout to the project; we probably will not use every folder.

User Stories


Details


Required Directory and File Structure

src/
├── ml/                        # Main ML project directory
│   ├── LICENSE                # License for the ML project
│   ├── Makefile               # Commands for ML workflows, e.g., `make data`, `make train`
│   ├── README.md              # Overview of the ML project and setup instructions
│   │
│   ├── data/                  # Data directory with raw, interim, processed datasets
│   │   ├── external/          # Data from third-party sources
│   │   ├── interim/           # Intermediate, transformed data
│   │   ├── processed/         # Final datasets ready for modeling
│   │   └── raw/               # Original datasets (unaltered)
│   │
│   ├── docs/                  # Documentation, e.g., model descriptions, notes
│   │
│   ├── models/                # Trained models, serialized versions, and model summaries
│   │
│   ├── notebooks/             # Jupyter notebooks for EDA, model experimentation
│   │   └── 1.0-your_initials-description.ipynb
│   │
│   ├── pyproject.toml         # Project configuration and dependencies for ML tools
│   │
│   ├── references/            # Manuals, data dictionaries, and relevant references
│   │
│   ├── reports/               # Analysis reports in HTML, PDF, etc.
│   │   └── figures/           # Figures for analysis and reports
│   │
│   ├── requirements.txt       # Dependencies for the ML environment
│   │
│   └── {{ module_name }}/     # Core Python module for ML logic
│       ├── __init__.py        # Initializes {{ module_name }} as a package
│       ├── config.py          # ML-specific configurations and constants
│       ├── dataset.py         # Script for data download, transformation, and loading
│       ├── features.py        # Functions for feature engineering and selection
│       ├── modeling/          # Model training and prediction
│       │   ├── __init__.py
│       │   ├── train.py       # Model training functions
│       │   └── predict.py     # Model inference functions
│       └── plots.py           # Visualization functions for model insights
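
The layout above could be generated with a small script instead of being created by hand. The following is a minimal sketch, not part of the issue: the function name `scaffold` and the example module name `chatml` are assumptions, the `{{ module_name }}` placeholder is passed in as an argument, and the notebook placeholder is left out. Files are created empty and existing files are never overwritten.

```python
from pathlib import Path

# Directories from the tree above, relative to src/ml/.
DIRECTORIES = [
    "data/external",
    "data/interim",
    "data/processed",
    "data/raw",
    "docs",
    "models",
    "notebooks",
    "references",
    "reports/figures",
]

# Top-level files of the ML project.
TOP_LEVEL_FILES = [
    "LICENSE",
    "Makefile",
    "README.md",
    "pyproject.toml",
    "requirements.txt",
]

# Files inside the core Python package ({{ module_name }} in the tree).
PACKAGE_FILES = [
    "__init__.py",
    "config.py",
    "dataset.py",
    "features.py",
    "plots.py",
    "modeling/__init__.py",
    "modeling/train.py",
    "modeling/predict.py",
]


def scaffold(root: Path, module_name: str) -> list[Path]:
    """Create the layout under `root`; return the files that were newly created."""
    for rel in DIRECTORIES:
        (root / rel).mkdir(parents=True, exist_ok=True)

    files = TOP_LEVEL_FILES + [f"{module_name}/{name}" for name in PACKAGE_FILES]
    created = []
    for rel in files:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():  # never overwrite existing work
            path.touch()
            created.append(path)
    return created
```

Calling `scaffold(Path("src/ml"), "chatml")` from the repository root would create the structure; running it a second time changes nothing, so folders that turn out to be unnecessary can simply be removed from the lists.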

Examples and Notes

Edge Cases