aus10powell / MIT-Fishery-Counter

Applying Image Recognition to Enhance Fisheries Management Capabilities

Investigate Data and Model Storage/Organization (MLOps) related to gold standard datasets #6

Open aus10powell opened 2 weeks ago

aus10powell commented 2 weeks ago

Investigation: Data and Model Storage/Organization (MLOps) for Gold Standard Datasets

Overview

We need to investigate and improve how we store and organize our data and models, with a focus on the gold standard datasets. The goal is to streamline our MLOps processes so that data and models are easy to find, consistently organized, and efficiently managed.

Objectives

  1. Assess Current Storage and Organization:

    • Review the current structure and organization of our gold standard datasets and models.
    • Identify any issues or inefficiencies in the current setup.
  2. Define Best Practices:

    • Research and define best practices for data and model storage/organization in MLOps.
    • Consider aspects such as versioning, metadata management, and accessibility.

  3. Implement Improvements:

    • Propose and implement improvements to our current storage and organization setup.
    • Ensure that the new setup adheres to the defined best practices.
  4. Documentation:

    • Document the new storage and organization structure.
    • Provide guidelines for maintaining the new setup.

Current Structure

The current structure of our repository is as follows:

code/
    archived/
    notebooks/
    src/
    tests/
data/
    datasets/
    gold_dataset/
    gold_models/
    historical_csv_files/
    inference_data/
    ...
documentation/
    ...
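As a starting point for reviewing the layout above, a small inventory script can report how many files and bytes sit under each subfolder of data/. This is only a sketch: the function name `summarize_tree` is hypothetical, and it assumes the script is run from the repository root so that `data` resolves to the folder shown above.

```python
from pathlib import Path


def summarize_tree(root):
    """Map each direct subdirectory of root to (file count, total size in bytes)."""
    summary = {}
    for child in sorted(Path(root).iterdir()):
        if not child.is_dir():
            continue  # only subdirectories are summarized; loose files are skipped
        files = [p for p in child.rglob("*") if p.is_file()]
        summary[child.name] = (len(files), sum(p.stat().st_size for p in files))
    return summary


if __name__ == "__main__":
    # Assumption: run from the repository root, where data/ lives.
    data_root = Path("data")
    if data_root.is_dir():
        for name, (count, size) in summarize_tree(data_root).items():
            print(f"{name}: {count} files, {size} bytes")
```

Running this before and after any reorganization gives a quick, comparable snapshot of where the gold standard data and models actually live.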

Key Areas to Investigate

  1. Data Storage:

    • Location and organization of gold standard datasets.
    • Versioning and metadata management for datasets.
    • Accessibility and permissions for data access.
  2. Model Storage:

    • Location and organization of trained models.
    • Versioning and metadata management for models.
    • Accessibility and permissions for model access.
  3. MLOps Tools and Practices:

    • Tools and practices for managing data and models in MLOps.
    • Integration with existing MLOps pipelines and workflows.
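One lightweight way to approach the versioning and metadata points above, before committing to a dedicated tool, is a content-hash manifest: hash every file in a dataset folder, derive a short version id from those hashes, and record that id next to each trained model. The sketch below uses only the Python standard library. The function names (`file_sha256`, `build_manifest`, `model_record`) and the record fields are hypothetical and are not part of the current repository.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(dataset_dir):
    """Hash every file under dataset_dir and derive a short dataset version id."""
    root = Path(dataset_dir)
    files = {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    # The version id depends only on relative paths and file contents,
    # so identical datasets always produce the same id.
    canonical = json.dumps(files, sort_keys=True).encode("utf-8")
    return {"version": hashlib.sha256(canonical).hexdigest()[:12], "files": files}


def model_record(model_path, manifest, notes=""):
    """Build a metadata record tying a model file to the dataset version it was trained on."""
    return {
        "model_file": Path(model_path).name,
        "model_sha256": file_sha256(model_path),
        "dataset_version": manifest["version"],
        "notes": notes,
    }
```

The manifest and model records could be written out with `json.dump` and committed alongside the code, which keeps large files out of version control while still making changes to the gold standard data detectable.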

Proposed Steps

  1. Review Current Setup:

    • Conduct a thorough review of the current data and model storage/organization.
    • Identify any gaps or areas for improvement.
  2. Research Best Practices:

    • Research best practices for data and model storage/organization in MLOps.
    • Consider tools and frameworks that can help manage data and models effectively.
  3. Propose Improvements:

    • Based on the review and research, propose improvements to the current setup.
    • Create a detailed plan for implementing the proposed improvements.
  4. Implement and Document:

    • Implement the proposed improvements.
    • Document the new storage and organization structure.
    • Provide guidelines for maintaining the new setup.
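For the maintenance guidelines, one option is an automated check that a dataset folder still matches its recorded state. The sketch below assumes a recorded mapping of relative file paths to SHA-256 digests. That format, and the function name `check_against_manifest`, are assumptions made for illustration and not an existing convention in this repository.

```python
import hashlib
from pathlib import Path


def check_against_manifest(dataset_dir, expected):
    """Compare files on disk with an expected {relative path: sha256 hex} mapping.

    Returns the paths that are missing, unexpected, or changed.
    Each file is read fully into memory, which is fine for a sketch
    but may need chunked reading for very large files.
    """
    root = Path(dataset_dir)
    actual = {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }
    return {
        "missing": sorted(set(expected) - set(actual)),
        "unexpected": sorted(set(actual) - set(expected)),
        "changed": sorted(
            path for path in expected.keys() & actual.keys()
            if expected[path] != actual[path]
        ),
    }
```

A check like this could run in the test suite or in CI, failing whenever any of the three lists is non-empty, so that accidental edits to a gold standard dataset are caught early.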

Expected Outcomes

References

Tasks