Samagra-Development / ai-tools

AI Tooling to bootstrap applications fast

Reduce docker size #154

Closed Gautam-Rajeev closed 10 months ago

Gautam-Rajeev commented 1 year ago


Reduce the Docker image size of the ai-tools repository by optimizing the Dockerfile and dependencies.


Expected Outcome

The Docker image for the ai-tools repository should have a reduced size while still maintaining all the necessary dependencies and configurations.

Acceptance Criteria

Implementation Details

Mockups / Wireframes

Dependency graph:
aiohttp 3.8.4 Async http client/server framework (asyncio)
├── aiosignal >=1.1.2
│   └── frozenlist >=1.1.0 
├── async-timeout >=4.0.0a3,<5.0
├── attrs >=17.3.0
├── charset-normalizer >=2.0,<4.0
├── frozenlist >=1.1.1
├── multidict >=4.5,<7.0
└── yarl >=1.0,<2.0
    ├── idna >=2.0 
    └── multidict >=4.0 
async-cache 1.1.1 An asyncio Cache
asyncio 3.4.3 reference implementation of PEP 3156
en-coreference-web-trf 3.4.0a2 English transformer pipeline to demonstrate the experimental coreference components. Components: sentencizer, transformer, coref, span_resolver, span_cleaner.
├── spacy >=3.3.0,<3.5.0
│   ├── catalogue >=2.0.6,<2.1.0 
│   ├── cymem >=2.0.2,<2.1.0 
│   ├── jinja2 * 
│   │   └── markupsafe >=2.0 
│   ├── langcodes >=3.2.0,<4.0.0 
│   ├── murmurhash >=0.28.0,<1.1.0 
│   ├── numpy >=1.15.0 
│   ├── packaging >=20.0 
│   ├── pathy >=0.3.5 
│   │   ├── smart-open >=5.2.1,<7.0.0 
│   │   └── typer >=0.3.0,<1.0.0 
│   │       └── click >=7.1.1,<9.0.0 
│   │           └── colorama * 
│   ├── preshed >=3.0.2,<3.1.0 
│   │   ├── cymem >=2.0.2,<2.1.0 (circular dependency aborted here)
│   │   └── murmurhash >=0.28.0,<1.1.0 (circular dependency aborted here)
│   ├── pydantic >=1.7.4,<1.8 || >1.8,<1.8.1 || >1.8.1,<1.11.0 
│   │   └── typing-extensions >=4.2.0 
│   ├── requests >=2.13.0,<3.0.0 
│   │   ├── certifi >=2017.4.17 
│   │   ├── charset-normalizer >=2,<4 
│   │   ├── idna >=2.5,<4 
│   │   └── urllib3 >=1.21.1,<1.27 
│   ├── setuptools * 
│   ├── smart-open >=5.2.1,<7.0.0 (circular dependency aborted here)
│   ├── spacy-legacy >=3.0.10,<3.1.0 
│   ├── spacy-loggers >=1.0.0,<2.0.0 
│   ├── srsly >=2.4.3,<3.0.0 
│   │   └── catalogue >=2.0.3,<2.1.0 (circular dependency aborted here)
│   ├── thinc >=8.1.0,<8.2.0 
│   │   ├── blis >=0.7.8,<0.8.0 
│   │   │   └── numpy >=1.15.0 (circular dependency aborted here)
│   │   ├── catalogue >=2.0.4,<2.1.0 (circular dependency aborted here)
│   │   ├── confection >=0.0.1,<1.0.0 
│   │   │   ├── pydantic >=1.7.4,<1.8 || >1.8,<1.8.1 || >1.8.1,<1.11.0 (circular dependency aborted here)
│   │   │   └── srsly >=2.4.0,<3.0.0 (circular dependency aborted here)
│   │   ├── cymem >=2.0.2,<2.1.0 (circular dependency aborted here)
│   │   ├── murmurhash >=1.0.2,<1.1.0 (circular dependency aborted here)
│   │   ├── numpy >=1.15.0 (circular dependency aborted here)
│   │   ├── packaging >=20.0 (circular dependency aborted here)
│   │   ├── preshed >=3.0.2,<3.1.0 (circular dependency aborted here)
│   │   ├── pydantic >=1.7.4,<1.8 || >1.8,<1.8.1 || >1.8.1,<1.11.0 (circular dependency aborted here)
│   │   ├── setuptools * (circular dependency aborted here)
│   │   ├── srsly >=2.4.0,<3.0.0 (circular dependency aborted here)
│   │   └── wasabi >=0.8.1,<1.2.0 
│   ├── tqdm >=4.38.0,<5.0.0 
│   │   └── colorama * (circular dependency aborted here)
│   ├── typer >=0.3.0,<0.8.0 (circular dependency aborted here)
│   └── wasabi >=0.9.1,<1.1.0 (circular dependency aborted here)
├── spacy-experimental >=0.6.1,<0.7.0
│   └── spacy >=3.3.0,<3.6.0 
│       ├── catalogue >=2.0.6,<2.1.0 
│       ├── cymem >=2.0.2,<2.1.0 
│       ├── jinja2 * 
│       │   └── markupsafe >=2.0 
│       ├── langcodes >=3.2.0,<4.0.0 
│       ├── murmurhash >=0.28.0,<1.1.0 
│       ├── numpy >=1.15.0 
│       ├── packaging >=20.0 
│       ├── pathy >=0.3.5 
│       │   ├── smart-open >=5.2.1,<7.0.0 
│       │   └── typer >=0.3.0,<1.0.0 
│       │       └── click >=7.1.1,<9.0.0 
│       │           └── colorama * 
│       ├── preshed >=3.0.2,<3.1.0 
│       │   ├── cymem >=2.0.2,<2.1.0 (circular dependency aborted here)
│       │   └── murmurhash >=0.28.0,<1.1.0 (circular dependency aborted here)
│       ├── pydantic >=1.7.4,<1.8 || >1.8,<1.8.1 || >1.8.1,<1.11.0 
│       │   └── typing-extensions >=4.2.0 
│       ├── requests >=2.13.0,<3.0.0 
│       │   ├── certifi >=2017.4.17 
│       │   ├── charset-normalizer >=2,<4 
│       │   ├── idna >=2.5,<4 
│       │   └── urllib3 >=1.21.1,<1.27 
│       ├── setuptools * 
│       ├── smart-open >=5.2.1,<7.0.0 (circular dependency aborted here)
│       ├── spacy-legacy >=3.0.10,<3.1.0 
│       ├── spacy-loggers >=1.0.0,<2.0.0 
│       ├── srsly >=2.4.3,<3.0.0 
│       │   └── catalogue >=2.0.3,<2.1.0 (circular dependency aborted here)
│       ├── thinc >=8.1.0,<8.2.0 
│       │   ├── blis >=0.7.8,<0.8.0 
│       │   │   └── numpy >=1.15.0 (circular dependency aborted here)
│       │   ├── catalogue >=2.0.4,<2.1.0 (circular dependency aborted here)
│       │   ├── confection >=0.0.1,<1.0.0 
│       │   │   ├── pydantic >=1.7.4,<1.8 || >1.8,<1.8.1 || >1.8.1,<1.11.0 (circular dependency aborted here)
│       │   │   └── srsly >=2.4.0,<3.0.0 (circular dependency aborted here)
│       │   ├── cymem >=2.0.2,<2.1.0 (circular dependency aborted here)
│       │   ├── murmurhash >=1.0.2,<1.1.0 (circular dependency aborted here)
│       │   ├── numpy >=1.15.0 (circular dependency aborted here)
│       │   ├── packaging >=20.0 (circular dependency aborted here)
│       │   ├── preshed >=3.0.2,<3.1.0 (circular dependency aborted here)
│       │   ├── pydantic >=1.7.4,<1.8 || >1.8,<1.8.1 || >1.8.1,<1.11.0 (circular dependency aborted here)
│       │   ├── setuptools * (circular dependency aborted here)
│       │   ├── srsly >=2.4.0,<3.0.0 (circular dependency aborted here)
│       │   └── wasabi >=0.8.1,<1.2.0 
│       ├── tqdm >=4.38.0,<5.0.0 
│       │   └── colorama * (circular dependency aborted here)
│       ├── typer >=0.3.0,<0.8.0 (circular dependency aborted here)
│       └── wasabi >=0.9.1,<1.1.0 (circular dependency aborted here)
└── spacy-transformers >=1.1.2,<1.2.0
    ├── spacy >=3.4.0,<4.0.0 
    │   ├── catalogue >=2.0.6,<2.1.0 
    │   ├── cymem >=2.0.2,<2.1.0 
    │   ├── jinja2 * 
    │   │   └── markupsafe >=2.0 
    │   ├── langcodes >=3.2.0,<4.0.0 
    │   ├── murmurhash >=0.28.0,<1.1.0 
    │   ├── numpy >=1.15.0 
    │   ├── packaging >=20.0 
    │   ├── pathy >=0.3.5 
    │   │   ├── smart-open >=5.2.1,<7.0.0 
    │   │   └── typer >=0.3.0,<1.0.0 
    │   │       └── click >=7.1.1,<9.0.0 
    │   │           └── colorama * 
    │   ├── preshed >=3.0.2,<3.1.0 
    │   │   ├── cymem >=2.0.2,<2.1.0 (circular dependency aborted here)
    │   │   └── murmurhash >=0.28.0,<1.1.0 (circular dependency aborted here)
    │   ├── pydantic >=1.7.4,<1.8 || >1.8,<1.8.1 || >1.8.1,<1.11.0 
    │   │   └── typing-extensions >=4.2.0 
    │   ├── requests >=2.13.0,<3.0.0 
    │   │   ├── certifi >=2017.4.17 
    │   │   ├── charset-normalizer >=2,<4 
    │   │   ├── idna >=2.5,<4 
    │   │   └── urllib3 >=1.21.1,<1.27 
    │   ├── setuptools * 
    │   ├── smart-open >=5.2.1,<7.0.0 (circular dependency aborted here)
    │   ├── spacy-legacy >=3.0.10,<3.1.0 
    │   ├── spacy-loggers >=1.0.0,<2.0.0 
    │   ├── srsly >=2.4.3,<3.0.0 
    │   │   └── catalogue >=2.0.3,<2.1.0 (circular dependency aborted here)
    │   ├── thinc >=8.1.0,<8.2.0 
    │   │   ├── blis >=0.7.8,<0.8.0 
    │   │   │   └── numpy >=1.15.0 (circular dependency aborted here)
    │   │   ├── catalogue >=2.0.4,<2.1.0 (circular dependency aborted here)
    │   │   ├── confection >=0.0.1,<1.0.0 
    │   │   │   ├── pydantic >=1.7.4,<1.8 || >1.8,<1.8.1 || >1.8.1,<1.11.0 (circular dependency aborted here)
    │   │   │   └── srsly >=2.4.0,<3.0.0 (circular dependency aborted here)
    │   │   ├── cymem >=2.0.2,<2.1.0 (circular dependency aborted here)
    │   │   ├── murmurhash >=1.0.2,<1.1.0 (circular dependency aborted here)
    │   │   ├── numpy >=1.15.0 (circular dependency aborted here)
    │   │   ├── packaging >=20.0 (circular dependency aborted here)
    │   │   ├── preshed >=3.0.2,<3.1.0 (circular dependency aborted here)
    │   │   ├── pydantic >=1.7.4,<1.8 || >1.8,<1.8.1 || >1.8.1,<1.11.0 (circular dependency aborted here)
    │   │   ├── setuptools * (circular dependency aborted here)
    │   │   ├── srsly >=2.4.0,<3.0.0 (circular dependency aborted here)
    │   │   └── wasabi >=0.8.1,<1.2.0 
    │   ├── tqdm >=4.38.0,<5.0.0 
    │   │   └── colorama * (circular dependency aborted here)
    │   ├── typer >=0.3.0,<0.8.0 (circular dependency aborted here)
    │   └── wasabi >=0.9.1,<1.1.0 (circular dependency aborted here)
    ├── spacy-alignments >=0.7.2,<1.0.0 
    ├── srsly >=2.4.0,<3.0.0 (circular dependency aborted here)
    ├── torch >=1.6.0 
    │   ├── filelock * 
    │   ├── jinja2 * (circular dependency aborted here)
    │   ├── networkx * 
    │   ├── nvidia-cublas-cu11 
    │   │   ├── setuptools * (circular dependency aborted here)
    │   │   └── wheel * 
    │   ├── nvidia-cuda-cupti-cu11 11.7.101 
    │   │   ├── setuptools * (circular dependency aborted here)
    │   │   └── wheel * (circular dependency aborted here)
    │   ├── nvidia-cuda-nvrtc-cu11 11.7.99 
    │   │   ├── setuptools * (circular dependency aborted here)
    │   │   └── wheel * (circular dependency aborted here)
    │   ├── nvidia-cuda-runtime-cu11 11.7.99 
    │   │   ├── setuptools * (circular dependency aborted here)
    │   │   └── wheel * (circular dependency aborted here)
    │   ├── nvidia-cudnn-cu11 
    │   │   ├── setuptools * (circular dependency aborted here)
    │   │   └── wheel * (circular dependency aborted here)
    │   ├── nvidia-cufft-cu11 
    │   ├── nvidia-curand-cu11 
    │   │   ├── setuptools * (circular dependency aborted here)
    │   │   └── wheel * (circular dependency aborted here)
    │   ├── nvidia-cusolver-cu11 
    │   │   ├── setuptools * (circular dependency aborted here)
    │   │   └── wheel * (circular dependency aborted here)
    │   ├── nvidia-cusparse-cu11 
    │   │   ├── setuptools * (circular dependency aborted here)
    │   │   └── wheel * (circular dependency aborted here)
    │   ├── nvidia-nccl-cu11 2.14.3 
    │   ├── nvidia-nvtx-cu11 11.7.91 
    │   │   ├── setuptools * (circular dependency aborted here)
    │   │   └── wheel * (circular dependency aborted here)
    │   ├── sympy * 
    │   │   └── mpmath >=0.19 
    │   ├── triton 2.0.0 
    │   │   ├── cmake * 
    │   │   ├── filelock * (circular dependency aborted here)
    │   │   ├── lit * 
    │   │   └── torch * (circular dependency aborted here)
    │   └── typing-extensions * (circular dependency aborted here)
    └── transformers >=3.4.0,<4.26.0 
        ├── filelock * (circular dependency aborted here)
        ├── huggingface-hub >=0.10.0,<1.0 
        │   ├── filelock * (circular dependency aborted here)
        │   ├── packaging >=20.9 (circular dependency aborted here)
        │   ├── pyyaml >=5.1 
        │   ├── requests * (circular dependency aborted here)
        │   ├── tqdm >=4.42.1 (circular dependency aborted here)
        │   └── typing-extensions >= (circular dependency aborted here)
        ├── numpy >=1.17 (circular dependency aborted here)
        ├── packaging >=20.0 (circular dependency aborted here)
        ├── pyyaml >=5.1 (circular dependency aborted here)
        ├── regex !=2019.12.17 
        ├── requests * (circular dependency aborted here)
        ├── tokenizers >=0.11.1,<0.11.3 || >0.11.3,<0.14 
        └── tqdm >=4.27 (circular dependency aborted here)
flake8 6.0.0 the modular source code checker: pep8 pyflakes and co
├── mccabe >=0.7.0,<0.8.0
├── pycodestyle >=2.10.0,<2.11.0
└── pyflakes >=3.0.0,<3.1.0
flask 2.2.3 A simple framework for building complex web applications.
├── click >=8.0
│   └── colorama * 
├── importlib-metadata >=3.6.0
│   └── zipp >=0.5 
├── itsdangerous >=2.0
├── jinja2 >=3.0
│   └── markupsafe >=2.0 
└── werkzeug >=2.2.2
    └── markupsafe >=2.1.1 
inference 0.1
matplotlib 3.7.1 Python plotting package
├── contourpy >=1.0.1
│   └── numpy >=1.16 
├── cycler >=0.10
├── fonttools >=4.22.0
├── importlib-resources >=3.2.0
│   └── zipp >=3.1.0 
├── kiwisolver >=1.0.1
├── numpy >=1.20
├── packaging >=20.0
├── pillow >=6.2.0
├── pyparsing >=2.3.1
└── python-dateutil >=2.7
    └── six >=1.5 
numpy 1.24.2 Fundamental package for array computing in Python
openai 0.27.4 Python client library for the OpenAI API
├── aiohttp *
│   ├── aiosignal >=1.1.2 
│   │   └── frozenlist >=1.1.0 
│   ├── async-timeout >=4.0.0a3,<5.0 
│   ├── attrs >=17.3.0 
│   ├── charset-normalizer >=2.0,<4.0 
│   ├── frozenlist >=1.1.1 (circular dependency aborted here)
│   ├── multidict >=4.5,<7.0 
│   └── yarl >=1.0,<2.0 
│       ├── idna >=2.0 
│       └── multidict >=4.0 (circular dependency aborted here)
├── requests >=2.20
│   ├── certifi >=2017.4.17 
│   ├── charset-normalizer >=2,<4 
│   ├── idna >=2.5,<4 
│   └── urllib3 >=1.21.1,<1.27 
└── tqdm *
    └── colorama * 
openai-async 0.0.3 A light-weight, asynchronous client for OpenAI API - text completion, image generation and embeddings.
├── httpx *
│   ├── certifi * 
│   ├── httpcore >=0.15.0,<0.18.0 
│   │   ├── anyio >=3.0,<5.0 
│   │   │   ├── idna >=2.8 
│   │   │   └── sniffio >=1.1 
│   │   ├── certifi * (circular dependency aborted here)
│   │   ├── h11 >=0.13,<0.15 
│   │   └── sniffio >=1.0.0,<2.0.0 (circular dependency aborted here)
│   ├── idna * (circular dependency aborted here)
│   └── sniffio * (circular dependency aborted here)
└── pytest *
    ├── colorama * 
    ├── exceptiongroup >=1.0.0rc8 
    ├── iniconfig * 
    ├── packaging * 
    ├── pluggy >=0.12,<2.0 
    └── tomli >=1.0.0 
pandas 2.0.0 Powerful data structures for data analysis, time series, and statistics
├── numpy >=1.20.3
├── numpy >=1.21.0
├── numpy >=1.23.2
├── python-dateutil >=2.8.2
│   └── six >=1.5 
├── pytz >=2020.1
└── tzdata >=2022.1
pipdeptree 2.0.0 Command line utility to show dependency tree of packages
└── pip >=6.0.0
plotly 5.14.1 An open-source, interactive data visualization library for Python
├── packaging *
└── tenacity >=6.2.0
python-dotenv 1.0.0 Read key-value pairs from a .env file and set them as environment variables
quart 0.18.4 A Python ASGI web microframework with the same API as Flask
├── aiofiles *
├── blinker <1.6
├── click >=8.0.0
│   └── colorama * 
├── hypercorn >=0.11.2
│   ├── h11 * 
│   ├── h2 >=3.1.0 
│   │   ├── hpack >=4.0,<5 
│   │   └── hyperframe >=6.0,<7 
│   ├── priority * 
│   ├── toml * 
│   └── wsproto >=0.14.0 
│       └── h11 >=0.9.0,<1 (circular dependency aborted here)
├── importlib-metadata *
│   └── zipp >=0.5 
├── itsdangerous *
├── jinja2 *
│   └── markupsafe >=2.0 
├── markupsafe *
└── werkzeug >=2.2.0
    └── markupsafe >=2.1.1 
scikit-learn 1.2.2 A set of python modules for machine learning and data mining
├── joblib >=1.1.1
├── numpy >=1.17.3
├── scipy >=1.3.2
│   └── numpy >=1.18.5,<1.26.0 
└── threadpoolctl >=2.0.0
sklearn 0.0.post4 deprecated sklearn package, use scikit-learn instead
spacy-experimental 0.6.2 Cutting-edge experimental spaCy components and features
└── spacy >=3.3.0,<3.6.0
    ├── catalogue >=2.0.6,<2.1.0 
    ├── cymem >=2.0.2,<2.1.0 
    ├── jinja2 * 
    │   └── markupsafe >=2.0 
    ├── langcodes >=3.2.0,<4.0.0 
    ├── murmurhash >=0.28.0,<1.1.0 
    ├── numpy >=1.15.0 
    ├── packaging >=20.0 
    ├── pathy >=0.3.5 
    │   ├── smart-open >=5.2.1,<7.0.0 
    │   └── typer >=0.3.0,<1.0.0 
    │       └── click >=7.1.1,<9.0.0 
    │           └── colorama * 
    ├── preshed >=3.0.2,<3.1.0 
    │   ├── cymem >=2.0.2,<2.1.0 (circular dependency aborted here)
    │   └── murmurhash >=0.28.0,<1.1.0 (circular dependency aborted here)
    ├── pydantic >=1.7.4,<1.8 || >1.8,<1.8.1 || >1.8.1,<1.11.0 
    │   └── typing-extensions >=4.2.0 
    ├── requests >=2.13.0,<3.0.0 
    │   ├── certifi >=2017.4.17 
    │   ├── charset-normalizer >=2,<4 
    │   ├── idna >=2.5,<4 
    │   └── urllib3 >=1.21.1,<1.27 
    ├── setuptools * 
    ├── smart-open >=5.2.1,<7.0.0 (circular dependency aborted here)
    ├── spacy-legacy >=3.0.10,<3.1.0 
    ├── spacy-loggers >=1.0.0,<2.0.0 
    ├── srsly >=2.4.3,<3.0.0 
    │   └── catalogue >=2.0.3,<2.1.0 (circular dependency aborted here)
    ├── thinc >=8.1.0,<8.2.0 
    │   ├── blis >=0.7.8,<0.8.0 
    │   │   └── numpy >=1.15.0 (circular dependency aborted here)
    │   ├── catalogue >=2.0.4,<2.1.0 (circular dependency aborted here)
    │   ├── confection >=0.0.1,<1.0.0 
    │   │   ├── pydantic >=1.7.4,<1.8 || >1.8,<1.8.1 || >1.8.1,<1.11.0 (circular dependency aborted here)
    │   │   └── srsly >=2.4.0,<3.0.0 (circular dependency aborted here)
    │   ├── cymem >=2.0.2,<2.1.0 (circular dependency aborted here)
    │   ├── murmurhash >=1.0.2,<1.1.0 (circular dependency aborted here)
    │   ├── numpy >=1.15.0 (circular dependency aborted here)
    │   ├── packaging >=20.0 (circular dependency aborted here)
    │   ├── preshed >=3.0.2,<3.1.0 (circular dependency aborted here)
    │   ├── pydantic >=1.7.4,<1.8 || >1.8,<1.8.1 || >1.8.1,<1.11.0 (circular dependency aborted here)
    │   ├── setuptools * (circular dependency aborted here)
    │   ├── srsly >=2.4.0,<3.0.0 (circular dependency aborted here)
    │   └── wasabi >=0.8.1,<1.2.0 
    ├── tqdm >=4.38.0,<5.0.0 
    │   └── colorama * (circular dependency aborted here)
    ├── typer >=0.3.0,<0.8.0 (circular dependency aborted here)
    └── wasabi >=0.9.1,<1.1.0 (circular dependency aborted here)
tiktoken 0.3.3 tiktoken is a fast BPE tokeniser for use with OpenAI's models
├── regex >=2022.1.18
└── requests >=2.26.0
    ├── certifi >=2017.4.17 
    ├── charset-normalizer >=2,<4 
    ├── idna >=2.5,<4 
    └── urllib3 >=1.21.1,<1.27 

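The tree above repeats the full spaCy subtree several times, which makes it hard to see how many distinct packages each top-level dependency actually pulls in. The following is a minimal sketch, assuming the tree text keeps the layout shown above (unindented `name version description` lines for top-level packages, child lines containing `├──` or `└──`). The helper `unique_deps` and the `SAMPLE` excerpt are hypothetical names introduced for illustration only.

```python
import re
from collections import defaultdict

# Matches a child line such as "│   ├── numpy >=1.15.0" and captures the package name.
CHILD = re.compile(r"^[│\s]*[├└]── (?P<name>[A-Za-z0-9_.\-]+)")


def unique_deps(tree_text):
    """Map each top-level package to the set of distinct packages beneath it."""
    deps = defaultdict(set)
    current = None
    for line in tree_text.splitlines():
        if not line.strip():
            continue
        match = CHILD.match(line)
        if match:
            if current is not None:
                deps[current].add(match.group("name"))
        else:
            # An unindented line starts a new top-level package.
            current = line.split()[0]
            deps[current]  # register packages that have no children
    return dict(deps)


# Short excerpt in the same layout as the graph above (shortened for illustration).
SAMPLE = """\
flake8 6.0.0 the modular source code checker: pep8 pyflakes and co
├── mccabe >=0.7.0,<0.8.0
├── pycodestyle >=2.10.0,<2.11.0
└── pyflakes >=3.0.0,<3.1.0
pandas 2.0.0 Powerful data structures
├── numpy >=1.20.3
├── numpy >=1.21.0
├── python-dateutil >=2.8.2
│   └── six >=1.5
└── pytz >=2020.1
numpy 1.24.2 Fundamental package for array computing in Python
"""

if __name__ == "__main__":
    ranked = sorted(unique_deps(SAMPLE).items(), key=lambda kv: -len(kv[1]))
    for name, children in ranked:
        print(f"{name}: {len(children)} distinct dependencies")
```

Running the same function over the full graph would show which top-level entries contribute the most distinct packages. It counts packages, not bytes, so it is only a first filter.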


Organization Name:

[Social Welfare]

Tech Skills Needed:

[Docker, Python, Poetry]

Sub Category

[Docker, Dependency Management]

c4gt-community-support[bot] commented 1 year ago

Hi! Mandatory details: the following details, which are essential for submitting tickets to the C4GT Community Program, are missing. Please add them!

Please update the ticket.

agaraman0 commented 1 year ago

I took a first pass at this and tried building an image on top of anibali/docker-pytorch, but the new image grew to 16 GB, whereas the current Docker image is 3.04 GB.

Screenshot 2023-07-07 at 12 46 53 AM

On checking which dependencies account for the current Docker image size, I found this:

Screenshot 2023-07-07 at 12 47 24 AM

which does not match the site packages mentioned in #44.

agaraman0 commented 1 year ago

@GautamR-Samagra @ChakshuGautam I would like to work on this issue. Let me know if I am missing something.

Deekshithrathod commented 1 year ago

@GautamR-Samagra @ChakshuGautam I understand the basics of Docker and have packaged a couple of my projects with it. Am I correct in assuming that the idea here is to use multi-stage or distroless builds to reduce the image size? That is, to remove the unnecessary dependencies that come bundled with a base OS image such as Ubuntu?