AutoViML / deep_autoviml

Build tensorflow keras model pipelines in a single line of code. Now with mlflow tracking. Created by Ram Seshadri. Collaborators welcome. Permission granted upon request.
Apache License 2.0

deep_autoviml

Build keras pipelines and models in a single line of code!

Update (May 2024): Now upgraded to tensorflow 2.12!

You can now use tensorflow 2.8 and above to test your deep learning models with deep_autoviml. Enjoy the upgrade!

Update (Jan 2022): Now with mlflow!

You can now add mlflow experiment tracking to all your deep_autoviml runs. mlflow is a popular python library for experiment tracking and MLOps in general. See more details below under mlflow.

Motivation

✨ deep_autoviml is a powerful new deep learning library with a very simple design goal: ✨ Make it easy for novices and experts to experiment and build tensorflow.keras preprocessing pipelines and models in the fewest steps. But just because we make it easy does not mean you should trust everything that it does or treat it like a black box. You must still use your own judgement and intuition to make sure the results are accurate and explainable, and that the model conforms to Responsible AI principles.

Watch YouTube Video for Demo of Deep_AutoViML

What is Deep AutoViML?

Deep AutoViML is the next version of AutoViML, a popular automl library that was developed using pandas, scikit-learn and xgboost+catboost. Deep AutoViML takes the best features of AutoViML and uses the latest generation of tensorflow and keras libraries to build a fast model and data pipeline for MLOps use cases.

deep_autoviml is primarily meant for sophisticated data engineers, data scientists and ML engineers to quickly prototype and build tensorflow 2.4.1+ models and pipelines for any data set of any size using a single line of code. It can build models for structured data, NLP and image datasets. Support for time series data sets is planned for the future.

  1. You can choose deep_autoviml to automatically build a custom Tensorflow model.
  2. Alternatively, you can "bring your own model" (the "BYOM" option) and attach keras data pipelines to it.
  3. Additionally, you can choose any Tensorflow Hub (TFHub) model to custom train on your data. Look for instructions in the "Tips for using deep_autoviml" section below.
  4. There are 4 ways to build your model, quickly or slowly depending on your needs:
    • fast: a quick model that uses only dense layers (deep layers).
    • fast1: a deep and wide model that uses both deep and wide layers. This is slightly slower than the fast model.
    • fast2: a deep and cross model that crosses some variables (hence deep and cross). This is about the same speed as the 'fast1' model.
    • auto: uses Optuna or Storm-Tuner to try combinations of dense layers and select the best architecture. This takes the longest time.
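One way to use these options is to try the three quick model types first and only move to "auto" if none is good enough. The sketch below is an illustration under stated assumptions, not part of deep_autoviml: the helper name compare_model_types and the compare_* project names are made up, the training function is passed in as fit_fn, and save_model_flag=False is assumed to skip writing the model to disk.

```python
import time

# The three quick options listed above; "auto" is left out because it runs a
# hyperparameter search and takes the longest.
QUICK_MODEL_TYPES = ("fast", "fast1", "fast2")


def compare_model_types(fit_fn, train, target, model_types=QUICK_MODEL_TYPES):
    """Train once per model type and record how long each run took.

    fit_fn is expected to behave like deepauto.fit: it takes (train, target)
    plus keyword arguments and returns (model, cat_vocab_dict).
    """
    results = {}
    for model_type in model_types:
        start = time.perf_counter()
        model, cat_vocab_dict = fit_fn(
            train,
            target,
            keras_model_type=model_type,
            project_name=f"compare_{model_type}",  # hypothetical naming scheme
            save_model_flag=False,  # assumption: avoids saving each trial model
        )
        results[model_type] = {
            "model": model,
            "cat_vocab_dict": cat_vocab_dict,
            "seconds": time.perf_counter() - start,
        }
    return results
```

In practice you would pass deepauto.fit as fit_fn after importing it with "from deep_autoviml import deep_autoviml as deepauto". Passing the function in keeps the helper importable on machines without TensorFlow.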

Features

These are the main features that distinguish deep_autoviml from other libraries:

Technology

deep_autoviml uses the latest in tensorflow (2.4.1+) tf.data.Dataset and tf.keras preprocessing technologies: the Keras preprocessing layers enable you to encapsulate feature engineering and preprocessing into the model itself. This makes the process for training and predictions the same: just feed input data (in the form of files or dataframes) and the model will take care of all preprocessing before predictions.

To perform its preprocessing inside the model itself, deep_autoviml uses tensorflow (TF 2.4.1+) and tf.keras experimental preprocessing layers: these layers are part of your saved model. They become part of the model's computational graph, which can be optimized and executed on any device, including GPUs and TPUs. By packaging everything as a single unit, we save the effort of reimplementing the preprocessing logic on the production server. The new model can take raw tabular data with numeric and categorical variables or text strings directly, without any preprocessing. This avoids missing or incorrectly configured preprocessing layers in production.
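The idea of carrying preprocessing state along with the model can be illustrated without TensorFlow. The following is a framework-free toy, not deep_autoviml's implementation: the class name BundledPreprocessor and the sample rows are invented for illustration. It learns normalization statistics and a category vocabulary once, then applies them to raw rows at prediction time, which is the role Keras preprocessing layers play inside the saved model.

```python
from statistics import mean, pstdev


class BundledPreprocessor:
    """Learns preprocessing state once and reuses it for every later input."""

    def __init__(self, numeric_cols, categorical_cols):
        self.numeric_cols = list(numeric_cols)
        self.categorical_cols = list(categorical_cols)
        self.stats = {}  # column -> (mean, std) learned from training rows
        self.vocab = {}  # column -> {category: integer index}

    def adapt(self, rows):
        """Learn normalization statistics and vocabularies from training rows."""
        for col in self.numeric_cols:
            values = [row[col] for row in rows]
            std = pstdev(values) or 1.0  # avoid dividing by zero for constant columns
            self.stats[col] = (mean(values), std)
        for col in self.categorical_cols:
            categories = sorted({row[col] for row in rows})
            # Index 0 is reserved for categories never seen during training.
            self.vocab[col] = {cat: i + 1 for i, cat in enumerate(categories)}
        return self

    def __call__(self, row):
        """Turn one raw row (a dict) into a list of numeric features."""
        features = []
        for col in self.numeric_cols:
            col_mean, col_std = self.stats[col]
            features.append((row[col] - col_mean) / col_std)
        for col in self.categorical_cols:
            features.append(self.vocab[col].get(row[col], 0))
        return features


train_rows = [
    {"age": 20, "city": "Paris"},
    {"age": 40, "city": "Tokyo"},
]
prep = BundledPreprocessor(["age"], ["city"]).adapt(train_rows)

# Raw rows go in at prediction time; no separate preprocessing script is needed.
print(prep({"age": 30, "city": "Lima"}))
```

Because the learned statistics and vocabulary live in one object, training and serving cannot drift apart; an unseen category such as "Lima" falls back to the reserved index instead of failing.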

In addition, to select the best hyperparameters for the model, it uses a new open source library: storm-tuner.

Install

deep_autoviml requires tensorflow v2.4.1+ and storm-tuner to run. Don't worry! We will install these libraries when you install deep_autoviml.

pip install deep_autoviml

For your own conda environment...

conda create -n <your_env_name> python=3.7 anaconda
conda activate <your_env_name> # on older conda versions: `source activate <your_env_name>` (Linux/macOS) or `activate <your_env_name>` (Windows)
pip install deep_autoviml
or
pip install git+https://github.com/AutoViML/deep_autoviml.git

Usage

deep_autoviml can be invoked with a simple import and run statement:

from deep_autoviml import deep_autoviml as deepauto

Load a data set (any .csv, .gzip, .gz or .txt file) into deep_autoviml and it will split it into train and validation datasets internally. You only need to provide a target variable and a project_name to store files on your local machine, and leave the rest to defaults:

model, cat_vocab_dict = deepauto.fit(train, target, keras_model_type="auto",
            project_name="deep_autoviml", keras_options={}, model_options={}, 
            save_model_flag=True, use_my_model='', model_use_case='', verbose=0,
            use_mlflow=False, mlflow_exp_name='autoviml', mlflow_run_name='first_run')
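The three mlflow arguments in the call above can be collected in a small helper so that every run gets its own name. This is only a sketch: the helper name mlflow_fit_kwargs and the timestamped run name are assumptions made for illustration, while use_mlflow, mlflow_exp_name and mlflow_run_name are the keyword arguments shown in the fit call.

```python
from datetime import datetime


def mlflow_fit_kwargs(experiment, run_prefix="run"):
    """Build the mlflow-related keyword arguments for deepauto.fit."""
    # A timestamp suffix keeps run names unique across repeated experiments.
    run_name = f"{run_prefix}_{datetime.now():%Y%m%d_%H%M%S}"
    return {
        "use_mlflow": True,
        "mlflow_exp_name": experiment,
        "mlflow_run_name": run_name,
    }
```

The result can be unpacked into the call, for example deepauto.fit(train, target, project_name="deep_autoviml", **mlflow_fit_kwargs("autoviml")).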

Once deep_autoviml writes your saved model and cat_vocab_dict files to disk in the project_name directory, you can load them from anywhere (including the cloud) and make predictions with the model and cat_vocab_dict generated above.

There are two kinds of predictions. This is the usual (typical) format:

predictions = deepauto.predict(model, project_name, test_dataset=test,
            keras_model_type=keras_model_type, cat_vocab_dict=cat_vocab_dict)

In case you are performing image classification, then you need to use deepauto.predict_images() for making predictions. See the Image section below for more details.

API

Arguments

deep_autoviml requires only a single line of code to get started. You can, however, fine tune the model it builds through two dictionaries named "model_options" and "keras_options". These two dictionaries act like python **kwargs and let you tune the hyperparameters used to build the tf.keras model. Instructions on how to use them are provided below.

Image

Leaf images referred to here are from Kaggle and are copyright of Kaggle. They are shown for illustrative purposes (Kaggle Leaf Image Classification).

deep_autoviml can do image classification. All you need to do is to organize your image_dir folder under train, validation and test sub folders. The train folder, for example, can contain one sub-folder of images per label. All you need to provide is the name of the image directory, for example "leaf_classification", and deep_autoviml will automatically read the images and assign them the correct labels and the correct dataset (train, test, etc.).

image_dir = "leaf_classification"

You also need to provide the height and width of each image as well as the number of channels for each image.

img_height = 224
img_width = 224
img_channels = 3

You then need to set the keras model type argument as "image".

keras_model_type = "image"

You also need to send in the above arguments as model options as follows:

model_options = {'image_directory': image_dir, 'image_height': img_height, 'image_width': img_width, 'image_channels': img_channels}

You can then call deep_autoviml for training the model as usual with these inputs:

model, dicti = deepauto.fit(trainfile, target, keras_model_type=keras_model_type,
            project_name='leaf_classification', save_model_flag=False,
            model_options=model_options, keras_options=keras_options,
            use_my_model='', verbose=0)

To make predictions, you need to provide the dictionary ("dicti") from above and the trained model. You also need to provide where the test images are stored as follows:

test_image_dir = 'leaf_classification/test'
predictions = deepauto.predict_images(test_image_dir, model, dicti)
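Before training, it can help to confirm that the image directory matches the layout described in this section. The helper below is a hypothetical convenience, not a deep_autoviml function: it only inspects folder names, and it assumes that each label is a sub-folder directly under a split folder.

```python
from pathlib import Path

EXPECTED_SPLITS = ("train", "validation", "test")


def summarize_image_dir(image_dir):
    """Return {split: sorted label sub-folder names} for the splits that exist.

    Raises FileNotFoundError if the train sub-folder is missing.
    """
    root = Path(image_dir)
    summary = {}
    for split in EXPECTED_SPLITS:
        split_dir = root / split
        if split_dir.is_dir():
            # Each directory under a split is treated as one class label.
            summary[split] = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    if "train" not in summary:
        raise FileNotFoundError(f"no 'train' sub-folder found under {root}")
    return summary
```

Calling summarize_image_dir("leaf_classification") returns a dictionary that makes a missing split or an unexpected label folder easy to spot before a long training run.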

NLP

deep_autoviml can also do NLP text classification. There are two ways to do NLP:

  1. Using folders and sub-folders
    All you need to do is to organize your text_dir folder under train, validation and test sub folders. The train folder, for example, can contain the text files for each label in its own sub-folder. Then set keras_model_type to "BERT" or "USE", and deep_autoviml will use either BERT or the Universal Sentence Encoder to preprocess and transform your text into embeddings to feed to a model.

  2. Using a CSV file
    Just provide a CSV file with column names and text. If you have multiple text columns, it will handle all of them automatically. If you want to mix numeric and text columns, you can do so in the same CSV file. deep_autoviml will automatically detect which columns are text (NLP) and which columns are numeric and do the preprocessing automatically. Set keras_model_type to "BERT" or "USE", and it will use either BERT or the Universal Sentence Encoder, as specified, on your text columns. If you want to use neither of them, set keras_model_type to "auto" and deep_autoviml will automatically choose the best embedding for your model.
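For the folder-based option, labeled texts held in memory first need to be written out in the train/label layout described above. The following sketch shows one way to do that with the standard library; the function name write_text_folders and the numbered .txt file names are assumptions, since the source only specifies that each label is a sub-folder.

```python
from pathlib import Path


def write_text_folders(samples, text_dir, split="train"):
    """Write (text, label) pairs as text_dir/<split>/<label>/<n>.txt files.

    Returns a dict mapping each label to the number of files written for it.
    """
    split_dir = Path(text_dir) / split
    counts = {}
    for text, label in samples:
        label_dir = split_dir / str(label)
        label_dir.mkdir(parents=True, exist_ok=True)
        index = counts.get(label, 0)
        # Files are numbered per label; the naming scheme is arbitrary.
        (label_dir / f"{index}.txt").write_text(text, encoding="utf-8")
        counts[label] = index + 1
    return counts
```

Calling it once per split (train, validation, test) with the matching samples produces the full directory tree.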

Tips

You can use the following arguments in your input to make deep_autoviml work best for you:

Maintainers

Contributing

See the contributing file!

PRs accepted.

License

Apache License 2.0 © 2020 Ram Seshadri

DISCLAIMER

This project is not an official Google project. It is not supported by Google and Google specifically disclaims all warranties as to its quality, merchantability, or fitness for a particular purpose.