DeepBlueAI / AutoSmart

GNU General Public License v3.0

Introduction to AutoSmart

The 1st-place solution for the KDD Cup 2019 AutoML Track

How to install

Requirements: Cython and a C compiler.

Clone or download the AutoSmart package, then run:

python setup.py install 

How to use

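import auto_smart

# Read the info dictionary for the dataset in the "data" directory
info = auto_smart.read_info("data")
# Load the training tables and the labels of the main table
train_data, train_label = auto_smart.read_train("data", info)
# Load the test tables
test_data = auto_smart.read_test("data", info)
# Train on the training set and predict on the test set
auto_smart.train_and_predict(train_data, train_label, info, test_data)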

Data Sample

Data

This section describes the datasets that the system can handle.

Components

Each dataset is split into two subsets, namely the training set and the testing set.

Both sets have:

Table files

Each table file is a CSV file that stores one table (main or related), with '\t' (tab) as the delimiter. The first row gives the feature names, a.k.a. the 'schema', and the following rows are the records.

The type of each feature is given in the info dictionary, described below.

There are four feature types, indicated by "cat", "num", "multi-cat", and "time".
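A minimal sketch of reading one table file with the Python standard library. The column names and the per-column type map are hypothetical (in AutoSmart the types come from the info dictionary), and the cell encodings are assumptions: "num" is parsed as a float, a "multi-cat" cell is assumed to hold comma-separated values, and "cat" and "time" are left as raw strings.

```python
import csv
import io

# Hypothetical column names and types; real ones come from the info dictionary.
FEATURE_TYPES = {"c_user": "cat", "n_price": "num", "m_tags": "multi-cat", "t_01": "time"}


def read_table(text, types):
    """Parse a tab-delimited table whose first row is the schema."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    schema = next(reader)  # first row: feature names
    rows = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        row = {}
        for name, value in zip(schema, record):
            kind = types[name]
            if kind == "num":
                row[name] = float(value)
            elif kind == "multi-cat":
                # Assumption: a multi-cat cell holds comma-separated values.
                row[name] = value.split(",")
            else:
                # "cat" and "time" are kept as raw strings in this sketch.
                row[name] = value
        rows.append(row)
    return schema, rows


sample = "c_user\tn_price\tm_tags\tt_01\nu1\t3.5\t7,12\t1554000000\n"
schema, rows = read_table(sample, FEATURE_TYPES)
print(schema)  # ['c_user', 'n_price', 'm_tags', 't_01']
print(rows[0]["m_tags"])  # ['7', '12']
```

Note that commas inside a "multi-cat" cell do not clash with the table itself, because the delimiter is a tab.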

Label file

The label file is associated only with the main table in the training set. It is a CSV file with a single column: the first row is the header, and the remaining rows hold the labels associated with the instances in the main table.
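A small sketch of reading a label file and checking that it has one label per main-table instance. The header name, the integer (e.g. 0/1) label encoding, and the main-table row count are assumptions for illustration.

```python
def read_labels(text):
    """Return the header and the labels of a single-column label file."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header, values = lines[0], lines[1:]
    # Assumption: labels are integers (e.g. 0/1 for binary classification).
    return header, [int(v) for v in values]


label_text = "label\n1\n0\n1\n"
header, labels = read_labels(label_text)
main_table_rows = 3  # hypothetical number of records in the main table
# Sanity check: one label per instance in the main table.
assert len(labels) == main_table_rows
print(header, labels)  # label [1, 0, 1]
```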

info dictionary

Important information about each dataset is stored in a Python dictionary named info, which serves as an input to the system. In general, you need to create this information manually as an info.json file. The details of info are given below.
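A hedged sketch of writing an info.json by hand and reading it back with Python's json module. The key names ("tables", "relations", "table_A", "table_B", "key", "type") and the table and column names are hypothetical, modeled loosely on the KDD Cup 2019 AutoML format and not confirmed by this README; check them against the sample data before use. Where auto_smart.read_info expects the file is also an assumption, so a temporary directory stands in for "data".

```python
import json
import os
import tempfile

# Hypothetical info dictionary; key names are assumptions, not AutoSmart's spec.
info = {
    "tables": {
        # Per-table mapping from feature name to feature type.
        "main": {"c_user": "cat", "n_price": "num", "m_tags": "multi-cat", "t_01": "time"},
        "table_1": {"c_user": "cat", "n_age": "num"},
    },
    "relations": [
        {"table_A": "main", "table_B": "table_1", "key": ["c_user"], "type": "many_to_one"},
    ],
}

data_dir = tempfile.mkdtemp()  # stand-in for the "data" directory
path = os.path.join(data_dir, "info.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(info, f, indent=2)

with open(path, encoding="utf-8") as f:
    loaded = json.load(f)
print(loaded["tables"]["main"]["m_tags"])  # multi-cat
```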

Descriptions of the keys in info:

Relations Between Tables

This system considers four types of relations between tables.
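Assuming the four relations are the four cardinality combinations (one-to-one, one-to-many, many-to-one, many-to-many), the sketch below shows what each one means by checking whether the join key repeats on either side. The function name relation_type and the "many_to_one"-style labels are illustrative assumptions, not AutoSmart's API; in practice the relation type would be declared rather than inferred.

```python
from collections import Counter


def _side(keys):
    """'one' if no key value repeats in this table, otherwise 'many'."""
    return "one" if all(n == 1 for n in Counter(keys).values()) else "many"


def relation_type(keys_a, keys_b):
    """Classify the relation from table A to table B using their key columns."""
    return f"{_side(keys_a)}_to_{_side(keys_b)}"


main_users = ["u1", "u1", "u2"]  # main table: several records per user
user_table = ["u1", "u2"]        # related table: one record per user
print(relation_type(main_users, user_table))  # many_to_one
print(relation_type(user_table, main_users))  # one_to_many
```

A "one" side means each key value identifies a single row, so joining from the "many" side onto it is a plain lookup, while joining from the "one" side onto a "many" side needs some aggregation.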

Contact Us

DeepBlueAI: 1229991666@qq.com