@mencattini I have prepared everything for a Travis build (https://travis-ci.org/cui-unige/mcc4mcc). The build does not work because of a FIXME and a linter warning. Can you fix them?
The branch is `issue-11`.
I come back from vacation tomorrow. I will begin the fix on Friday.
Or it might also wait until Monday ;-)
The build failure comes from `mcc.py`, on line 192:

```python
# FIXME: i am not sure the result is correct, because there is no check
# that the fields of the characteristic have the same name as the
# fields that were used during learning.
```

Should I delete it? I'm not sure I am able to fix it; at least we need to talk about it.
Do not delete it; we will discuss it on Monday.
The training part just uses the values. It means the model doesn't know the categories; it only sees the arrays. It will be our job to make sure that the next array has the same form as the previous ones. If we preserve the order, there won't be any ambiguity.
But we do not set an order. Instead, we name fields...
Scikit doesn't use Pandas objects. Every function, like `fit` or `score`, takes as parameters:

    X : {array-like, sparse matrix}, shape = [n_samples, n_features]
        Training vectors, where n_samples is the number of samples
        and n_features is the number of features.
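To make this concrete, here is a minimal sketch (the classifier and the data are invented for the illustration, they are not taken from mcc4mcc): scikit-learn only sees positions, never field names, so swapping two columns silently changes what a sample means.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Column 0 = characteristic A, column 1 = characteristic B: the model only
# knows these positions, not any field names.
x_train = np.array([[0, 1], [1, 0], [1, 1], [0, 0]])
y_train = np.array([1, 2, 2, 1])

model = DecisionTreeClassifier(random_state=0).fit(x_train, y_train)

# The same sample with its two columns swapped is a different point for the
# model, even though a named representation of it would be identical.
print(model.predict(np.array([[0, 1]])))
print(model.predict(np.array([[1, 0]])))
```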
The definition of array-like from NumPy:
In general, numerical data arranged in an array-like structure in Python can be converted to arrays through the use of the array() function. The most obvious examples are lists and tuples. See the documentation for array() for details for its use. Some objects may support the array-protocol and allow conversion to arrays this way. A simple way to find out if the object can be converted to a numpy array using array() is simply to try it interactively and see if it works! (The Python Way).
At this point, I wasn't sure whether Scikit uses the headers or not. By reading the Scikit code, I discovered that every function with an array-like parameter casts the vector by applying `check_array`. The Scikit documentation says:
Input validation on an array, list, sparse matrix or similar. By default, the input is converted to an at least 2D numpy array. If the dtype of the array is object, attempt converting to float, raising on failure.
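As a small illustration of that conversion (this snippet is an assumption of mine, not code from the repository), `check_array` turns a DataFrame into a plain numpy array, so the column names never reach the estimator:

```python
import pandas as pd
from sklearn.utils import check_array

frame = pd.DataFrame([{"a": 1.0, "b": 2.0}])
array = check_array(frame)

print(type(array))  # <class 'numpy.ndarray'>: the column names are gone
print(array)        # only the values, in the DataFrame's column order
```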
In the model, Scikit doesn't use the headers, only numpy arrays. It means the order matters.
Then, it means that we should convert all dictionaries passed to the learning algorithms into arrays, setting the order of the fields ourselves (and also passing that order to the `mcc.py` script, through the `learned.json` file).
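Something along these lines could work (the field names below are invented for the example; the real list would be whatever characteristics we extract):

```python
FIELD_ORDER = ["places", "transitions", "arcs"]  # hypothetical field names

def to_vector(characteristics, field_order=FIELD_ORDER):
    """Turn a characteristics dictionary into a list ordered by field_order."""
    return [characteristics[field] for field in field_order]

example = {"transitions": 12, "arcs": 30, "places": 7}
print(to_vector(example))  # [7, 12, 30], regardless of the dict's own order
```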
It is strange that the `mcc.py` tool always obtains a tool identifier (> 2) when doing `model.predict(pandas.DataFrame([test]))`.
It's a kind of preprocessing before using the `mcc` tool, right?
No, it can be computed during `extract.py`, saved in `learned.json`, and loaded at the beginning of `mcc.py` (in fact when loading `learned.json`).
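For instance (a rough sketch; the `"fields"` key and the field names are assumptions, the actual layout of `learned.json` is up to us):

```python
import json

# In extract.py: remember the field order used to build the training arrays.
fields = ["places", "transitions", "arcs"]
with open("learned.json", "w") as handle:
    json.dump({"fields": fields}, handle)

# In mcc.py: reload the same order before building the vectors to predict on.
with open("learned.json") as handle:
    fields = json.load(handle)["fields"]
```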
Proof of DataFrame ordering.
```python
import pandas as pd
import numpy as np

alphabet = np.array(list('abcdefghijklmnopqrstuvwxyz'))

# Shuffle the keys so that they are inserted in a random order.
keys = alphabet.copy()
np.random.shuffle(keys)
print(f"Are keys different from the alphabet order: {np.any(keys != alphabet)}")

# Random key insertion into the dictionary.
d = {}
for key in keys:
    d[key] = key

# Build a one-row DataFrame from the dictionary; with the pandas version used
# here, the columns are sorted, so the values come back in alphabetical order.
df = pd.DataFrame([d])
print(f"Are values in the same order as the sorted keys: {np.all(alphabet == df.values)}")
```
- `pycodestyle` on Python code;
- `autopep8` if needed;
- `pylint` on Python code.