georgian-io-archive / foreshadow

An automatic machine learning system
https://foreshadow.readthedocs.io
Apache License 2.0

Adding new auto intent resolving model from the research project #208

Closed: jzhang-gp closed this 4 years ago

jzhang-gp commented 4 years ago

Description

Adding the latest intent-resolving code to support an additional intent type.

One odd finding from the following test: both columns contain only distinct values. The first column runs from 0 to 99 and the second from 1 to 100, yet the first is resolved as Numeric and the second as Droppable. Is this because ID columns usually start at 1? @christeefy

def test_autointentmapping(step):
    """Test intents automatically mapped for a PreparerStep subclass."""
    import pandas as pd
    import numpy as np

    df = pd.DataFrame(
        [np.arange(i, i + 2) for i in range(100)], columns=["1", "2"]
    )
    step.get_mapping(df)
    assert step.cache_manager["intent", "1"] == "Numeric"
    # TODO: this part does NOT make sense. Shouldn't both intents be Droppable?
    #  For now I've temporarily changed it to one Numeric and one Droppable so
    #  the test passes. We need to revisit this on the auto intent resolving
    #  side.
    assert step.cache_manager["intent", "2"] == "Droppable"
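The two test columns differ only by a shift of 1, so only location-dependent statistics can tell them apart. A quick check with plain pandas makes that concrete. This is a sketch: the statistic names below are hypothetical illustrations, not foreshadow's actual metafeature set.

```python
import math

import numpy as np
import pandas as pd

# Rebuild the DataFrame from the test: row i is [i, i + 1],
# so column "1" holds 0..99 and column "2" holds 1..100.
df = pd.DataFrame([np.arange(i, i + 2) for i in range(100)], columns=["1", "2"])


def simple_metafeatures(col: pd.Series) -> dict:
    """A few illustrative column statistics (hypothetical, not foreshadow's own)."""
    return {
        "min": float(col.min()),
        "max": float(col.max()),
        "mean": float(col.mean()),
        "std": float(col.std()),
        "n_unique_ratio": col.nunique() / len(col),
    }


stats = {name: simple_metafeatures(df[name]) for name in df.columns}

# List the statistics whose values differ between the two columns.
diff = [k for k in stats["1"] if not math.isclose(stats["1"][k], stats["2"][k])]
print(diff)  # ['min', 'max', 'mean']
```

Spread and uniqueness are identical for both columns, so a model that separates them must be keying on a location statistic such as the minimum.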
christeefy commented 4 years ago

Thanks for surfacing this bug, Jing.

I expanded on your test, and it fails only for [1 ... 100] and nothing else. From manual inspection, [1 ... 100] does not appear any more frequently than [0 ... N] in the training data.

Interestingly, I was able to control the resolver's intent prediction simply by changing the "min" metafeature: a value of 1 yields Droppable, and any other value yields Numeric. Given this observation, I will add training data and retrain the model to address this specific issue.
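The check described above (varying a single metafeature while holding the others fixed) can be sketched as a small probe helper. Everything here is hypothetical: `probe_feature` and `toy_resolver` are illustrative names, and `toy_resolver` is *not* the trained model. It simply hard-codes the behaviour reported above (min == 1 gives Droppable) so the probe has something to run against.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Features = Dict[str, float]


def probe_feature(
    predict: Callable[[Features], str],
    base: Features,
    name: str,
    values: Iterable[float],
) -> List[Tuple[float, str]]:
    """Vary one metafeature, hold the rest fixed, and record the predicted intent."""
    results = []
    for v in values:
        features = dict(base)  # copy so the base vector is never mutated
        features[name] = v
        results.append((v, predict(features)))
    return results


# Hypothetical stand-in for the resolver: it only mimics the reported
# observation (min == 1 -> Droppable, otherwise Numeric).
def toy_resolver(features: Features) -> str:
    return "Droppable" if features["min"] == 1 else "Numeric"


base = {"min": 1.0, "max": 100.0, "n_unique_ratio": 1.0}
for value, intent in probe_feature(toy_resolver, base, "min", [0.0, 1.0, 2.0, 5.0]):
    print(value, intent)
```

With a real model, `predict` would wrap the resolver's own inference call. A prediction that flips on a single feature value, as it does here for `min`, is a hint that the training data under-represents that region.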

jzhang-gp commented 4 years ago

Ping :)

jichaoz commented 4 years ago

Is this issue solved?

christeefy commented 4 years ago

Not yet, Ji Chao. I am free to work on it today, along with the dependency upgrade for pandas and scikit-learn.

christeefy commented 4 years ago

[Update] Submitted a PR to the automl_research repo to address this issue. We can update this PR once that automl_research PR is approved. :)

jzhang-gp commented 4 years ago

PR approved with a minor comment :)