dylan-profiler / visions

Type System for Data Analysis in Python
https://dylan-profiler.github.io/visions/visions/getting_started/usage/types.html

How to check if a type is/is_not parent of another type ? #160

Open ttpro1995 opened 3 years ago

ttpro1995 commented 3 years ago

I am following the "Problem type inference" example.

graph

From one dataframe, I have already built a list with the type of each column. Here is the type_list:

[Discrete,
 Nominal,
 Discrete,
 Nominal,
 Nominal,
 Nominal,
 Nominal,
 Nominal,
 Nominal,
 Binary,
 Discrete,
 Discrete,
 Discrete,
 Nominal,
 Binary]

type(type_list[0]) gives visions.types.type.VisionsBaseTypeMeta

Now I want to check whether each type has Categorical or Numeric as a parent type.

for column, t in zip(dataframe.columns, type_list):
    if is_type_parent_of_categorical(t):
        category_job(dataframe[column])

# Binary is a child of Categorical
is_type_parent_of_categorical(type_list[14])  # -> True

# Discrete is a child of Numeric
is_type_parent_of_categorical(type_list[0])  # -> False

How should I implement is_type_parent_of_categorical?

My workaround seems to work, but it relies on string comparison:

def is_type_parent_of_categorical(visions_type):
    # Compare the type's string form against the known categorical type names
    type_str = str(visions_type)
    return type_str in ["Categorical", "Ordinal", "Nominal", "Binary"]

ieaves commented 3 years ago

Hey @ttpro1995 - there's a short and a long answer to your question.

Short Answer: Type relations are not defined on the type's inheritance hierarchy (all types inherit from VisionsBaseType); instead, they can be accessed from the .relations property. You'll notice I'm using the term relations rather than children, which leads to...

Long Answer: Only nodes in a typeset have actual children. The relations attribute on a type returns a list of potential parents of that type. Encoding parent relations on types, rather than child relations, lets us compose types together to form typesets. The easiest way to see this: the root of a typeset graph is Generic. If Generic tracked its children, then creating a new type like PositiveInteger would counterintuitively require source code changes to Generic, which would effectively produce strong coupling between types.
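To make the coupling argument concrete, here is a minimal pure-Python sketch. It does not use visions: ToyType, its parents attribute (standing in for .relations), and build_children are hypothetical names made up for illustration. It only shows how child edges can be derived per typeset when each type records its potential parents.

```python
from collections import defaultdict


class ToyType:
    """Hypothetical stand-in for a type that only knows its potential parents."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = tuple(parents)  # plays the role of `.relations`

    def __repr__(self):
        return self.name


def build_children(types):
    """Derive parent -> children edges for the types in one toy typeset."""
    members = set(types)
    children = defaultdict(list)
    for t in types:
        for parent in t.parents:
            # Parents that are not part of this typeset are ignored.
            if parent in members:
                children[parent].append(t)
    return dict(children)


Generic = ToyType("Generic")
Integer = ToyType("Integer", parents=[Generic])
# Adding a new type does not require touching the definition of Generic.
PositiveInteger = ToyType("PositiveInteger", parents=[Integer])

small = build_children([Generic, Integer])
large = build_children([Generic, Integer, PositiveInteger])
```

In this sketch the same Generic object appears in both toy typesets, and its children differ only because the typesets contain different members.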

So, children only really exist on a TypeSet but it's pretty easy to get these as well. I'm going to use the StandardTypeset as an example but the same will work for any typeset you create / use.

Under the hood visions uses networkx to build typeset graphs. Each typeset has two graph attributes:

  1. A base_graph which includes non-inferential relations (i.e. excludes Int -> Float because that would require a coercion to the test sequence).
  2. A relation_graph which includes all possible types and relations.

So, to get all possible children of a node in a typeset, we just have to use the networkx API and the typeset's relation_graph.

typeset = StandardTypeset()
test_type = Categorical

child_types = typeset.relation_graph[test_type]  

Technically, child_types is going to be a networkx AtlasView object, but it supports the in operator, so it will work just fine for your purposes. Your is_child function would look something like

def is_child(typeset, A, B):
    """Determines if B is a child of A for a given typeset"""
    return B in typeset.relation_graph[A]

Technically, this only checks a single level deep in the tree (i.e. the children). Judging from your example, you're actually interested in evaluating all possible descendants of a node, which can be achieved similarly by

import networkx as nx

def is_descendant(typeset, A, B):
    """Determines if B is a descendant of A for a given typeset"""
    return B in nx.descendants(typeset.relation_graph, A)
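The difference between a child check and a descendant check can be tried without visions or networkx installed. The sketch below is an assumption-laden toy: EDGES is a made-up parent -> children mapping using the type names from the question (the real typeset graph may be laid out differently), strings stand in for types, and descendants is a hand-written breadth-first search that collects every node reachable from the source, excluding the source itself, which is what nx.descendants computes.

```python
from collections import deque

# Hypothetical parent -> children edges; not the actual visions graph.
EDGES = {
    "Generic": ["Categorical", "Numeric"],
    "Categorical": ["Ordinal", "Nominal", "Binary"],
    "Numeric": ["Discrete", "Continuous"],
}


def is_child(edges, a, b):
    """True if b is a direct child of a (one level deep)."""
    return b in edges.get(a, [])


def descendants(edges, source):
    """All nodes reachable from source, excluding source itself."""
    seen = set()
    queue = deque(edges.get(source, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(edges.get(node, []))
    seen.discard(source)  # only matters if the graph has a cycle
    return seen


def is_descendant(edges, a, b):
    """True if b is reachable from a at any depth."""
    return b in descendants(edges, a)


# A shortened version of the type_list from the question.
type_list = ["Discrete", "Nominal", "Binary"]
categorical_flags = [is_descendant(EDGES, "Categorical", t) for t in type_list]
```

With these toy edges, Binary is a direct child of Categorical but only a descendant of Generic, so is_child(EDGES, "Generic", "Binary") is False while is_descendant(EDGES, "Generic", "Binary") is True.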

EDIT:

It occurred to me that you may simply be interested in determining whether your data is Numeric or Categorical. There's an even easier way to do this than checking the parent relations: just create a new typeset, e.g.

new_typeset = Generic + Numeric + Categorical

new_typeset.infer_type(df)

ieaves commented 3 years ago

If you're interested in making a PR to include some of this functionality by default we would be more than happy to help you get those through! In the meantime, I've marked this as an enhancement request.