altair-viz / altair-transform

Evaluation of Vega-Lite transforms in Python
MIT License

error with transform_fold #4

Closed: williehallock802 closed this issue 5 years ago

williehallock802 commented 5 years ago
import pandas as pd
import numpy as np
import altair as alt

data = { 'ColA': {('A', 'A-1'): 'w',
                 ('A', 'A-2'): 'w',
                 ('A', 'A-3'): 'w',
                 ('B', 'B-1'): 'q',
                 ('B', 'B-2'): 'q',
                 ('B', 'B-3'): 'r',
                 ('C', 'C-1'): 'w',
                 ('C', 'C-2'): 'q',
                 ('C', 'C-3'): 'q',
                 ('C', 'C-4'): 'r'},
        'ColB': {('A', 'A-1'): 'r',
                 ('A', 'A-2'): 'w',
                 ('A', 'A-3'): 'w',
                 ('B', 'B-1'): 'q',
                 ('B', 'B-2'): 'q',
                 ('B', 'B-3'): 'e',
                 ('C', 'C-1'): 'e',
                 ('C', 'C-2'): 'q',
                 ('C', 'C-3'): 'r',
                 ('C', 'C-4'): 'w'} 
        }

df = pd.DataFrame(data).reset_index(drop=True)

mychart = alt.Chart(df).transform_fold(
    ['ColA', 'ColB'], as_=['column', 'value']
).mark_bar().encode(
    x=alt.X('value:N', sort=['r', 'q', 'e', 'w']),
    y=alt.Y('count():Q', scale=alt.Scale(domain=[0, len(df.index)])),
    column='column:N'
)

from altair_transform import extract_data
data = extract_data(mychart)
data.head()

Running this generates the error:

altair-transform/altair_transform/core/fold.py in visit_fold(transform, df)
      9     transform = transform.to_dict()
     10     fold = transform["fold"]
---> 11     var_name, value_name = transform._get("as", ("key", "value"))
     12     value_vars = [c for c in df.columns if c in fold]
     13     id_vars = [c for c in df.columns if c not in fold]

AttributeError: 'dict' object has no attribute '_get'

jakevdp commented 5 years ago

Thanks, I'll try to take a look.

maliky commented 5 years ago

Same issue here. Removing the underscore before get (calling transform.get instead of transform._get) fixes the error, and the example then outputs:

  column value
0   ColA     w
1   ColA     w
2   ColA     w
3   ColA     q
4   ColA     q

But maybe what's missing is an isinstance check like the one at data.py:35:

        if isinstance(context, dict):
            datasets = context.get('datasets', {})
        else:
            datasets = context._get('datasets', {})

jakevdp commented 5 years ago

That would work. This is an instance of general confusion throughout the codebase about whether inputs are dicts or schema objects. I went through a while ago and tried to address most of it, but this is one of the instances I missed (there may be others).

I think rather than an isinstance check each time we need to get an attribute, it would be better to normalize inputs so that we know what they are, and know what methods can be used on them.

Are you interested in working on this?
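
The normalization idea above could be sketched roughly as follows. This is only an illustration, not code from altair-transform: the helper name to_plain_dict and the stand-in class FakeSchema are hypothetical, and the only assumption about schema objects is that they expose a to_dict() method, as the traceback in this issue shows.

```python
from collections.abc import Mapping
from typing import Any


def to_plain_dict(spec: Any) -> dict:
    """Hypothetical helper: convert a dict or a schema-like object to a plain dict.

    Calling this once at the entry point means downstream code can rely on
    dict methods such as .get() without an isinstance check at every access.
    """
    if isinstance(spec, Mapping):
        return dict(spec)
    if hasattr(spec, "to_dict"):
        # Assumption: schema objects provide to_dict(), as used in fold.py.
        return spec.to_dict()
    raise TypeError(f"Cannot normalize object of type {type(spec).__name__}")


class FakeSchema:
    """Stand-in for a schema object, used only to exercise the helper."""

    def __init__(self, **kwargs: Any) -> None:
        self._kwargs = kwargs

    def to_dict(self) -> dict:
        return dict(self._kwargs)


# Both kinds of input end up as plain dicts with the same content.
from_dict = to_plain_dict({"fold": ["ColA", "ColB"]})
from_schema = to_plain_dict(FakeSchema(fold=["ColA", "ColB"]))
```

With inputs normalized like this, a lookup such as from_schema.get("as", ("key", "value")) behaves the same regardless of what the caller passed in.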

jakevdp commented 5 years ago

Looking at this, the input has already been converted to a dict (via transform.to_dict()), so using transform.get() directly should be sufficient. This was not caught because there is no test of the fold transform (I'm certain I wrote one when I wrote the code, but I'm not sure what happened to it).
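
As a rough sketch of what the fix and a regression test might look like: the function fold_frame below is a hypothetical, simplified stand-in built on pandas melt, not the library's actual visit_fold, and its output may differ from the real fold transform in details such as which columns are retained. It only shows the dict-style .get() lookup and the kind of test that would have caught the AttributeError.

```python
import pandas as pd


def fold_frame(transform: dict, df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical, simplified fold: melt the listed columns into key/value pairs."""
    fold = transform["fold"]
    # transform is a plain dict here, so .get() (not ._get()) is the right call.
    var_name, value_name = transform.get("as", ("key", "value"))
    value_vars = [c for c in df.columns if c in fold]
    id_vars = [c for c in df.columns if c not in fold]
    return df.melt(
        id_vars=id_vars,
        value_vars=value_vars,
        var_name=var_name,
        value_name=value_name,
    )


def test_fold_with_custom_names() -> None:
    df = pd.DataFrame({"ColA": ["w", "q"], "ColB": ["r", "e"]})
    out = fold_frame({"fold": ["ColA", "ColB"], "as": ["column", "value"]}, df)
    assert list(out.columns) == ["column", "value"]
    assert out["column"].tolist() == ["ColA", "ColA", "ColB", "ColB"]
    assert out["value"].tolist() == ["w", "q", "r", "e"]


def test_fold_with_default_names() -> None:
    df = pd.DataFrame({"ColA": ["w"], "ColB": ["r"]})
    out = fold_frame({"fold": ["ColA", "ColB"]}, df)
    assert list(out.columns) == ["key", "value"]


test_fold_with_custom_names()
test_fold_with_default_names()
```

The second test covers the default branch of the lookup, which is the line that raised the error in the original report.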

maliky commented 5 years ago

I don't think there's much else to do here. :o)

jakevdp commented 5 years ago

TODO list:

maliky commented 5 years ago

Thank you for your offer. Unfortunately I don't know enough about tests to be helpful, sorry.
Looking at your solution will help for next time.