altair-viz / altair-viz.github.io

Generated documentation Website for Altair; source can be found at http://github.com/altair-viz/altair/

Layered Histogram example does not take into account null values #25

Closed harabat closed 3 years ago

harabat commented 3 years ago

Issue

This is based on issue #2411, which I originally posted in the altair repo.

The method for making layered histograms suggested on altair-viz.github.io does not seem to account for null values (empty bins) within the range of the data: bins that should be empty are drawn as if they contained 1 observation.
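To check that the data really does contain empty bins inside each trial's range, the counts can be computed outside Altair. The following is a minimal sketch using NumPy only. The bin width `step = 0.2` and the helper `interior_empty_bins` are assumptions made for illustration; the width Vega-Lite actually picks for `maxbins=100` may differ.

```python
import numpy as np

np.random.seed(42)

# Same data as in the example from the issue
trials = {
    'Trial A': np.random.normal(0, 0.8, 1000),
    'Trial B': np.random.normal(-2, 1, 1000),
    'Trial C': np.random.normal(3, 2, 1000),
}

# Assumed bin width (hypothetical; not read from Vega-Lite)
step = 0.2

# Shared bin edges covering all trials, since the chart bins the folded field
all_values = np.concatenate(list(trials.values()))
lo = np.floor(all_values.min() / step) * step
n_bins = int(np.ceil((all_values.max() - lo) / step)) + 1
edges = lo + step * np.arange(n_bins + 1)


def interior_empty_bins(values, edges):
    """Count zero-count bins lying between the first and last occupied bin."""
    counts, _ = np.histogram(values, bins=edges)
    occupied = np.flatnonzero(counts)
    inside = counts[occupied[0]:occupied[-1] + 1]
    return int(np.sum(inside == 0))


empty = {name: interior_empty_bins(v, edges) for name, v in trials.items()}
for name, n in empty.items():
    print(f"{name}: {n} empty bin(s) inside the data range")
```

Any trial with a nonzero result has bins that a correct histogram should draw at height 0.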


Examples

Altair:

import pandas as pd
import altair as alt
import numpy as np

np.random.seed(42)

# Generating Data
source = pd.DataFrame({
    'Trial A': np.random.normal(0, 0.8, 1000),
    'Trial B': np.random.normal(-2, 1, 1000),
    'Trial C': np.random.normal(3, 2, 1000)
})

alt.Chart(source).transform_fold(
    ['Trial A', 'Trial B', 'Trial C'],
    as_=['Experiment', 'Measurement']
).mark_area(
    opacity=0.3,
    interpolate='step'
).encode(
    alt.X('Measurement:Q', bin=alt.Bin(maxbins=100)),
    alt.Y('count()', stack=None),
    alt.Color('Experiment:N')
)

[Image: layered_histogram_altair]


Solution

The example should use mark_bar instead of mark_area; mark_bar represents the data correctly, including the empty bins. I'll be doing a PR shortly.

[Image: layered_histogram_altair_solution]

The underlying issue with mark_area still needs fixing, however. For comparison, here's how the plot looks in Seaborn:

[Image: layered_histogram_seaborn]

Should I open an issue in https://github.com/vega/vega-lite?

jakevdp commented 3 years ago

Closing in favor of https://github.com/altair-viz/altair/issues/2411