JetBrains / lets-plot

Multiplatform plotting library based on the Grammar of Graphics
https://lets-plot.org
MIT License

Facets: "free scales" options are ignored by discrete axis. #955

Closed: alshan closed this issue 9 months ago

alshan commented 11 months ago

ggplot2 in R drops unused categories from each facet's discrete axis when scales="free_x" is used:

library(ggplot2)

plot_data <- data.frame(
   animal_type = c("pet", "pet", "pet", "pet", "farm_animal", "farm_animal", "farm_animal"),
   animal = c("cat", "dog", "rabbit", "hamster", "cow", "pig", "horse"),
   weight = c(5, 10, 2, 1, 500, 100, 700)
)

ggplot(plot_data, aes(x=animal, y=weight)) +
   facet_grid(~animal_type, scales="free_x") +
   geom_bar(stat="identity") +
   theme_bw() +
   theme(
      panel.grid.minor=element_blank()
   )

[image: plot produced by the ggplot2 code above]

Lets-plot, however, uses the same discrete scale for all facets regardless of the "free_x" setting:

import pandas as pd
from lets_plot import *

plot_data = pd.DataFrame.from_records([
    ("pet", "cat", 5),
    ("pet", "dog", 10),
    ("pet", "rabbit", 2),
    ("pet", "hamster", 1),

    ("farm_animal", "cow", 500),
    ("farm_animal", "pig", 100),
    ("farm_animal", "horse", 700),
])
plot_data.columns = ("animal_type", "animal", "weight")

plot = (
    ggplot(plot_data, aes(x="animal", y="weight"))
    + facet_grid("animal_type", scales="free_x")
    + geom_bar(stat="identity", size=0.5, color="black")
    + theme_bw()
    + theme(
        panel_grid_minor=element_blank()
    )
)
ggsave(plot, "/Users/lnguyen/Desktop/plot.png")

[image: plot produced by the lets-plot code above]

egayer commented 4 months ago

Dear @alshan ,

I am commenting on this thread because I am experiencing the same problem when using facet_grid() (or facet_wrap()) with geom_histogram() in lets-plot 4.3.3.

First I thought it was because of the problem described here, but it turns out it is not.

I then thought I was running into the same problem as in the issue I opened a few weeks ago (https://github.com/JetBrains/lets-plot/issues/1117), because my dataframe contains values of different orders of magnitude across the variables I want to use for the facets (0-10, then 100-1000, etc.), but that is not the problem either.

facet_grid() and facet_wrap() ignore "free scales" options when using geom_histogram(), even with comparable values between variables.

Thanks

alshan commented 4 months ago

Hi @egayer ! This happens because while the 'bin' statistic is computed on each group separately, it uses the overall range of the data to determine bin positions and dimensions. As a result, each group consists of an equal number of bins (including bins containing 0 observations) and covers an equal range along the x-axis. In this situation, 'free scale' in the facet has no effect. But we can probably fix it by removing empty bins from the data.
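To illustrate the explanation above, here is a minimal NumPy sketch of the effect. It is not the lets-plot implementation: the two groups, their values, and the bin count of 10 are assumptions made up for the example. It only shows that bins computed over the overall data range give every group the same x-extent, and that dropping empty bins restores a per-group extent.

```python
import numpy as np

# Hypothetical data: two facet groups occupying very different x ranges.
groups = {
    "a": np.array([1.0, 2.0, 2.5, 3.0]),
    "b": np.array([101.0, 105.0, 109.0]),
}

# Bin edges are derived from the overall range and shared by every group.
all_values = np.concatenate(list(groups.values()))
edges = np.histogram_bin_edges(all_values, bins=10)

full_span = {}     # x-extent of each group when empty bins are kept
trimmed_span = {}  # x-extent of each group after dropping empty bins

for name, values in groups.items():
    counts, _ = np.histogram(values, bins=edges)
    # Empty bins included: the group covers the whole shared range.
    full_span[name] = (edges[0], edges[-1])
    # Empty bins removed: only the bins the group actually occupies remain.
    nonempty = np.flatnonzero(counts)
    trimmed_span[name] = (edges[nonempty[0]], edges[nonempty[-1] + 1])

print("with empty bins:   ", full_span)
print("without empty bins:", trimmed_span)
```

With empty bins kept, both groups report the same span, so a free x scale has nothing to adapt to; once empty bins are removed, the spans differ per group.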

egayer commented 4 months ago

Got it! It makes sense; I can see how it simplifies the overall function. And of course, it would be useful if you find a simple way to remove the empty bins to fix it (I am trying to be careful because I don't want you to feel like I am "asking" you to do it; I am just giving you the point of view of a user who finds your work and lets-plot awesome). Faceting is incredibly powerful, and I am not sure it is used as much as it could be in the Python ecosystem. Matplotlib doesn't have such a built-in function (or none that I am aware of), and seaborn does not really advertise it, although it does facet grids really efficiently.

So, as a workaround here, I would use the Matplotlib way, which is to plot in a loop. This actually lets me bring up something I have wanted to discuss with you for a while. In short, as you may know, Matplotlib has the subplot() function (https://matplotlib.org/stable/gallery/subplots_axes_and_figures/subplots_demo.html) that lets you build a grid of plots in a loop:

Let's say we have 7 groups of values and we want to plot 7 histograms in a grid of 4 rows and 2 columns; then we could write something like:

import matplotlib.pyplot as plt

# group_of_value: a sequence of 7 collections of values, defined elsewhere
for i in range(7):
    # 4 rows, 2 columns; i + 1 is the 1-based position in the grid: 1, 2, 3...
    plt.subplot(4, 2, i + 1)
    plt.hist(group_of_value[i])
plt.show()

Now, faceting in lets-plot (or ggplot2) is basically there to avoid looping, right? :) But I was wondering whether such a loop could be done using ggbunch(). What is your take on that? I haven't used ggbunch() much yet, since I am not very comfortable with the width and height parameters, which I find difficult to set, but would it basically be possible to build lets-plot grids using a loop and ggbunch()? That would be a workaround here, I think, and it would also allow grids of geom_imshow(), for example.

alshan commented 4 months ago

Absolutely, see new #1122.

As a workaround, try using gggrid(). All you'll need to do is to compose an array of plots (in a loop or otherwise).

egayer commented 4 months ago

Right, gggrid()! It works perfectly, thanks!

By the way, I was thinking of opening an issue, but as it may not be due to Lets-plot I am not sure it would be relevant. The bad news is that Polars is no longer compatible with Lets-plot! I don't know why, but the new version seems to treat the columns of a DataFrame as objects that behave neither like a list nor like a pandas column. It is bad news because Polars is so fast compared to pandas that it was really cool to plot with Lets-plot directly from Polars DataFrames. The workaround is simply to do something like ggplot(polars_df.to_pandas(), aes(x='age', y='size')), and it works, but it is laborious. For the time being, you may need to remove the mention of Polars compatibility from the documentation. I feel bad because I was the one who suggested, a year ago, letting people know that Polars was compatible with Lets-plot! :)

alshan commented 4 months ago

Uh-oh, could you describe the lets-plot/Polars situation in a separate issue? We want to continue Polars support in lets-plot.

egayer commented 4 months ago

Wow, my bad! While writing an example to put in the new issue, the problem disappeared! I don't really know why I ran into problems yesterday, but the good news is that we can still plot Polars DataFrames with lets-plot! Sorry for the false alarm! I suspect a problem while reloading pip and Conda packages on the fly...