Texera / texera

Collaborative Machine-Learning-Centric Data Analytics Using Workflows
https://texera.github.io
Apache License 2.0

Enhance Histogram Operator with Distribution Plot Visualization #2711

Closed JeshChoi closed 1 month ago

JeshChoi commented 2 months ago

Purpose

This PR integrates the Distribution Plot operator into the Histogram operator, adding support for rug, violin, and box distribution plots. Visualizing data through different distribution plots is crucial for comprehensive data analysis and interpretation.
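As a rough illustration of how a single histogram operator could expose the three distribution types, the sketch below maps an optional parameter onto the `marginal` argument of `plotly.express.histogram`, which accepts `"rug"`, `"violin"`, and `"box"`. The names `build_histogram_kwargs`, `render`, and `distribution` are hypothetical and chosen for this example only; the actual operator code in this PR may be structured differently.

```python
from typing import Any, Dict, Optional

# Distribution types named in this PR; each is also a valid value for the
# `marginal` argument of plotly.express.histogram.
SUPPORTED_DISTRIBUTIONS = {"rug", "violin", "box"}


def build_histogram_kwargs(column: str, distribution: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for plotly.express.histogram.

    `distribution` is a hypothetical name for the operator's new parameter.
    None or an empty string means a plain histogram with no marginal plot.
    """
    kwargs: Dict[str, Any] = {"x": column}
    if distribution:
        if distribution not in SUPPORTED_DISTRIBUTIONS:
            raise ValueError(f"unsupported distribution type: {distribution!r}")
        kwargs["marginal"] = distribution
    return kwargs


def render(table, column: str, distribution: Optional[str] = None):
    """Return a Plotly figure for `table` (for example, a pandas DataFrame).

    plotly is imported lazily so that build_histogram_kwargs stays dependency-free.
    """
    import plotly.express as px

    return px.histogram(table, **build_histogram_kwargs(column, distribution))
```

With this shape, leaving the parameter unset keeps the existing histogram behavior, while `render(df, "age", "violin")` would draw the histogram with a violin plot in the margin.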

Description

Demo Pictures

Simple CSV to Operator Set Up

Screenshot 2024-07-03 at 21 17 53

New Parameter

Screenshot 2024-07-03 at 21 18 06

Violin Distribution Output

Screenshot 2024-07-03 at 21 19 27

Box Distribution Output

Screenshot 2024-07-03 at 21 19 01

Rug Distribution Output

Screenshot 2024-07-03 at 21 18 37

kunwp1 commented 2 months ago

Hey @JeshChoi, I just wanted to let you know that your PR looks excellent! One suggestion: the distribution plot seems similar to the existing histogram plot. Both use the Plotly API, just with different parameters. Instead of creating the distribution plot as a new visualization operator, it would be nice to integrate your work into the histogram operator. I'm curious to hear your thoughts on this. Thank you.

JeshChoi commented 2 months ago

Hi @kunwp1, thanks for looking over my PR.

Regarding the similarities between the DistPlot and Histogram operators, the decision to integrate them should consider our approach to migrating visualizers from Plotly to Texera.

Professor Chen Li previously mentioned that he envisions Texera incorporating all visualizers from Plotly, possibly even categorizing them similarly. Therefore, porting each Plotly visualizer to Texera as its own operator would keep the two catalogs comprehensive and consistent.

While this may lead to some redundancy in operators, it's important to note that non-technical users might search specifically for a distribution plot instead of a histogram, necessitating the inclusion of a DistPlot operator.

I'm open to either option and can proceed in either direction.

kunwp1 commented 2 months ago

@JeshChoi I understand your point of view; however, I believe it would be beneficial to merge them.

The Plotly website categorizes plots into distribution and histogram plots, but I don't notice a visual distinction between them. From my perspective, the distribution plot is an extension of the histogram plot, allowing users to visualize multiple histograms and additional plots, such as the box plot. Separating them into distinct operators may make sense if you consider this a significant difference. However, I think that separating these two operators could potentially lead to confusion among users due to this similarity.

JeshChoi commented 2 months ago

@kunwp1 I have made the integration changes. Please take a look at the PR whenever you get a chance.

JeshChoi commented 1 month ago

@kunwp1 I have changed the title of the PR.

JeshChoi commented 1 month ago

Awesome, thanks!