PAIR-code / facets

Visualizations for machine learning datasets
https://pair-code.github.io/facets/
Apache License 2.0

How to load and view your data? #15

Open tastyminerals opened 7 years ago

tastyminerals commented 7 years ago

After reading through all the interesting technicalities of what Facets is, I decided to try it on one of my datasets and realized that I still don't know how to load and view the data. Is there a plain tutorial that basically says: if you have a dataset D, do a, b and c, voilà?

venkatesh-1729 commented 7 years ago

Here you go for facets dive:

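# display and HTML are part of the public IPython.display module.
from IPython.display import display, HTML
import pandas as pd

# Note: load_boston was removed in scikit-learn 1.2. On newer versions,
# substitute another loader such as load_diabetes.
from sklearn.datasets import load_boston
boston_data = load_boston()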

df = pd.DataFrame(boston_data['data'], columns=boston_data['feature_names'])
jsonstr = df.to_json(orient='records')

HTML_TEMPLATE = \
    """
        <link rel="import" href="/nbextensions/facets-dist/facets-jupyter.html">
        <facets-dive id="elem_id" height="600"></facets-dive>
        <script>
          var data = {jsonstr};
          document.querySelector("#elem_id").data = data;
        </script>
    """
html = HTML_TEMPLATE.format(jsonstr=jsonstr)
display(HTML(html))

fils commented 7 years ago

Is there a similar example on how to download, build and view via a web browser? I was able to install bazel and rebuild for Jupyter, but not seeing a path to build and run in a browser. Any help appreciated, this looks useful to me and others I know. Thanks.

jameswex commented 7 years ago

Right now the best examples of how to embed the visualizations into a website are the demo pages (https://pair-code.github.io/facets/index.html and https://pair-code.github.io/facets/quickdraw.html). The code for those pages can be found in this project's gh-pages branch (https://github.com/PAIR-code/facets/tree/gh-pages)

The basic idea is that you can build facets.html with "bazel build facets:facets" from the top-level directory. That facets.html can then be imported into another page (such as the index.html and quickdraw.html from the gh-pages branch), and the <facets-overview> and <facets-dive> Polymer elements can then be used in that page.

In the future, we can create some documentation clarifying this.
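
As a rough illustration of the steps described above, the following Python sketch builds a standalone HTML page that imports a built facets.html and hands a list of records to a <facets-dive> element. The function name `build_dive_page`, the relative path `facets.html`, and the polyfill URL are assumptions for illustration and are not confirmed anywhere in this thread; the element markup follows the Dive snippet posted earlier.

```python
import json

# Assumption: a web components polyfill, for browsers without native HTML imports.
POLYFILL_URL = (
    "https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/"
    "webcomponents-lite.js"
)

PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
  <head>
    <script src="{polyfill}"></script>
    <link rel="import" href="{facets_href}">
  </head>
  <body>
    <facets-dive id="dive" height="600"></facets-dive>
    <script>
      var data = {records_json};
      document.querySelector("#dive").data = data;
    </script>
  </body>
</html>
"""


def build_dive_page(records, facets_href="facets.html"):
    """Return an HTML page that shows `records` (a list of dicts) in Facets Dive.

    `facets_href` is assumed to point at the facets.html produced by the bazel
    build, copied next to the generated page.
    """
    # Escape "</" so a string value such as "</script>" cannot close the
    # inline script block early.
    records_json = json.dumps(records).replace("</", "<\\/")
    return PAGE_TEMPLATE.format(
        polyfill=POLYFILL_URL,
        facets_href=facets_href,
        records_json=records_json,
    )
```

The returned string can be written to a file next to facets.html. HTML imports are usually blocked for pages opened straight from disk, so serving the directory over HTTP (for example with `python -m http.server`) is likely needed.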

tastyminerals commented 7 years ago

These are not examples but a showcase of facets visualization features. I understand examples as instructions on how to use the toolkit locally.

Right now it takes too much work to get it up and running, to be honest, and it is easier (for me at least) to shove my data into a prewritten R script and plot it, unless I really need to analyze the dataset samples in great detail.

anna-mo commented 7 years ago

@jameswex I am puzzled: I built facets.html, and index.html imports it, but the examples in index.html do not display and the CSV upload function does not work either. What should I do to make index.html work on my computer locally?

rahulkhul commented 7 years ago

@jameswex, I have a Python implementation of Facets, but sometimes the graphs are not visible with it. If I try the same data at https://pair-code.github.io/facets/, it works fine and loads faster for any size of dataset. Could you please let me know how we can solve the problem?

jameswex commented 7 years ago

@rahulkhul Do you have an example dataset that it is failing with, along with an ipython notebook that shows the issue? Which facets visualization are you seeing this problem with?

Are there any errors in the browser debugging console?

rahulkhul commented 7 years ago

Hello James,

I followed the documentation provided in the Facets GitHub repository and built a custom function that runs on a Django server.

Main dataset preparation in Python:

from facets_overview.generic_feature_statistics_generator import GenericFeatureStatisticsGenerator
import pandas as pd
import base64

data = pd.read_csv(csv_file_path)

facets_dive_data = data.transpose().to_dict().values()

df = pd.DataFrame(facets_dive_data)
proto = GenericFeatureStatisticsGenerator().ProtoFromDataFrames([{'name': 'test', 'table': df}])
protostr = base64.b64encode(proto.SerializeToString()).decode("utf-8")
HTML_TEMPLATE = """{protostr}"""
facets_overview = HTML_TEMPLATE.format(protostr=protostr)

HTML page:

I import facets_jupyter.html and the function into the main index.html.

I use two divs for separate data visualisations, one for Dive and one for Overview, as below:

And I pass the dataset values from JavaScript as below:

This works for most datasets (the CSV data is converted into the required format with pandas), but for a few datasets it does not work. If I upload the same CSV at https://pair-code.github.io/facets/, it works fine. I am not able to debug the problem.

I hope my concern is clear. Please let me know if my implementation is wrong, and give me some guidelines.

Thanks.

Rahul khul.

jameswex commented 7 years ago

Would you be willing to share a csv on which this works and one on which it doesn't work? That would help me debug your issue. Let me know. Thanks!

rahulkhul commented 7 years ago

Yes, sure. Please find attached.

Sometimes the Python implementation also gives an error for boolean values. The input CSV gives the correct result if I upload it at https://pair-code.github.io/facets/, but it gives no result (a blank graph) when run through the Python implementation. My code seems fine, since it usually gives the same result as https://pair-code.github.io/facets/, but for some datasets the result is blank.

Can we have the same implementation as https://pair-code.github.io/facets/ in Python? Please let me know if I am wrong.

Thanks. Rahul Khul.
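
When one CSV renders and another does not, it may help to compare what kinds of values each column holds before involving Facets at all. The sketch below is only a debugging aid under an assumption: since boolean values are mentioned above as a source of errors, it guesses that columns with booleans, blanks, or mixed value kinds are the first places to look. It does not diagnose Facets itself, and the function names are hypothetical.

```python
import csv
import io

BOOL_TOKENS = {"true", "false"}


def _is_number(text):
    """Return True if `text` parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def summarize_columns(csv_text):
    """Map each column name to the set of value kinds seen in that column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kinds = {name: set() for name in reader.fieldnames or []}
    for row in reader:
        for name in kinds:
            # Short rows yield None for missing cells; treat those as empty.
            value = (row.get(name) or "").strip()
            if value == "":
                kinds[name].add("empty")
            elif value.lower() in BOOL_TOKENS:
                kinds[name].add("bool")
            elif _is_number(value):
                kinds[name].add("number")
            else:
                kinds[name].add("string")
    return kinds


def suspicious_columns(csv_text):
    """Return columns holding booleans, blanks, or more than one kind of value."""
    return sorted(
        name
        for name, seen in summarize_columns(csv_text).items()
        if "bool" in seen or "empty" in seen or len(seen) > 1
    )
```

Running `suspicious_columns` on a working CSV and on a failing one, then comparing the two lists, gives a short list of columns to test one at a time.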

jameswex commented 7 years ago

I don't see the attached files in the issue. Can you email the csvs to james.[my last name]@gmail.com? Thanks.

romulomadu-zz commented 6 years ago

What is the Jupyter HTML template for this one? [image]

jameswex commented 6 years ago

@romulomadu What is the question exactly?

For Facets Overview comparing the two datasets in your screenshot, the code to do that is in the example ipynb notebook file in the facets_overview directory.
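
A notebook template for Facets Overview might look like the sketch below, which mirrors the Dive template posted earlier in this thread. The `protoInput` property name and the `/nbextensions/...` import path are assumptions to check against the example notebook, and `overview_html`, `train_df` and `test_df` are hypothetical names.

```python
import base64

# Assumption: the import path of a Jupyter install of the extension, copied
# from the Dive example earlier in this thread.
FACETS_IMPORT = "/nbextensions/facets-dist/facets-jupyter.html"

OVERVIEW_TEMPLATE = """
<link rel="import" href="{facets_import}">
<facets-overview id="overview"></facets-overview>
<script>
  document.querySelector("#overview").protoInput = "{protostr}";
</script>
"""


def overview_html(serialized_proto, facets_import=FACETS_IMPORT):
    """Return notebook HTML for Facets Overview from serialized statistics bytes."""
    # Base64 output only uses characters that are safe inside a JS string literal.
    protostr = base64.b64encode(serialized_proto).decode("utf-8")
    return OVERVIEW_TEMPLATE.format(facets_import=facets_import, protostr=protostr)


# In a notebook (assumes the facets_overview package and two hypothetical
# DataFrames, train_df and test_df):
#   proto = GenericFeatureStatisticsGenerator().ProtoFromDataFrames(
#       [{"name": "train", "table": train_df}, {"name": "test", "table": test_df}])
#   display(HTML(overview_html(proto.SerializeToString())))
```

Passing two entries to ProtoFromDataFrames is what would produce a side-by-side comparison of two datasets.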

mg1075 commented 6 years ago

@jameswex
> In the future, we can create some documentation clarifying this.

That would be much appreciated. I am getting nowhere fast trying to install all the different dependencies just to set things up (e.g., bazel on Windows), and as for how to run and view in a web browser, I seem to be hopping all over the place, searching for instructions and finding tidbits here and there.

stenpiren commented 6 years ago

same here @mg1075 !

richard5334 commented 6 years ago

Facets is great for quickly visualizing data, except that only the most advanced users will manage to get it installed and working.

I wish it were made available the same way TensorBoard is. With TensorBoard, it's as simple as typing a single command. The only easy way to get going quickly in Facets is to use the example demo directly at https://pair-code.github.io/facets/

jameswex commented 6 years ago

Check out the new section on using Facets in Google Colaboratory (a free jupyter-based notebook environment): https://github.com/PAIR-code/facets#usage-in-google-colabratoryjupyter-notebooks and the sample notebook here: https://colab.research.google.com/drive/1QrcuNHJnL3TBzcFV-0yw6y3wWmSWv_gM

You can use facets dive and overview on csv or pandas dataframes with no need for any installations. Just a few commands in the notebook.

shirareznik commented 6 years ago

Hi, I understand how to do everything with text or numeric data, but in what format can I add images to the CSV? (That is, how can I implement something similar to the "draw" data but show my own images instead of faces?) I want to add a "high dimensional" feature (some kind of image) to the CSV.

Many thanks!

jameswex commented 6 years ago

The image data can't be added to the CSV, but with Facets Dive you can add an atlas image which contains a thumbnail image for each line in the CSV, which is what we did for the draw data.

PR https://github.com/PAIR-code/facets/pull/137 is about to merge in a tool to help create this atlas image for any dataset you have, and then the atlas image can be provided to dive as the atlasUrl parameter as seen here: https://github.com/PAIR-code/facets/tree/master/facets_dive#sprite-properties
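
To make the idea of an atlas more concrete, the sketch below computes where each thumbnail would sit in a grid-shaped atlas image. It assumes thumbnails are laid out left to right, then top to bottom, in the same order as the CSV rows, and that a roughly square grid is acceptable; both assumptions should be checked against the sprite properties documentation linked above. The function name `atlas_layout` is hypothetical and this is not the tool from the pull request.

```python
import math


def atlas_layout(num_images, sprite_width, sprite_height):
    """Return (columns, rows, positions) for a roughly square, row-major atlas.

    positions[i] is the (x, y) pixel offset of the top-left corner of the
    thumbnail for the i-th CSV row.
    """
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    columns = math.ceil(math.sqrt(num_images))
    rows = math.ceil(num_images / columns)
    positions = [
        ((i % columns) * sprite_width, (i // columns) * sprite_height)
        for i in range(num_images)
    ]
    return columns, rows, positions
```

Under these assumptions the finished atlas would measure columns * sprite_width by rows * sprite_height pixels, and its URL would be given to Dive through the atlasUrl parameter together with the per-thumbnail width and height.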