IBM / analyze-customer-data-spark-pixiedust

An introductory IBM Developer Code Pattern on how to use PixieDust to visualize customer data
https://developer.ibm.com/patterns/analyze-historical-shopping-data-spark-pixiedust-jupyter-notebook/
Apache License 2.0
pixiedust python spark visualization

Analyze customer data using Jupyter notebooks, Apache Spark, and PixieDust

In this code pattern, historical shopping data is analyzed with Spark and PixieDust. The data is loaded, cleaned, and then explored through various charts and maps.

When you have completed this code pattern, you will understand how to:

The intended audience is anyone interested in quickly analyzing data in a Jupyter notebook.

Flow

  1. Log in to IBM Watson Studio
  2. Load the provided notebook into Watson Studio
  3. Load the customer data in the notebook
  4. Transform the data with Apache Spark
  5. Create charts and maps with PixieDust

About the data

Included Components

Steps

  1. Create a project
  2. Create a notebook
  3. Load customer data in the notebook
  4. Transform the data with Apache Spark
  5. Create charts and maps with PixieDust

1. Create a project and add the Spark services

2. Create a notebook

3. Load customer data in the notebook

4. Transform the data with Apache Spark

Before the data can be analyzed, it needs to be cleaned and formatted. This can be done with a few PySpark commands:
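The original notebook cell is not reproduced here, so the following is only a minimal sketch of what such a cleaning step might look like. It assumes the customer data has already been loaded into a Spark DataFrame; the helper names (`to_snake_case`, `clean_customer_data`) and the column names (`cust_id`, `age`) are hypothetical and should be replaced with those of the actual data set. It uses only standard PySpark DataFrame methods (`toDF`, `withColumn`, `cast`, `dropna`, `dropDuplicates`).

```python
def to_snake_case(name):
    """Normalize a column name, e.g. ' Gender Code ' -> 'gender_code'."""
    return "_".join(name.strip().lower().split())


def clean_customer_data(df, required=("cust_id",), int_columns=("age",)):
    """Return a cleaned copy of a Spark DataFrame.

    `required` and `int_columns` hold hypothetical column names
    (after normalization); adjust them to match the real data.
    """
    # Rename every column in one pass; toDF takes the new names in order.
    df = df.toDF(*[to_snake_case(c) for c in df.columns])

    # Cast numeric columns that were read in as strings.
    for col in int_columns:
        df = df.withColumn(col, df[col].cast("integer"))

    # Drop rows missing a required field, then drop exact duplicate rows.
    return df.dropna(subset=list(required)).dropDuplicates()
```

In a notebook this could be applied as `df = clean_customer_data(df)`, followed by `df.printSchema()` to confirm the new column names and types.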

5. Create charts and maps with PixieDust

The data can now be explored with PixieDust:

[Screenshots: notebook, histogram, map]

Related links

Learn more

License

This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.

Apache Software License (ASL) FAQ