LucaCanali / Miscellaneous

Includes general notes on Apache Spark, notes on using Spark for physics, how to run TPCDS with PySpark, how to create histograms with Spark, tools for performance-testing CPUs, Jupyter notebook examples for Spark, and examples for Oracle and other database systems.
Apache License 2.0

Miscellaneous projects and scripts.

Author and contact: Luca.Canali@cern.ch

Spark and Performance Engineering

Folder Description
Spark Dashboard A tool for Apache Spark monitoring. Use it to build a performance dashboard and troubleshoot Spark jobs.
Spark Notes Miscellaneous tips and code snippets about Apache Spark.
Spark for Physics Examples, with code and data, of how Apache Spark can be used for High Energy Physics data analysis.
Performance Testing Code and examples, including:
- A tool to run TPCDS at scale with PySpark and collect execution metrics
- Tools for load-testing CPUs, written in Python and Rust
- Notes on how to use tooling for performance measurements
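
To give a flavor of what a CPU load-testing tool does, here is a minimal sketch in Python. It is not the code from the Performance Testing folder: the names `cpu_kernel`, `timed_kernel`, and `run_load_test` are hypothetical, and the workload (a simple integer loop) is an assumption chosen only to keep the example short. The idea is to run the same CPU-bound job on a growing number of concurrent workers and compare job times and throughput.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def cpu_kernel(iterations):
    """CPU-bound integer loop (hypothetical workload); returns a checksum."""
    total = 0
    for i in range(iterations):
        total = (total + i * i) % 1_000_003
    return total


def timed_kernel(iterations):
    """Run the kernel once and return its elapsed time in seconds."""
    start = time.perf_counter()
    cpu_kernel(iterations)
    return time.perf_counter() - start


def run_load_test(num_workers, iterations, executor_cls=ProcessPoolExecutor):
    """Run one kernel per worker concurrently and summarize the timings.

    Processes are the default because Python threads share the GIL and
    would not load multiple cores with this pure-Python kernel.
    """
    start = time.perf_counter()
    with executor_cls(max_workers=num_workers) as pool:
        job_times = list(pool.map(timed_kernel, [iterations] * num_workers))
    wall_time = time.perf_counter() - start
    return {
        "workers": num_workers,
        "wall_time_s": wall_time,
        "avg_job_time_s": sum(job_times) / len(job_times),
        "jobs_per_s": num_workers / wall_time,
    }
```

A typical use would be to call `run_load_test(n, 2_000_000)` for n = 1, 2, 4, ... from a script's `if __name__ == "__main__":` block and watch how the average job time changes as the number of workers approaches and exceeds the number of available cores.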

Data Engineering and Data Science

Folder Description
Deep Learning Notes Notes and examples on Deep Learning tools and related data pipelines.
Pyspark_SQL_Magic_Jupyter How to write Jupyter SQL magic functions for PySpark and Spark SQL.
Trino and Presto on Jupyter Example of using Trino or Presto in a Jupyter notebook.
PostgreSQL and YugabyteDB on Jupyter Example of using PostgreSQL or YugabyteDB in a Jupyter notebook.
Oracle_Jupyter Examples of how to query Oracle using Jupyter/IPython notebooks.
Impala_SQL_Jupyter Examples of how to run SQL on Apache Impala using Jupyter/IPython notebooks.
SQL_color_Mandelbrot How to use SQL to compute and display the Mandelbrot set with colors. Examples for Oracle and PostgreSQL.
PLSQL_Neural_Network An example of how to deploy a DL serving engine for Oracle using PL/SQL.
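
As an illustration of the idea behind SQL_color_Mandelbrot, the sketch below computes the Mandelbrot set with a recursive common table expression. It is not the repository's code: it swaps in SQLite (via Python's built-in `sqlite3` module) instead of Oracle or PostgreSQL so that it runs without a database server, it renders plain ASCII shades rather than colors, and the names `MANDELBROT_SQL`, `mandelbrot_grid`, and `render` are hypothetical.

```python
import sqlite3

# Recursive CTE: build a pixel grid, then iterate z -> z*z + c for each pixel
# until |z|^2 >= 4 or the iteration limit is reached.
MANDELBROT_SQL = """
WITH RECURSIVE
  xaxis(i) AS (SELECT 0 UNION ALL SELECT i + 1 FROM xaxis WHERE i < :width - 1),
  yaxis(j) AS (SELECT 0 UNION ALL SELECT j + 1 FROM yaxis WHERE j < :height - 1),
  m(iter, i, j, cx, cy, x, y) AS (
    SELECT 0, i, j,
           -2.0 + 3.0 * i / (:width - 1),
           -1.0 + 2.0 * j / (:height - 1),
           0.0, 0.0
    FROM xaxis, yaxis
    UNION ALL
    SELECT iter + 1, i, j, cx, cy, x*x - y*y + cx, 2.0*x*y + cy
    FROM m
    WHERE x*x + y*y < 4.0 AND iter < :max_iter
  )
SELECT i, j, MAX(iter) AS iters FROM m GROUP BY i, j ORDER BY j, i
"""


def mandelbrot_grid(width=61, height=21, max_iter=50):
    """Return {(i, j): iterations} for the region [-2, 1] x [-1, 1]."""
    conn = sqlite3.connect(":memory:")
    try:
        params = {"width": width, "height": height, "max_iter": max_iter}
        rows = conn.execute(MANDELBROT_SQL, params).fetchall()
    finally:
        conn.close()
    return {(i, j): iters for i, j, iters in rows}


def render(grid, width, height, max_iter, shades=" .:-=+*#%@"):
    """Map iteration counts to ASCII shades, one text line per pixel row."""
    lines = []
    for j in range(height):
        chars = []
        for i in range(width):
            idx = min(grid[(i, j)] * len(shades) // (max_iter + 1), len(shades) - 1)
            chars.append(shades[idx])
        lines.append("".join(chars))
    return lines


if __name__ == "__main__":
    print("\n".join(render(mandelbrot_grid(61, 21, 50), 61, 21, 50)))
```

Points inside the set never escape, so they reach the iteration limit and get the darkest shade. Replacing the ASCII shades with a color lookup keyed on the iteration count is the step that would turn this into a colored rendering.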