Data Science for Physical Scientists
In my senior year at the University of Delaware, I enrolled in Data Science for Physical Scientists
(PHYS 467), taught by Dr. Federica Bianco. Although the labs were completed in groups, I personally
learned every concept demonstrated in each assignment. The class repository can be found here.
Labs
The homework consisted of the following labs:
- Understand the basics of GitHub.
- Exploring different statistical distributions and how to create them in Python.
- Normal, binomial, Poisson, Cauchy, and log-normal distributions
- NumPy package
- Plotting using matplotlib.pyplot
- Recreate the data analysis from the PhD thesis 'Statistical Tests for Scaling in the Inter-Event Times of Earthquakes in California' by Alvaro Corral.
- Data collection and cleaning
- Pandas package
- Broadcasting a pandas DataFrame
- Data analysis
- Testing the Significance of Dark Matter
- Uncertainty analysis / propagation of error
- Astropy package (units)
- Errorbars
- In-class exercise: Monte Carlo simulation
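
The uncertainty-propagation and Monte Carlo items above can be illustrated with a small sketch. It assumes a hypothetical measurement, a speed v = d / t with independent Gaussian uncertainties on d and t, and compares the first-order propagation formula with a Monte Carlo estimate. The numbers and function names are invented for illustration and are not taken from the lab; only the Python standard library is used.

```python
import math
import random
import statistics


def propagate_ratio(d, sigma_d, t, sigma_t):
    """First-order error propagation for v = d / t with independent errors."""
    v = d / t
    # Relative uncertainties add in quadrature for a quotient.
    sigma_v = abs(v) * math.sqrt((sigma_d / d) ** 2 + (sigma_t / t) ** 2)
    return v, sigma_v


def monte_carlo_ratio(d, sigma_d, t, sigma_t, n_samples=100_000, seed=0):
    """Estimate the mean and spread of v = d / t by drawing random samples."""
    rng = random.Random(seed)
    samples = [
        rng.gauss(d, sigma_d) / rng.gauss(t, sigma_t) for _ in range(n_samples)
    ]
    return statistics.fmean(samples), statistics.stdev(samples)


# Hypothetical measurement values (assumptions, not lab data).
D, SIGMA_D = 100.0, 2.0
T, SIGMA_T = 9.8, 0.1

v_analytic, sigma_analytic = propagate_ratio(D, SIGMA_D, T, SIGMA_T)
v_mc, sigma_mc = monte_carlo_ratio(D, SIGMA_D, T, SIGMA_T)

print(f"analytic:    v = {v_analytic:.3f} +/- {sigma_analytic:.3f}")
print(f"Monte Carlo: v = {v_mc:.3f} +/- {sigma_mc:.3f}")
```

For small relative uncertainties like these, the two estimates should agree closely; the Monte Carlo approach remains usable when the function is too nonlinear for the first-order formula.
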
- Reidentifying Urban Information (using PLUTO database to identify owners who may be violating energy regulations)
- drive module (from google.colab package)
- Merging dataframes
- Data wrangling
- In-class exercise: Unix commands, !wget, unzip
- Find evidence of the expansion of the universe by fitting supernova cosmology data. The evidence is the linear relationship between the logarithm of redshift and the luminosity of supernovae.
- SciPy curve fitting (optimize, curve_fit)
- Sklearn fitting (LinearRegression)
- Cross validation in Sklearn
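
The fitting and cross-validation items above can be sketched without any third-party package. The example below is a standard-library stand-in, not the SciPy or Sklearn calls used in the lab: it fits a straight line by closed-form least squares and scores it with a simple k-fold cross validation. The data are synthetic, and the slope, intercept, and noise level are assumptions chosen for illustration.

```python
import random


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept (closed form)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def k_fold_mse(x, y, k=5):
    """Average held-out mean squared error over k interleaved folds."""
    n = len(x)
    fold_errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th point is held out
        x_train = [x[i] for i in range(n) if i not in test_idx]
        y_train = [y[i] for i in range(n) if i not in test_idx]
        slope, intercept = fit_line(x_train, y_train)
        squared = [(y[i] - (slope * x[i] + intercept)) ** 2 for i in test_idx]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / k


# Synthetic "log redshift vs. brightness" data (all values are assumptions).
rng = random.Random(1)
TRUE_SLOPE, TRUE_INTERCEPT, NOISE = 5.0, 43.0, 0.15
log_z = [rng.uniform(-2.0, -1.0) for _ in range(200)]
mu = [TRUE_SLOPE * lz + TRUE_INTERCEPT + rng.gauss(0.0, NOISE) for lz in log_z]

slope, intercept = fit_line(log_z, mu)
cv_mse = k_fold_mse(log_z, mu)

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, CV MSE = {cv_mse:.4f}")
```

If the linear model is adequate, the cross-validated error should sit near the noise variance (here `NOISE ** 2`); a much larger value would suggest the model is missing structure in the data.
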
- Test different models for the relationship between the density of electrons present in the conduction band of charge-neutral multilayers and temperature.
- Statsmodels (ordinary linear fits / ols)
- log-likelihood
- emcee MCMC method
- Data visualization - MLB Batted Ball Hit Probabilities
- Analysis of Higgs boson decay
- Kaggle
- Ensemble methods (RandomForest, GradientBoosted)
- Confusion matrix
- Feature selection
- Introduction to unsupervised machine learning
- Whitening data
- K-means clustering
- Agglomerative clustering
- Analysis of TESS light curves for periodicity, variability, etc.
- Lomb-Scargle periodograms
- Phase-folding
- Binning
- In-class lab: clustering of time series
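
Phase-folding and binning, listed above, can be shown with a short sketch. It assumes the period is already known (for example from a Lomb-Scargle periodogram, which is not implemented here) and uses a synthetic sinusoidal light curve rather than TESS data. The period, amplitude, and noise level are invented for illustration, and only the standard library is used.

```python
import math
import random


def phase_fold(times, period, t0=0.0):
    """Map each time stamp to a phase in [0, 1) for the given period."""
    return [((t - t0) / period) % 1.0 for t in times]


def bin_by_phase(phases, fluxes, n_bins=10):
    """Average the flux in equal-width phase bins; empty bins become NaN."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for phase, flux in zip(phases, fluxes):
        idx = min(int(phase * n_bins), n_bins - 1)
        sums[idx] += flux
        counts[idx] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


# Synthetic light curve (assumed values, not real observations).
rng = random.Random(42)
TRUE_PERIOD = 2.5  # days
times = sorted(rng.uniform(0.0, 60.0) for _ in range(600))
fluxes = [
    1.0 + 0.05 * math.sin(2 * math.pi * t / TRUE_PERIOD) + rng.gauss(0.0, 0.02)
    for t in times
]

phases = phase_fold(times, TRUE_PERIOD)
binned = bin_by_phase(phases, fluxes, n_bins=10)

for i, value in enumerate(binned):
    print(f"phase {i / 10:.1f}-{(i + 1) / 10:.1f}: mean flux = {value:.4f}")
```

Folding stacks every cycle on top of the others, and binning averages down the noise, so the periodic shape becomes visible even when individual points are noisy.
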
- Using TensorFlow to recognize hand-written numbers and an exploration of DeepDream
- Creating/training/predicting with Neural Networks in TensorFlow
- Producing DeepDream images
- Gradient Descent Demo
- Gradient descent
- Loss function
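
As a minimal sketch of the last lab's two ideas, the example below minimizes a mean-squared-error loss for a straight-line model with plain gradient descent. The data, learning rate, and step count are assumptions chosen for illustration and do not reproduce the course demo.

```python
def mse_loss(w, b, xs, ys):
    """Mean squared error of the model y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, learning_rate=0.05, n_steps=2000):
    """Fit w and b by repeatedly stepping against the loss gradient."""
    w, b = 0.0, 0.0
    n = len(xs)
    history = []
    for _ in range(n_steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE loss with respect to w and b.
        grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        history.append(mse_loss(w, b, xs, ys))
    return w, b, history


# Noise-free toy data generated from y = 2x + 1 (an assumption for the demo).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b, history = gradient_descent(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}, final loss = {history[-1]:.2e}")
```

The learning rate matters: too large a value makes the updates overshoot and diverge, while too small a value makes convergence slow. Plotting `history` against the step number is a common way to check which regime a run is in.
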