DesiPilla / DSPS_dPilla

Data Science for Physical Scientists - UDel 2019 (Course code PHYS467)

Data Science for Physical Scientists

In my senior year at the University of Delaware, I enrolled in the course Data Science for Physical Scientists (PHYS 467), taught by Dr. Federica Bianco. Although the labs were completed in groups, I personally worked through every concept demonstrated in each assignment. The class repository can be found here.

Labs

The homework consisted of the following labs:

  1. Understand the basics of GitHub.
  2. Exploring different statistical distributions and how to create them in Python.
    • Normal, binomial, Poisson, Cauchy, and log-normal distributions
    • NumPy package
    • Plotting using matplotlib.pyplot
  3. Recreate the data analysis from the PhD thesis 'Statistical Tests for Scaling in the Inter-Event Times of Earthquakes in California' by Alvaro Corral.
    • Data collection and cleaning
    • Pandas package
    • Broadcasting operations on a pandas DataFrame
    • Data analysis
  4. Testing the Significance of Dark Matter
    • Uncertainty analysis / propagation of error
    • Astropy package (units)
    • Errorbars
    • In-class exercise: Monte Carlo simulation
  5. Reidentifying Urban Information (using the PLUTO database to identify owners who may be violating energy regulations)
    • drive module (from the google.colab package)
    • Merging dataframes
    • Data wrangling
    • In-class exercise: Unix commands, !wget, unzip
  6. Find evidence of the expansion of the universe by fitting supernova cosmology data. The linear relationship between the logarithm of redshift and the luminosity of supernovae provides this evidence.
    • SciPy curve fitting (optimize, curve_fit)
    • scikit-learn fitting (LinearRegression)
    • Cross-validation in scikit-learn
  7. Test different models for the relationship between the density of electrons present in the conduction band of charge-neutral multilayers and temperature.
    • statsmodels (ordinary least squares / OLS fits)
    • log-likelihood
    • emcee MCMC method
  8. Data visualization - MLB Batted Ball Hit Probabilities
  9. Analysis of Higgs boson decay
    • Kaggle
    • Ensemble methods (random forest, gradient boosting)
    • Confusion matrix
    • Feature selection
  10. Introduction to unsupervised machine learning
    • Whitening data
    • K-means clustering
    • Agglomerative clustering
  11. Analysis of TESS light curves for periodicity, variability, etc.
    • Lomb-Scargle periodograms
    • Phase-folding
    • Binning
    • In-class lab: clustering of time series
  12. Using TensorFlow to recognize handwritten digits, and an exploration of DeepDream
    • Creating, training, and predicting with neural networks in TensorFlow
    • Producing DeepDream images
  13. Gradient Descent Demo
    • Gradient descent
    • Loss function
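Lab 2 above covers drawing samples from standard distributions with NumPy. The sketch below is a minimal illustration, assuming the modern `numpy.random.default_rng` Generator API (the lab may have used the older `np.random.*` functions) and arbitrary, illustrative parameter values. Instead of plotting, it checks each sample against a known location statistic; for the Cauchy distribution it uses the median, because that distribution has no defined mean.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded so the draws are reproducible
N = 100_000

# Illustrative parameter choices (not taken from the lab).
samples = {
    "normal": rng.normal(loc=5.0, scale=2.0, size=N),
    "binomial": rng.binomial(n=10, p=0.3, size=N),
    "poisson": rng.poisson(lam=4.0, size=N),
    "cauchy": rng.standard_cauchy(size=N),
    "lognormal": rng.lognormal(mean=0.0, sigma=0.5, size=N),
}

# Expected values: normal mean = loc, binomial mean = n*p, Poisson mean = lam,
# Cauchy median = 0 (its mean is undefined), log-normal median = exp(mean).
summary = {
    "normal": samples["normal"].mean(),
    "binomial": samples["binomial"].mean(),
    "poisson": samples["poisson"].mean(),
    "cauchy": np.median(samples["cauchy"]),
    "lognormal": np.median(samples["lognormal"]),
}
for name, value in summary.items():
    print(f"{name:>9}: {value:.3f}")
```

To reproduce histograms like the lab's, each array in `samples` can be passed to `matplotlib.pyplot.hist`.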
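Lab 3 above works with inter-event times in an earthquake catalog. The following is a minimal sketch of the pandas side only, not the lab's code. The catalog is hypothetical and synthetic (a Poisson process with Gutenberg-Richter magnitudes, b = 1), the column names `time_days` and `magnitude` are made up, and the rescaling assumes that each recurrence time is multiplied by the mean event rate above the magnitude threshold. That multiplication by a scalar is broadcast across the whole Series.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic catalog: exponential waiting times (a Poisson process)
# and Gutenberg-Richter (b = 1) magnitudes above 2.0. Real data would come
# from an earthquake catalog instead.
rng = np.random.default_rng(0)
n_events = 20_000
catalog = pd.DataFrame({
    "time_days": np.cumsum(rng.exponential(scale=0.1, size=n_events)),
    "magnitude": 2.0 + rng.exponential(scale=1 / np.log(10), size=n_events),
})
T = catalog["time_days"].max() - catalog["time_days"].min()  # observation window length

rescaled = {}
for m_min in (2.0, 2.5, 3.0):
    subset = catalog.loc[catalog["magnitude"] >= m_min, "time_days"]
    tau = subset.diff().dropna()     # inter-event (recurrence) times in days
    rate = len(subset) / T           # mean rate of events above m_min
    rescaled[m_min] = tau * rate     # scalar broadcast over the whole Series
    print(f"M >= {m_min}: {len(subset)} events, "
          f"mean rescaled time {rescaled[m_min].mean():.3f}")
```

After rescaling, the mean recurrence time is close to 1 for every threshold, which is what makes distributions from different thresholds directly comparable.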
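Lab 4 above compares analytic error propagation with a Monte Carlo simulation. The sketch below illustrates both approaches on an assumed example, the mass enclosed by a circular orbit, M = v² r / G. The input values are hypothetical and the lab's actual quantities may differ. Units are tracked in comments rather than with Astropy so that the sketch only needs NumPy.

```python
import numpy as np

# Hypothetical measurements (value, 1-sigma uncertainty), not taken from the lab.
v, sigma_v = 200.0, 10.0   # circular velocity in km/s
r, sigma_r = 8.0, 0.5      # radius in kpc
G = 4.30091e-6             # gravitational constant in kpc (km/s)^2 / M_sun

def enclosed_mass(v, r):
    """Mass inside radius r needed for a circular orbit at speed v: M = v^2 r / G."""
    return v**2 * r / G

M = enclosed_mass(v, r)

# Analytic first-order propagation for a product of powers:
# (sigma_M / M)^2 = (2 sigma_v / v)^2 + (sigma_r / r)^2
rel_analytic = np.sqrt((2 * sigma_v / v) ** 2 + (sigma_r / r) ** 2)

# Monte Carlo propagation: sample the inputs, push every draw through the
# formula, and measure the spread of the outputs.
rng = np.random.default_rng(1)
n_draws = 200_000
M_draws = enclosed_mass(rng.normal(v, sigma_v, n_draws), rng.normal(r, sigma_r, n_draws))
rel_mc = M_draws.std() / M_draws.mean()

print(f"M = {M:.3e} M_sun")
print(f"relative uncertainty: analytic {rel_analytic:.4f}, Monte Carlo {rel_mc:.4f}")
```

For modest relative errors like these, the two estimates agree closely. The Monte Carlo route remains valid when the formula is strongly nonlinear and the first-order formula breaks down.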
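Lab 6 above fits a straight line in log redshift using `scipy.optimize.curve_fit`. The sketch below uses a hypothetical simulated sample rather than real supernova data. It assumes low redshift, where the Hubble law d = cz/H0 holds, and uses the distance modulus μ = 5 log10(d / Mpc) + 25 as the brightness measure. Under those assumptions μ is linear in log10(z) with slope 5, and H0 can be read off the intercept.

```python
import numpy as np
from scipy.optimize import curve_fit

C_KM_S = 299_792.458  # speed of light in km/s

def distance_modulus(log10_z, slope, intercept):
    """Straight line in log10(redshift): mu = slope * log10(z) + intercept."""
    return slope * log10_z + intercept

# Hypothetical low-redshift sample generated from d = c z / H0 (d in Mpc),
# mu = 5 log10(d) + 25, plus 0.15 mag of Gaussian scatter.
rng = np.random.default_rng(7)
H0_true = 70.0  # km/s/Mpc, chosen for the simulation
z = rng.uniform(0.01, 0.1, size=1000)
mu = 5 * np.log10(C_KM_S * z / H0_true) + 25 + rng.normal(0, 0.15, size=z.size)

popt, pcov = curve_fit(distance_modulus, np.log10(z), mu, p0=[5.0, 40.0])
slope, intercept = popt
slope_err, intercept_err = np.sqrt(np.diag(pcov))

# With a slope of 5, intercept = 5 log10(c / H0) + 25, which inverts to H0.
H0_fit = C_KM_S / 10 ** ((intercept - 25) / 5)
print(f"slope = {slope:.3f} +/- {slope_err:.3f} (the Hubble law predicts 5)")
print(f"H0 ~ {H0_fit:.1f} km/s/Mpc")
```

The same line could be fit with scikit-learn's `LinearRegression`. `curve_fit` is used here because it also returns the parameter covariance matrix `pcov`.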
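Lab 10 above pairs whitening with k-means clustering. The sketch below illustrates that combination under a few assumptions. Whitening here means dividing each feature by its standard deviation, as `scipy.cluster.vq.whiten` does. The clustering is a hand-written Lloyd's algorithm with k-means++ seeding, not the library implementation the lab may have used. The three blobs are hypothetical, and one feature is on a scale about 1000 times larger than the other, so without rescaling that feature would dominate the Euclidean distances.

```python
import numpy as np

def whiten(X):
    """Divide each feature (column) by its standard deviation, like scipy.cluster.vq.whiten."""
    return X / X.std(axis=0)

def sq_dists(X, centers):
    """Squared Euclidean distance from every row of X to every center."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

def kmeans(X, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's algorithm with k-means++ seeding; returns labels from the lowest-inertia run."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        # k-means++: each new center is drawn with probability proportional to
        # its squared distance from the nearest center chosen so far.
        centers = X[[rng.integers(len(X))]]
        while len(centers) < k:
            d2 = sq_dists(X, centers).min(axis=1)
            centers = np.vstack([centers, X[rng.choice(len(X), p=d2 / d2.sum())]])
        for _ in range(max_iter):
            labels = sq_dists(X, centers).argmin(axis=1)          # assignment step
            new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                    else centers[j] for j in range(k)])  # update step
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        labels = sq_dists(X, centers).argmin(axis=1)
        inertia = ((X - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels

# Hypothetical data: three blobs, feature 0 on a much larger scale than feature 1.
rng = np.random.default_rng(3)
true_centers = np.array([[0.0, 0.0], [5000.0, 5.0], [0.0, 10.0]])
spread = np.array([300.0, 0.3])
X = np.vstack([c + spread * rng.standard_normal((100, 2)) for c in true_centers])
true_labels = np.repeat([0, 1, 2], 100)

labels = kmeans(whiten(X), k=3)
print(np.bincount(labels))
```

Running several seeded initializations and keeping the lowest-inertia result guards against Lloyd's algorithm settling into a poor local minimum.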
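Lab 11 above uses Lomb-Scargle periodograms, phase-folding, and binning on TESS light curves. The following sketch illustrates all three on a hypothetical, synthetic, unevenly sampled light curve rather than real TESS data. It assumes `scipy.signal.lombscargle`, which takes angular frequencies. Astropy's `LombScargle` class is an alternative the lab may have used instead.

```python
import numpy as np
from scipy.signal import lombscargle

# Hypothetical unevenly sampled light curve: a 2.5-day sinusoid plus noise.
rng = np.random.default_rng(11)
true_period = 2.5                                  # days
t = np.sort(rng.uniform(0, 100, size=300))         # observation times in days
flux = 1.0 + 0.05 * np.sin(2 * np.pi * t / true_period) + rng.normal(0, 0.02, size=t.size)

# Search trial periods from 0.5 to 10 days, converted to angular frequencies.
periods = np.linspace(0.5, 10, 20_000)
omega = 2 * np.pi / periods
power = lombscargle(t, flux - flux.mean(), omega)  # mean subtracted before the periodogram
best_period = periods[np.argmax(power)]

# Phase-fold on the best period, then average the folded curve in 20 phase bins.
phase = (t % best_period) / best_period
bin_edges = np.linspace(0, 1, 21)
bin_index = np.digitize(phase, bin_edges) - 1
binned_flux = np.array([flux[bin_index == b].mean() for b in range(20)])
print(f"best period: {best_period:.3f} days")
```

Binning the folded curve averages down the point-to-point noise, so the periodic shape stands out even when individual points are noisy.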
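Lab 13 above demonstrates gradient descent on a loss function. The sketch below minimizes a mean-squared-error loss for a straight-line fit. It assumes hypothetical noiseless data from y = 3x + 2, a fixed step size, and hand-derived gradients; it is not the lab's own demo.

```python
import numpy as np

def mse_loss(w, b, x, y):
    """Mean squared error of the line y_hat = w * x + b."""
    return np.mean((w * x + b - y) ** 2)

def gradient_descent(x, y, lr=0.5, n_steps=2000):
    """Minimize the MSE loss with fixed-step gradient descent; returns w, b and the loss history."""
    w, b = 0.0, 0.0
    history = []
    for _ in range(n_steps):
        residual = w * x + b - y
        grad_w = 2 * np.mean(residual * x)  # dL/dw
        grad_b = 2 * np.mean(residual)      # dL/db
        w -= lr * grad_w
        b -= lr * grad_b
        history.append(mse_loss(w, b, x, y))
    return w, b, history

# Hypothetical noiseless data from the line y = 3x + 2.
x = np.linspace(0, 1, 50)
y = 3 * x + 2
w, b, history = gradient_descent(x, y)
print(f"w = {w:.4f}, b = {b:.4f}, final loss = {history[-1]:.2e}")
```

The step size matters. For this quadratic loss, gradient descent converges only if `lr` stays below 2 divided by the largest eigenvalue of the loss Hessian, which is about 0.79 for this `x` grid. Larger steps make the iterates diverge.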