(Almost) everything you need to know as an applied mathematician / statistician concerning coding and system administration.
with the help of some students including:
https://github.com/bcharlier/HMMA238
https://github.com/HMMA238-2021/
Students are expected to know basic notions of probability, optimization, linear algebra and statistics for this course. Some rudiments of coding (if, for, while, functions) are helpful but not mandatory.
This course focuses on discovering good coding practices for professional coding (the language used is Python, but some elements of bash and git will also be useful).
A special focus on data processing and visualization will be at the heart of the course.
We will mostly focus on basic programming concepts, as well as on discovering the Python scientific libraries, including numpy, scipy, pandas, matplotlib, seaborn.
Beyond pandas ninja skills, we will also introduce modern practices for coders: unit tests, version control, documentation generation, etc.
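As a first taste of the unit-testing practice mentioned above, here is a minimal sketch. The function `empirical_mean` and the two test functions are hypothetical names chosen for illustration and are not part of the course material; the sketch uses only the standard library, whereas the course itself relies on numpy and dedicated test tools.

```python
import math


def empirical_mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if len(values) == 0:
        raise ValueError("empirical_mean requires at least one value")
    # math.fsum limits floating-point accumulation error.
    return math.fsum(values) / len(values)


def test_empirical_mean_simple():
    # The mean of 1, 2, 3 is exactly 2.
    assert empirical_mean([1.0, 2.0, 3.0]) == 2.0


def test_empirical_mean_empty():
    # An empty input should raise instead of silently returning nonsense.
    try:
        empirical_mean([])
    except ValueError:
        return
    raise AssertionError("expected ValueError on empty input")


if __name__ == "__main__":
    test_empirical_mean_simple()
    test_empirical_mean_empty()
    print("all tests passed")
```

Functions named `test_*` follow the naming convention that test runners such as pytest discover automatically; here they are simply called by hand so the file runs on its own.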
BC : (20/01) Introduction to Linux essentials and command line tools: bash
BC : (21/01) Introduction to Linux essentials and command line tools: regexp, grep, find, rename
BC : (27/01) IDE: VS Code, Python virtual env: Anaconda, Python virtual environment, Git: a first introduction, GitHub, ssh key creation, various git commands, conflict, pull request; see also Bonus/
JS : (28/01) Coding: algorithms, modules, basic types, functions, loops; lists, dictionaries, tuples, if statements and loops, exceptions
BC : (03/02) hands-on git
JS : (04/02) numpy: basics on matrices (arrays), slicing, simple linear algebra, masking; matplotlib: first plots
BC : (10/02) Some git again
JS : (11/02) numpy: casting, concatenation
JS : (17/02) numpy / matplotlib: imshow, meshgrid, copy
JS : (18/02) scipy: ODE, interpolation, optimize
BC : (03/03) classes (__init__, __call__, etc.), operator overloading, file handling, creating a Python module
JS : (04/03) scipy: images/channels; pandas: first steps / missing data
BC : (10/03) Create a Python Module
BC : (17/03) unit tests
JS : (18/03) Sparse matrices
JS : (31/03) graphs and memory
JS : (01/04) Numba, parallelism
BC : (07/04) Documentation with Sphinx
JS : (08/04) Statsmodels
JS-BC : (19/04) Oral examination
JS-BC : (20/04) Oral examination
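To give an idea of what the 03/03 session on classes and operator overloading covers, here is a minimal sketch. The `Polynomial` class is a hypothetical illustration, not taken from the course material; it only shows how `__init__`, `__call__` and `__add__` fit together.

```python
class Polynomial:
    """Polynomial with real coefficients, stored lowest degree first."""

    def __init__(self, coefficients):
        # Copy into a list so later changes to the input do not affect us.
        self.coefficients = list(coefficients)

    def __call__(self, x):
        # Evaluate with Horner's scheme, starting from the highest degree.
        result = 0.0
        for c in reversed(self.coefficients):
            result = result * x + c
        return result

    def __add__(self, other):
        # Pad the shorter coefficient list with zeros, then add term by term.
        n = max(len(self.coefficients), len(other.coefficients))
        a = self.coefficients + [0.0] * (n - len(self.coefficients))
        b = other.coefficients + [0.0] * (n - len(other.coefficients))
        return Polynomial([u + v for u, v in zip(a, b)])

    def __repr__(self):
        return f"Polynomial({self.coefficients})"


p = Polynomial([1.0, 2.0])        # 1 + 2x
q = Polynomial([0.0, 0.0, 3.0])   # 3x^2
r = p + q                         # 1 + 2x + 3x^2
print(r, r(2.0))                  # prints: Polynomial([1.0, 2.0, 3.0]) 17.0
```

Defining `__call__` lets an instance be used like a function (`r(2.0)`), while `__add__` is what Python invokes for `p + q`.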
A small challenge based on a real dataset. This is individual work and includes an aesthetic part and a prediction part.
Due date : 23:59 Thursday, April 1st.
More information on the challenge is available at Challenge 2020-2021
Three short tests of 15 min each (on Moodle). These are individual work.
Warning: the precise details of the projects might evolve before the allocation phase, and a precise grading grid will be given in the project section.
Warning: the project repository must show a balanced contribution between group members; grades may vary within a group to reflect an unbalanced workload.
One bonus point on the final grade of the course can be obtained for contributions improving the course material (practicals, Readme, etc.). See the Bonus section for more details on how to proceed.
The resources for the course are available on the present GitHub repository. Additional elementary material (in French) on Python is available in the course HLMA310 and the associated lecture notes IntroPython.pdf.
(General) : The Missing Semester of Your CS Education
(Data Science) : J. VanderPlas, Python Data Science Handbook: Essential Tools for Working with Data, 2016, https://jakevdp.github.io/PythonDataScienceHandbook/
(General) Skiena, The Algorithm Design Manual, 1998
(General) Courant et al., Informatique pour tous en classes préparatoires aux grandes écoles : Manuel d'algorithmique et programmation structurée avec Python, 2013 (in French)
(General/data science) Guttag, Introduction to Computation and Programming Using Python: With Application to Understanding Data, 2016
Associated videos: http://jakevdp.github.io/blog/2017/03/03/reproducible-data-analysis-in-jupyter/
(Code and style) Boswell and Foucher, The Art of Readable Code, 2011
(Scientific computing tools for Python) http://www.scipy-lectures.org/
(Visualization) http://openclimatedata.net/
Some useful extensions:
conda install -c conda-forge jupyter_contrib_nbextensions
conda install -c conda-forge nbstripout