datastacktv / data-engineer-roadmap

Roadmap to becoming a data engineer in 2021
https://datastack.tv

Supporting reading material #66

Open ysgurjar opened 2 years ago

ysgurjar commented 2 years ago

I am a complete beginner who decided to follow the roadmap a couple of months ago. Sharing a few books that helped me get started.

  1. How computers work: 'Code: The Hidden Language of Computer Hardware and Software'
  2. How the internet works: 'Introduction to Networking'
  3. APIs: 'An Introduction to APIs' by Brian Cooksey, Stephanie Briones (Illustrator), and Danny Schreiber

I am a self-learner and look forward to further suggestions on next steps.

jamiros commented 2 years ago

That's awesome! Thank you for sharing that!

joseluistello commented 2 years ago

'Documenting APIs: A guide for technical writers and engineers' is excellent material too.

ysgurjar commented 2 years ago

Thank you. @alexandraabbas and other folks, I am struggling to find good resources for data structures and algorithms, Linux, and serialisation. I am also not sure how much time I should spend on each of these. There aren't any courses on data stack at this level. Suggestions?

datatalking commented 1 year ago

@ysgurjar it really depends on where your current skills fall within the overall range, since data engineering covers a wide swath of technologies and experience levels. Most of my work involves more scientific processing of data, so I use linear algebra and matrix equations almost weekly. Looking at my shelf, I have eight books that I bought, but I really only use three or four.

  1. I've been using the 'Data Engineering for python' book and find it helpful. What language do you use, @ysgurjar?
  2. 'Data Structures and Algorithms Made Easy' by Narasimha, which is 400+ pages and written in C, so I bribe a friend to translate enough of it to Python for me to grok it.
  3. 'Methods of Multivariate Analysis', also known as the 'Rencher' book, is a deep dive into almost all of the algebra used in everything from NLP to ML and DL. Many seem to love it, but it's an advanced read.
  4. 'Intro to Algorithms': I had good luck working through it together with a friend, who helped translate concepts from its 1,300 pages into solving problems, so it's a deep resource for me.
  5. If you are going to do the algebra behind the stats, I had luck with 'Pearson Stats' and 'An Introduction to Statistical Methods and Data Analysis', 7th edition, by Ott and Longnecker.
  6. Duke University open-sourced, I think, all of their classes, similar to what MIT did, so there is a wealth of material there. As part of my MATH342, the professor recommended 'Introduction to Modern Statistics' by Mine Çetinkaya-Rundel and Johanna Hardin.

sarahgetter commented 1 year ago

@ysgurjar Thanks for your list! I heartily endorse these O'Reilly books:

  1. 'Fundamentals of Data Engineering' by Joe Reis and Matt Housley
  2. 'Practical Statistics for Data Scientists' by Peter Bruce, Andrew Bruce and Peter Gedeck
  3. 'Data Science from Scratch' by Joel Grus
  4. 'Creating a Data-Driven Organization' by Carl Anderson
  5. 'Beautiful Visualization' by Julie Steele and Noah Iliinsky

    I have thoroughly enjoyed 'Introduction to Design and Analysis of Experiments' by George W. Cobb, but would say this falls more into the realm of data science than data engineering.

    'Beautiful Visualization' might feel outside of the data engineering umbrella, too, but it helped me understand the use cases for different levels of time granularity and how best to represent patterns and trends. This helped me decide when my materialization layers should offer millisecond-level granularity, and when there is no need for per-event data and the smallest rollup period can be a day. The book was also quite helpful for stepping out of my optimization-obsessed engineering perspective and into an "is this the most usable version for my Tableau-using analysts?" perspective.
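
    To make the granularity trade-off above a bit more concrete, here is a minimal sketch of a daily rollup in plain Python. It is only an illustration and not taken from any of the books: the event shape (a timestamp plus a numeric value) and the `rollup_daily` helper are assumptions made up for this example.

```python
from collections import defaultdict
from datetime import date, datetime


def rollup_daily(events):
    """Collapse (timestamp, value) events into one row per calendar day.

    Hypothetical helper: each day keeps only a count and a sum, so the
    millisecond-level detail of the individual events is discarded.
    """
    daily = defaultdict(lambda: {"count": 0, "total": 0.0})
    for ts, value in events:
        bucket = daily[ts.date()]  # truncate the timestamp to its day
        bucket["count"] += 1
        bucket["total"] += value
    # Return a plain dict ordered by day for predictable iteration.
    return dict(sorted(daily.items(), key=lambda kv: kv[0]))


# Made-up per-event data with sub-second timestamps.
events = [
    (datetime(2023, 1, 1, 9, 0, 0, 125000), 2.0),
    (datetime(2023, 1, 1, 17, 30, 0), 3.0),
    (datetime(2023, 1, 2, 8, 15, 0, 500000), 5.0),
]

daily = rollup_daily(events)
for day, stats in daily.items():
    print(day.isoformat(), stats["count"], stats["total"])
```

    If analysts only ever look at day-over-day trends, a table shaped like `daily` is likely enough; if they need to inspect individual events, the raw `events` have to stay available at full granularity.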