JoshusTenakhongva / Mentorship_Repo


C. Questions for Chris #13

Closed JoshusTenakhongva closed 2 years ago

JoshusTenakhongva commented 2 years ago

Questions for Chris in Comments

CLuiz commented 2 years ago

Good work on the doc, I think it is a great starting point. I'll come to our next meeting with some ideas.

  1. Let's target ~$135k per year total comp for now. My confidence comes from the four folks I've placed so far this year, with salaries between $100k and $130k and TC in the $120k-$150k range.
  2. We will talk through this point continuously. Here is a good reference to get started
  3. Regarding getting started, we can start from a data set or from the infra. Up to you. Maybe look for a data set that is available on the Snowflake data exchange? samples

JoshusTenakhongva commented 2 years ago

Thanks! I'm gonna start this session looking at these

JoshusTenakhongva commented 2 years ago

Hi Chris, I've been doing more research into the relationship between Data Engineers and Data Scientists, ML Engineers, and Business folk to better understand exactly what makes a good pipeline and good data. This has inspired a few questions. Also, I usually don't know exactly how to phrase the question I'm asking, so I'll just write multiple questions to get at the info I'm looking for. It's mostly the vibe of the question, and if you need me to elaborate, please let me know.

JoshusTenakhongva commented 2 years ago

Addendum:

CLuiz commented 2 years ago

Hi Chris, I've been doing more research into the relationship between Data Engineers and Data Scientists, ML Engineers, and Business folk to better understand exactly what makes a good pipeline and good data. This has inspired a few questions. Also, I usually don't know exactly how to phrase the question I'm asking, so I'll just write multiple questions to get at the info I'm looking for. It's mostly the vibe of the question, and if you need me to elaborate, please let me know.

  • [ ] Under what circumstances are companies usually hiring Data Engineers? What is the mentality of a hiring manager when they're looking for new Data Engineers? Generally, what kind of tasks will be required of me once I hit the ground floor?
  • [ ] How do Data Scientists/ML Engineers/Business folk interact with the data? Are they accessing the database directly with SQL and Python tools? Are they accessing a section of the data that's been curated? Is most of the data backlogged in a data lake, holding potential but not necessarily being accessed all the time?
  • [ ] Do I need to know anything about Machine Learning other than, "It needs good data"? Do I need to know anything about statistics and DS algorithms other than, "The data needs to be accessed easily and consistently"?

Folks that are hiring data engineers want people that have worked at the border between data and software engineering. The use cases are very broad. Typical tasks are writing pipelines to do data transformations/ETL work or setting up some tool to do so (Airflow, etc).
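
To make "writing pipelines to do data transformations/ETL work" a bit more concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made up for illustration: the `RAW_CSV` sample data, the `orders` table, and the `extract`/`transform`/`load` function names. A real pipeline would read from files, APIs, or a warehouse and would usually be scheduled by a tool such as Airflow.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; a real pipeline would pull this from a file, API, or database.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,alice,4.25
"""


def extract(raw_text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop rows with a missing amount and cast columns to proper types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append((int(row["order_id"]), row["customer"], float(row["amount"])))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(conn):
    """Run the three steps in order."""
    load(transform(extract(RAW_CSV)), conn)


conn = sqlite3.connect(":memory:")
run_pipeline(conn)
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Keeping extract, transform, and load as separate functions makes each step easy to test on its own, and it maps naturally onto separate tasks if the pipeline is later moved into an orchestrator.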

DS folks interact with the data in every way you can imagine - it is very inconsistent across companies. Typically, someone pulls the data and transforms it into something they can use for modeling. If you are a DE working with a DS, the data acquisition and (at least) some of the transformation work would be where you fit.

Don't worry about the ml stuff for now. We can talk more about that when we next meet.

Addendum:

  • [ ] The position we're going for is not necessarily a senior one, right? As far as I understand, senior DEs architect and design the pipeline. A junior DE is more like a construction worker, putting it together and making it work. I don't need to intimately know every tool out there, I just need to know that the tools are there, what the categories are, and familiarize myself enough to be able to pivot. I basically need to build a shed to show that I know how the tools work and the process. It's just that the more I learn about Data Engineering, the more I know how little I know, and I wanna make sure I'm trudging through the info right to land the position.

Ha, yeah, it is a very deep field. You won't be going for senior roles, so, yes, I want you familiar with common tooling and to have stories about how you used the tools, but it is 100% fine that you don't know everything. Don't worry about data strategy and high-level architectural decisions. Your focus should be something like: "can I make Airflow work on AWS, and can I get my Python to run on it" vs "what is the optimal data processing and cloud infra for data that has XX size, YY velocity, and ZZ characteristics".
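
For the "can I get my Python to run on it" half of that goal, a first Airflow DAG can be very small. The sketch below assumes Apache Airflow 2.4 or newer (for the `schedule` argument); the `dag_id`, `task_id`, and the `say_hello` function are hypothetical names chosen for illustration. It does not cover the AWS side (where Airflow is hosted and how the file gets deployed). The import is guarded so the file still runs as plain Python on a machine without Airflow installed.

```python
from datetime import datetime


def say_hello() -> str:
    """Hypothetical task body: any plain Python callable can be used here."""
    message = "hello from a pipeline task"
    print(message)
    return message


try:
    # Assumes Apache Airflow 2.4+ is installed; skipped otherwise.
    from airflow import DAG
    from airflow.operators.python import PythonOperator
except ImportError:
    DAG = None

if DAG is not None:
    # One DAG with a single task that calls say_hello when triggered manually.
    with DAG(
        dag_id="hello_python",
        start_date=datetime(2023, 1, 1),
        schedule=None,  # no automatic schedule; trigger from the UI or CLI
        catchup=False,
    ) as dag:
        PythonOperator(task_id="say_hello", python_callable=say_hello)

# The task logic can be checked locally without any Airflow setup.
result = say_hello()
```

Writing the task as an ordinary function first, then wrapping it in an operator, keeps the Python testable on its own and gives a concrete story about what it took to get it running under the scheduler.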