Update Data Eng. Interview

joelachance commented 1 year ago

Let's do a couple of things here:

[ ] I want to see a candidate manage to compare two different data types. Let's make sure we know what data types we expect from each column, and fix any columns where the datatype is different than expected, with the exception of the column we want coerced.
[ ] Update data to duplicate return rows so they have an associated purchase row. Some of these rows should reflect different sizes to denote a different size was returned.
[ ] Remove any take home assignment mentions, as in how to turn in the work. The majority of these should be in person, and we can shorten the prompt.
[ ] We currently have 3-4 data engineering assignments-- we should only have one.
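
For the first item, a minimal sketch of how we might audit the dataset before handing it to a candidate. The column names, the schema, and the choice of `height` as the exempt (to-be-coerced) column are assumptions for illustration, not the actual dataset definition.

```python
import csv
import io

# Hypothetical schema: column name -> parser that raises on bad input.
EXPECTED_TYPES = {
    "transaction_id": int,
    "price": float,
    "height": float,
}

# Hypothetical: the one column deliberately left inconsistent for the candidate.
EXEMPT_COLUMNS = {"height"}


def find_type_mismatches(rows, expected=EXPECTED_TYPES, exempt=EXEMPT_COLUMNS):
    """Return (row_number, column, raw_value) for every value that fails to parse."""
    mismatches = []
    for row_number, row in enumerate(rows, start=1):
        for column, parser in expected.items():
            if column in exempt:
                continue  # leave the intentionally messy column alone
            raw = row.get(column, "")
            try:
                parser(raw)
            except (TypeError, ValueError):
                mismatches.append((row_number, column, raw))
    return mismatches


# Made-up sample data, only to exercise the check.
SAMPLE_CSV = """transaction_id,price,height
1001,19.99,170
1002,abc,5'7
x1003,24.50,180
"""

sample_rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
mismatches = find_type_mismatches(sample_rows)
print(mismatches)
```

Anything this reports is a column we would fix before the interview; the exempt column is skipped on purpose.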

ChironBM commented 1 year ago

Pre Technical Assessment Test Communication

Given that interviewees can best demonstrate their skills when uploading the data into a DB, we want to give them a short heads-up indicating what the TAC will encompass and what we expect from them. We won't tell them how to do anything, but it gives them the opportunity to prepare whatever they think will best help them do the assignment (e.g., a Jupyter notebook, or a way to ingest the data using Python and then query it later).

If an applicant does not prepare anything, that is fine too; however, it would take away from their time in the interview to demonstrate their skills and answer the questions. At the same time, based on the document, we would expect them to understand what tools to use in the interview.

DE Interview - Pre TAC Communication.docx DE Interview - Pre TAC Communication.pdf

joelachance commented 1 year ago

This is great, you're a pro at good-looking docs :) Some thoughts:

As much as I love how this looks, I think we should probably have this in a format that we can version control better. We'd be emailing a potential candidate anyway, so being able to update this easily would be ideal, and we should store it in this repository.

The take-away I got from this document was that the candidate should be expecting to load a CSV file into a database to run queries on that data, correct? Maybe this becomes the start of an email template (we have something in Greenhouse already for this, so we could formalize what we want to say and put it in Greenhouse for automation) and that is always sent out.
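
To make that expectation concrete, here is a rough sketch of the load-then-query flow using only the Python standard library and an in-memory SQLite database. The table name, column names, and sample rows are made up for illustration; a candidate is free to use any database or client.

```python
import csv
import io
import sqlite3

# Made-up sample data; the real file and its columns may differ.
SAMPLE_CSV = """transaction_id,kind,price
1001,purchase,19.99
1001,return,19.99
1002,purchase,24.50
"""


def load_csv(conn, csv_text, table="transactions"):
    """Create the table and insert every CSV row; returns the number of rows loaded."""
    conn.execute(
        f"CREATE TABLE {table} (transaction_id INTEGER, kind TEXT, price REAL)"
    )
    rows = [
        (int(r["transaction_id"]), r["kind"], float(r["price"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_csv(conn, SAMPLE_CSV)

# Once the data is loaded, the SQL questions are ordinary queries.
counts = conn.execute(
    "SELECT kind, COUNT(*) FROM transactions GROUP BY kind ORDER BY kind"
).fetchall()
print(loaded, counts)
```

Note that the `int(...)` and `float(...)` conversions here will raise on dirty values, which is exactly the kind of resistance we want to watch a candidate work through.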

It's interesting to me how many candidates struggle with loading data, it'll be good to see what they do. I'm always struck with a person's development flow, if they know how to use git, etc., which we can discover in the first few minutes.

Here are the next steps, as I see it:

Review the dataset. @ChironBM, I know you came across data you weren't expecting in the previous interview, and it makes sense that you know where all of the 'gotchas' are. I liked your suggestion of duplicating rows so we have matching purchase/return rows, for example.
Separate the Python/SQL portions. I can work on the Python bit of this of course-- we have a Python interview, so if we expect each section to take roughly 30 mins, we can plan accordingly there.

One note with the data-- I do think it is worthwhile to give them something they're not entirely expecting in the data (non-matching data types, etc). My goal is to see what they do. I don't think a mistake is disqualification at all, but I'm very interested in what someone does when they encounter resistance/errors/etc. This tells me a lot about a candidate and what they might do.

ChironBM commented 1 year ago

Ohh yes, including it in Greenhouse would be the best way, great suggestion! Automation FTW haha

> The take-away I got from this document was that the candidate should be expecting to load a csv file into a database to run queries on that data, correct?

Yes, correct. Ideally the candidate will prepare the right tools / clients to do this during the interview for both Python and SQL. This should make the interview more efficient and focussed on answering the questions rather than spending time 'prepping'.

I've updated the Excel data in the attached doc. The questions are there, along with the expected answers. On top of that, I also included some of these 'quirks', such as products returned before purchase, empty price fields, and a different data type in the height column (cm vs inch). Also, the returns are now matched to and included in purchases, so there are duplicate transaction_ids.

Interview Answers.xlsx
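
As a quick sanity check on those quirks, something along these lines could confirm they are actually present in the file. This is only a sketch: the column names other than `transaction_id`, the date format, and the way units appear in the height values are all assumptions, and the sample rows are invented.

```python
import csv
import io
from datetime import date

# Invented rows; assumes ISO dates and a unit suffix on height values.
SAMPLE_CSV = """transaction_id,kind,date,price,height
1,purchase,2023-01-10,19.99,170 cm
1,return,2023-01-05,19.99,170 cm
2,purchase,2023-02-01,,67 in
3,purchase,2023-02-03,24.50,180 cm
"""


def find_quirks(rows):
    """Collect transaction_ids for each kind of quirk found in the rows."""
    quirks = {"empty_price": [], "non_cm_height": [], "return_before_purchase": []}
    purchase_dates = {}
    for row in rows:
        tid = row["transaction_id"]
        if not row["price"].strip():
            quirks["empty_price"].append(tid)
        if not row["height"].strip().endswith("cm"):
            quirks["non_cm_height"].append(tid)
        if row["kind"] == "purchase":
            purchase_dates[tid] = date.fromisoformat(row["date"])
    # Second pass: returns share a transaction_id with their purchase row.
    for row in rows:
        tid = row["transaction_id"]
        if row["kind"] == "return" and tid in purchase_dates:
            if date.fromisoformat(row["date"]) < purchase_dates[tid]:
                quirks["return_before_purchase"].append(tid)
    return quirks


quirks = find_quirks(list(csv.DictReader(io.StringIO(SAMPLE_CSV))))
print(quirks)
```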

I'll update the description on GitHub and provide some extra information to make it clear what we are expecting and judging on.

Let me know what you think.

joelachance commented 1 year ago

This is really great! Much better 'quirks' as well, thank you @ChironBM :) I'll watch for a PR/GitHub update, let me know if you need any help with GitHub. Thanks again!

joelachance commented 1 year ago

[ ] Update the data_eng dir name
[ ] Update the data for both user_telemetry & purchase_return datasets.

joelachance commented 1 year ago

Closing, I think we're good to go here. Thanks, @ChironBM !

BoldMetrics / interview

Update Data Eng. Interview #10