Humorloos closed 2 years ago
High Priority
In the "Results of Data Translation" phase: most of our field mappings are 1-to-1 translations or split-based string-to-list conversions. So we could just add a new column to the integrated schema for the explanation, instead of a detailed write-up.
"How did you create the consolidated schema?": We did it via Pandas, so we are good with that, and we can add some explanation of it.
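For the report, the two mapping kinds mentioned above could be illustrated with a minimal Pandas sketch like the following. The column names, the sample rows, and the ";" delimiter are assumptions for illustration only, not our actual schema.

```python
import pandas as pd

# Hypothetical source table; column names, values and the ";" delimiter are made up.
source = pd.DataFrame({
    "movie_title": ["Alien", "Heat"],
    "genre_string": ["Horror;Sci-Fi", "Crime"],
})

# Build the target (integrated) schema column by column.
target = pd.DataFrame({
    # 1-to-1 translation: the value is copied, only the column name changes.
    "title": source["movie_title"],
    # Split-based conversion: one delimited string becomes a list of values.
    "genres": source["genre_string"].str.split(";"),
})

print(target)
```

A short "explanation" column next to each target attribute in the schema table could then simply say "1-to-1" or "split on delimiter".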
Our code is currently scattered across multiple main classes, each specific to one task. Would the code be fine in that format, or should we consolidate it in the final iteration?
"Group size distribution": do cluster sizes 1 and 2 refer to the clusters after combining d1 + d2 + d3? Ideally the cluster size would be 1, because the data is at the unique-movie level.
"How should we interpret the 'consistency' metric in the data fusion stage?"
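To make the "Group size distribution" question above concrete, here is a small sketch of one possible reading, which still needs to be confirmed in the coaching session: records from d1, d2 and d3 that are linked by correspondences form a cluster, and the distribution counts how many clusters exist per size. Under this assumed reading, a cluster of size 1 is a record with no match in any other dataset. All record IDs and correspondences below are made up.

```python
from collections import Counter

# Hypothetical records and identity-resolution correspondences (IDs are made up).
all_records = ["d1_1", "d1_2", "d1_3", "d2_7", "d3_4", "d3_9"]
correspondences = [("d1_1", "d2_7"), ("d2_7", "d3_4"), ("d1_2", "d3_9")]

# Union-find: every record starts out as its own cluster.
parent = {record: record for record in all_records}

def find(record):
    """Return the representative of the cluster that contains `record`."""
    while parent[record] != record:
        parent[record] = parent[parent[record]]  # path compression
        record = parent[record]
    return record

# Merge the clusters of every matched pair.
for left, right in correspondences:
    parent[find(left)] = find(right)

# Number of records per cluster, then number of clusters per size.
cluster_sizes = Counter(find(record) for record in all_records)
size_distribution = Counter(cluster_sizes.values())

for size, count in sorted(size_distribution.items()):
    print(f"cluster size {size}: {count} cluster(s)")
```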
Medium Priority
What is the well-defined scope of "something cool"? Is there a language, framework, or methodology restriction on it? Are there extra marks for it (provided everything else is correct)?
"Overall accuracy" of the final dataset: what exactly does it mean?
Do we have to make a presentation, or can we do a walk-through of the report tables? What is the content limit for that?
Does ownership have to be added only in the presentation, or in the report too?
Very Low Priority
Is early feedback on a rough draft possible before the deadline, say in the next coaching session? To whom exactly should the report be addressed? The dws group, as the template said?
TODO
Collate doubts regarding the remaining tasks in identity resolution and data fusion for the coaching session.
Deadline: 11-24