-
- Data visualisation: Matplotlib, Seaborn, Plotly, or Tableau
- Statistical analysis: Python libraries (e.g., NumPy, SciPy, StatsModels) or R
-
-
Gain basic insight about the dataset and its nature:
- Check its size
- Perform data cleaning
- Generate statistics and visuals
- Identify features
- Transform the data if needed
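
The steps above can be sketched in `pandas` roughly as follows. This is a minimal sketch, not a prescribed workflow: the toy `DataFrame`, its column names (`title`, `price`), and the choice of median imputation are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
df = pd.DataFrame(
    {
        "title": ["red shoe", "red shoe", "blue hat", None],
        "price": [10.0, 10.0, None, 7.5],
    }
)

# Size: number of rows and columns.
n_rows, n_cols = df.shape

# Cleaning: drop exact duplicate rows, then rows with no title.
clean = df.drop_duplicates().dropna(subset=["title"])

# Statistics: summary of numeric columns and missing values per column.
summary = clean.describe()
missing = clean.isna().sum()

# Features: pick out numeric columns as a first pass.
numeric_features = clean.select_dtypes(include="number").columns.tolist()

# Transformation: fill missing prices with the median (one option among many).
clean = clean.assign(price=clean["price"].fillna(clean["price"].median()))
```

Visuals would typically follow from `clean` using one of the plotting libraries listed earlier, for example a histogram per numeric feature.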
-
- [x] similarity by title
- [x] dissimilarity by title
- [x] similarity by image
- [x] dissimilarity by image
- [x] simple visual check of the image when the title seems highly dissimilar but m…
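
As an illustration of a title-based check, one simple option is a character-level ratio from the standard library's `difflib`. This is only a sketch under assumptions: the notes do not say which similarity measure was actually used, and the function name `title_similarity` and the example titles are hypothetical.

```python
from difflib import SequenceMatcher


def title_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] for two titles, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical titles: identical up to case, and sharing no characters.
same = title_similarity("Red Shoe", "red shoe")
different = title_similarity("abc", "xyz")
```

Pairs with a low title score would then be the candidates for the manual image check.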
-
Analyze a dataset to find patterns, trends, and outliers to derive
meaningful insights.
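
For the outlier part, one common rule of thumb is to flag values lying more than 1.5 interquartile ranges outside the quartiles. The sketch below uses only the standard library; the helper name `iqr_outliers`, the factor `k=1.5`, and the sample values are assumptions for illustration, not something prescribed by the task.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample with one obviously extreme value.
sample = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(sample)
```

Flagged values are candidates for inspection rather than automatic removal, since an outlier can be a data error or a genuine insight.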
-
This might include some questions and comments for @jaklinger and @russwinch.
-
- [ ] How many high distress conversations did we miss?
- [ ] Do highly distressed kids drop off more? Or wait longer?
- [ ] How often do kids try to chat before the conversation has started?
- [ …
-
-
EDA of the enriched MAG data. Some goals:
1. Spot and report data collection bugs so that we can fix them.
2. Identify data gaps (and bonus points for proposing what else needs to be collected!).
3…
-
**Prompt 2 datasets**
- [x] `account_000.txt`
- `Expecting 23 cols, but line 858618 contains text after processing all cols.`
- Manually skipped in `pandas` and re-exported for R
- [x] `acti…