-
We need to clean the data, i.e. tokenize it.
# Definition of Done (DoD)
Tokenized dataset (.csv)
Tokenizer
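
A minimal sketch of what the two deliverables could look like, assuming a simple lowercase word tokenizer and a two-column CSV layout. The regex, the function names `tokenize` and `write_tokenized_csv`, and the `id`/`tokens` columns are all assumptions for illustration; the issue does not specify a tokenizer or a schema.

```python
import csv
import io
import re

# Assumed token rule: runs of letters/digits, with an optional
# apostrophe suffix so contractions such as "don't" stay whole.
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def tokenize(text):
    """Lowercase the text and return its word tokens as a list."""
    return TOKEN_PATTERN.findall(text.lower())


def write_tokenized_csv(rows, out_file):
    """Write (id, space-joined tokens) rows to an open text file handle."""
    writer = csv.writer(out_file)
    writer.writerow(["id", "tokens"])  # hypothetical column names
    for row_id, text in rows:
        writer.writerow([row_id, " ".join(tokenize(text))])


# Usage with an in-memory buffer standing in for the .csv file.
sample = [(1, "Hello, World!"), (2, "Don't panic.")]
buffer = io.StringIO()
write_tokenized_csv(sample, buffer)
```

Keeping `tokenize` separate from the CSV writer means the tokenizer can be swapped for a library one later without touching the export step.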
-
-
-
I have added a data cleaning branch to organise adding different cleaning scripts and code. As more code and datafiles get added, it would be worth creating a single Jupyter notebook that walks the re…
-
- Starbucks data should be limited to the US region
- Drop unnecessary columns in both Starbucks and US datasets
(any variables that do not include information on income or location)
- Drop rows w…
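
The first two steps above could be sketched as follows, using only the standard library and rows held as dictionaries. The column names (`Country`, `City`, `Postcode`, `Brand`), the `"US"` country code, and the helper `clean_store_rows` are hypothetical; the real datasets may use different names, and the row-dropping rule is not covered here because it is not fully stated.

```python
def clean_store_rows(rows, keep_columns, country_column="Country", country_code="US"):
    """Keep only rows for one country, and only the listed columns.

    rows: iterable of dicts (e.g. from csv.DictReader).
    keep_columns: the income/location columns to retain (assumed names).
    """
    cleaned = []
    for row in rows:
        # Step 1: limit the data to the US region.
        if row.get(country_column) != country_code:
            continue
        # Step 2: drop every column that is not explicitly kept.
        cleaned.append({col: row[col] for col in keep_columns if col in row})
    return cleaned


# Hypothetical sample rows standing in for the Starbucks file.
stores = [
    {"Country": "US", "City": "Seattle", "Postcode": "98101", "Brand": "Starbucks"},
    {"Country": "CA", "City": "Toronto", "Postcode": "M5H", "Brand": "Starbucks"},
]
us_stores = clean_store_rows(stores, keep_columns=["City", "Postcode"])
```

Listing the columns to keep, rather than the columns to drop, makes the result stable if new columns appear in a later version of the source file.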
-
Once scraping is complete, the next step is data cleaning:
- Are there MR numbers with empty citations? Why?
- missing?
- not appearing on MSN yet? (what would be the optimal way to g…
-
### 1. Defining who is an immigrant
**Card:**
Defines immigrants as people who are naturalized citizens or who are not citizens.
```
citizen='0=us born 1=nat 2=not cit 3=born abroad us paren…
-
### Dependency
- [ ] Obtain the 2024 (or most recent) Boundaries JSON file
### Overview
We need to remove 311 Data service requests on our map that do not fall within the boundaries of a Neighborhood C…
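
One possible way to do the boundary check is a ray-casting point-in-polygon test, sketched below under simplifying assumptions: each boundary is a single ring of `(lon, lat)` vertices, and each request is a dict with `lon` and `lat` keys. The function names and data shapes are hypothetical; the real Boundaries JSON may contain nested rings or multi-part polygons that need extra handling.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) lies inside the ring.

    ring: list of (lon, lat) vertices; the last vertex is implicitly
    joined back to the first.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def filter_requests(requests, boundaries):
    """Keep requests that fall inside at least one boundary ring."""
    return [
        r for r in requests
        if any(point_in_polygon(r["lon"], r["lat"], ring) for ring in boundaries.values())
    ]


# Hypothetical data: one square boundary and two requests.
boundaries = {"example_council": [(0, 0), (4, 0), (4, 4), (0, 4)]}
requests = [{"id": 1, "lon": 2, "lat": 2}, {"id": 2, "lon": 5, "lat": 2}]
kept = filter_requests(requests, boundaries)
```

Points that lie exactly on an edge are not treated consistently by this simple test, so a geometry library may be the better choice if edge cases matter.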
-
-
- [x] Clean and combine 50 juvenile arrest files into one dataframe @cschubert29
- [x] Inspect and clean funding files as needed (districts.csv, states.csv) @lsantana91
- [x] Inspect and clean st…