Need some ideas from collaborators regarding cleaning the OC (Ocean Conservancy) dataset.
In normalizing counts from the raw dataset (filtered to the dates of Coastal Cleanup Days), I noticed that the total number of `People` and `Adults` per cleanup (row) ranged from 0 to 12,620. When cleanups without any count data were excluded, this range narrowed to 0 to 9,600. I also noticed that some cleanups with large numbers of people (>1,000; e.g., `Cleanup ID`s 17836, 34455, 34464, 34469) collected only a single trash item. These appear to be data-entry errors and, in my opinion, can be excluded.
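A minimal sketch of the two checks described above, written in plain Python so it runs without pandas. The field names (`Cleanup ID`, `People`, `Adults`, `Total Items`) are assumptions, not confirmed OC column names. Treating the headcount as `People` + `Adults` and using a missing `Total Items` to mean "no count data" are also assumptions; swap in whatever the real export uses.

```python
from __future__ import annotations

# Assumed row shape: one dict per cleanup, e.g. from csv.DictReader after
# converting numeric fields to int (blanks -> None). Field names are hypothetical.


def headcount(row: dict) -> int:
    # Assumption: total people = People + Adults, with blanks treated as 0.
    return (row.get("People") or 0) + (row.get("Adults") or 0)


def headcount_range(rows: list[dict], require_counts: bool = False) -> tuple[int, int]:
    """Min and max headcount; optionally skip rows with no item-count data."""
    if require_counts:
        rows = [r for r in rows if r.get("Total Items") is not None]
    counts = [headcount(r) for r in rows]
    return min(counts), max(counts)  # raises ValueError if no rows remain


def suspicious_single_item(rows: list[dict], min_people: int = 1000) -> list:
    """IDs of large cleanups (more than min_people) that logged exactly one item."""
    return [
        r["Cleanup ID"]
        for r in rows
        if headcount(r) > min_people and r.get("Total Items") == 1
    ]
```

Running `headcount_range` with and without `require_counts=True` reproduces the two ranges compared above, and `suspicious_single_item` lists the kind of IDs cited as likely entry errors.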
Potential decisions to make, with the goal of cleaning the data as much as is reasonable:
- [x] Remove cleanups with 0 people
- [x] Remove cleanups with improbable numbers of people (need help determining a threshold)
- [x] Remove cleanups with a single item collected (and more?)
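The three proposed filters could be applied in one pass, as in the sketch below. It uses the same assumed field names as before (`Cleanup ID`, `People`, `Adults`, `Total Items`), which are hypothetical. `max_people` is a required placeholder because the "improbable" cutoff is still undecided. `min_items=2` encodes "drop single-item cleanups" and can be raised if the group decides to drop more. Rows with no item-count data are left alone here.

```python
from __future__ import annotations


def headcount(row: dict) -> int:
    # Assumption: total people = People + Adults, blanks treated as 0.
    return (row.get("People") or 0) + (row.get("Adults") or 0)


def clean(rows: list[dict], max_people: int, min_items: int = 2):
    """Split rows into (kept, dropped); dropped holds (Cleanup ID, reason) pairs.

    max_people is a placeholder threshold, not an agreed value.
    """
    kept, dropped = [], []
    for r in rows:
        people = headcount(r)
        items = r.get("Total Items")
        if people == 0:
            reason = "zero people"
        elif people > max_people:
            reason = "improbable headcount"
        elif items is not None and items < min_items:
            reason = "too few items"
        else:
            kept.append(r)
            continue
        dropped.append((r["Cleanup ID"], reason))
    return kept, dropped
```

Returning the reason for each dropped row, rather than silently filtering, makes it easier for collaborators to review which cleanups each rule removes before settling on thresholds.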