cclatterbuck / CAplastics

exploring patterns in CA's plastics data from cleanup efforts

OC data cleaning: people vs. counts #5

Open cclatterbuck opened 1 year ago

cclatterbuck commented 1 year ago

Need some ideas from collaborators regarding cleaning the OC (Ocean Conservancy) dataset.

While normalizing counts from the raw dataset (filtered to Coastal Cleanup Day dates), I noticed that the total number of People and Adults per cleanup (row) ranged from 0 to 12620. When cleanups without any count data were excluded, the range narrowed to 0 to 9600. I also noticed that some cleanups with large numbers of people (>1000 people; e.g., Cleanup IDs 17836, 34455, 34464, 34469) collected only a single trash item. These appear to be data entry errors and, in my opinion, can be excluded.
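A minimal sketch of how rows like these could be flagged before anything is dropped. The field names (`cleanup_id`, `people`, `total_items`), the toy rows, and both helper functions are hypothetical and are not taken from the OC dataset or from this repo's code. The >1000 people threshold and the single-item condition are the ones described above.

```python
# Toy rows for illustration only; field names and values are hypothetical,
# not real OC records.
cleanups = [
    {"cleanup_id": 1, "people": 1500, "total_items": 1},     # many people, one item
    {"cleanup_id": 2, "people": 12,   "total_items": 340},
    {"cleanup_id": 3, "people": 0,    "total_items": 25},    # zero people recorded
    {"cleanup_id": 4, "people": 2000, "total_items": 5100},  # large but plausible
    {"cleanup_id": 5, "people": 8,    "total_items": 0},     # no count data
]


def drop_no_counts(rows):
    """Keep only cleanups that recorded at least one trash item."""
    return [r for r in rows if r["total_items"] > 0]


def flag_suspect(rows, min_people=1000, items=1):
    """Return cleanups with more than `min_people` people but exactly `items` trash items."""
    return [r for r in rows if r["people"] > min_people and r["total_items"] == items]


with_counts = drop_no_counts(cleanups)
suspect_ids = [r["cleanup_id"] for r in flag_suspect(with_counts)]
zero_people_ids = [r["cleanup_id"] for r in with_counts if r["people"] == 0]

print("suspect (likely data entry):", suspect_ids)
print("zero people (cannot normalize per person):", zero_people_ids)
```

Flagging rather than deleting keeps the excluded Cleanup IDs reviewable. Rows with zero people are listed separately because a per-person count cannot be computed for them.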

Potential decisions to make, with the goal of cleaning the data as much as reasonable:

cclatterbuck commented 1 year ago

Problem explored and potentially solved in explore_outliers_OCdata.qmd, with PDF output. I would like collaborator input before closing the issue.