CivicActions / edscrapers

US Department of Education Data Scraping Kit; see https://us-ed-scraping.ckan.io/dataset
GNU Affero General Public License v3.0

Sanitising Datasets #109

Closed: osahon-okungbowa closed this issue 4 years ago

osahon-okungbowa commented 4 years ago

SITUATION

Based on investigation and client feedback from previous scraping runs, some of the scraped datasets need to be treated/sanitised. The sanitising method may vary with the nature of the dataset. For example, datasets with 'photo' in the title are to be removed, and datasets with 'conference' in the title are to be tagged 'private' and also set as private.
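A minimal sketch of how the two example rules might be applied, assuming each scraped dataset is a plain dict with a `title` string, a `tags` list of strings and a `private` flag. The function name `sanitise_datasets`, the field layout, and the case-insensitive substring matching are all hypothetical illustrations, not the project's actual implementation.

```python
from typing import Any, Dict, List

Dataset = Dict[str, Any]


def sanitise_datasets(datasets: List[Dataset]) -> List[Dataset]:
    """Apply the two example sanitising rules from this issue.

    Hypothetical sketch: each dataset is assumed to be a dict with a
    'title' string, a 'tags' list of strings and a 'private' flag.
    """
    sanitised: List[Dataset] = []
    for dataset in datasets:
        # Case-insensitive substring match (an assumption, not stated in the issue).
        title = dataset.get("title", "").lower()

        # Rule 1: datasets with 'photo' in the title are dropped entirely.
        if "photo" in title:
            continue

        # Work on a copy so the caller's input is not mutated.
        cleaned = dict(dataset)
        cleaned["tags"] = list(dataset.get("tags", []))

        # Rule 2: 'conference' datasets get a 'private' tag and are marked private.
        if "conference" in title:
            if "private" not in cleaned["tags"]:
                cleaned["tags"].append("private")
            cleaned["private"] = True

        sanitised.append(cleaned)
    return sanitised


# Example run on made-up input records.
raw = [
    {"title": "Staff Photo Gallery", "tags": [], "private": False},
    {"title": "2019 Conference Agenda", "tags": ["events"], "private": False},
    {"title": "Title I Allocations", "tags": ["funding"], "private": False},
]
result = sanitise_datasets(raw)
for item in result:
    print(item["title"], item["tags"], item["private"])
```

Keeping each rule as a separate check makes it easy to append further rules as the problem list for sanitising grows. Note that a plain substring test also matches words such as "photos" or "photographs", which may or may not be the desired behaviour.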

TASKS

ACCEPTANCE CRITERIA

PROBLEM LIST FOR SANITISING

Daniellappv commented 4 years ago

estimate: 8h