Normalize Scraped Data - Githubissues

Right now our scrapers all use different formats:

Bootstrap World returns url,standard_code,standard_desc,index,title,description,length
CS First returns URL,Curriculum,Lesson,Description
Code.org returns no header at all

Without a universal format for the scraped data, we have to write a separate importer for each scraper's output. That is harder to maintain and leads to problems such as #81. There, the added complexity of multiple importers hid the fact that we weren't seeding the database with description or time, and each importer then had to be tweaked separately to include that data.

Each scraper doesn't need to return its columns in the same order. It is, however, highly beneficial for each one to emit a header row and to use the same keywords in that CSV header. That way, instead of writing Code.create(identifier: result[4]), we could write Code.create(identifier: result.code), using one (well-tested) import method for all possible sources.
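
As a rough sketch of what a single shared reader might look like, the snippet below uses Ruby's standard csv library to look fields up by header name rather than by position. The header lines are the ones listed above; the data rows, the constant names, and the read_scraped helper are made up for illustration and are not part of the existing codebase. Note that CSV rows are indexed as row[:description] rather than result.description, and that the built-in :symbol header converter lowercases names, so "URL" and "url" both become :url.

```ruby
require "csv"

# Hypothetical sample output from two scrapers. Only the header lines come
# from the issue; the data rows are invented for this example.
BOOTSTRAP_SAMPLE = <<~DATA
  url,standard_code,standard_desc,index,title,description,length
  http://example.com/a,S1,Example standard,1,Example title,Example description,30
DATA

CS_FIRST_SAMPLE = <<~DATA
  URL,Curriculum,Lesson,Description
  http://example.com/b,Example curriculum,Example lesson,Another description
DATA

# Hypothetical helper: parse any scraper's CSV into an array of hashes keyed
# by normalized header (lowercased symbols), so column order no longer matters.
def read_scraped(csv_text)
  CSV.parse(csv_text, headers: true, header_converters: :symbol).map(&:to_h)
end

bootstrap_rows = read_scraped(BOOTSTRAP_SAMPLE)
cs_first_rows  = read_scraped(CS_FIRST_SAMPLE)

# The same lookups work for both sources despite different column layouts.
puts bootstrap_rows.first[:description]
puts cs_first_rows.first[:url]
```

This only normalizes case and symbolizes the names. The scrapers would still need to agree on the keywords themselves (for example, whether the field is called standard_code or code), and a headerless source such as Code.org would need a header added before it could go through the same path.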

TeachTechTaskForce / edumap

Normalize Scraped Data #82