TeachTechTaskForce / edumap


Normalize Scraped Data #82

Open hwayne opened 8 years ago

hwayne commented 8 years ago

Right now, each of our scrapers outputs its data in a different format.

Without a universal format for the scraped data, we have to write a separate importer for each set of scraped data. That is harder to maintain and leads to problems like #81. There, the extra complexity of multiple importers meant we missed the fact that we weren't seeding the database with the description or time fields. Fixing it also meant tweaking each importer separately to include the missing data.

The scrapers don't need to return their columns in the same order. It is, however, highly beneficial for each scraper's CSV output to start with a header row that uses the same keywords. That way, instead of writing `Code.create(identifier: result[4])`, we could write `Code.create(identifier: result.code)` and use one well-tested import method for every source.
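A minimal sketch of what that single importer could look like, assuming every scraper emits a CSV whose header row contains the hypothetical keywords `code`, `description`, and `time`. The method name `normalize_rows` is also hypothetical. Ruby's standard `CSV` library exposes header-keyed fields as `row[:code]` (with `header_converters: :symbol`) rather than as `result.code`, but the idea is the same: fields are looked up by name, so column order stops mattering, and a missing column fails loudly instead of silently going unseeded.

```ruby
require "csv"

# Hypothetical shared header keywords every scraper would agree on.
REQUIRED_HEADERS = %i[code description time].freeze

# Parses scraper output that starts with a header row and returns one
# attribute hash per data row, keyed by header name, not column position.
def normalize_rows(csv_text)
  # :symbol converts each header to a lowercase symbol, e.g. "Code" -> :code
  table = CSV.parse(csv_text, headers: true, header_converters: :symbol)

  # Refuse to import if any agreed-upon column is absent from the header.
  missing = REQUIRED_HEADERS - table.headers
  raise ArgumentError, "missing columns: #{missing.join(', ')}" unless missing.empty?

  table.map do |row|
    { identifier: row[:code], description: row[:description], time: row[:time] }
  end
end
```

In the Rails app, each source would then go through the same path, e.g. `normalize_rows(File.read(path)).each { |attrs| Code.create(attrs) }`.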