Right now our scrapers all use different formats:
Bootstrap World returns url,standard_code,standard_desc,index,title,description,length
CS First returns URL,Curriculum,Lesson,Description
Code.org doesn't have a header at all
Without a universal format for the scraped data, we have to write a separate importer for each data set. That is harder to maintain and leads to problems like #81: because of the added complexity of multiple importers, we missed the fact that we weren't seeding the database with description or time. Each importer then had to be tweaked separately to include that data.
The scrapers don't need to return data in the same column order. It is, however, highly beneficial for each one to emit a header row and to use the same keywords in every CSV header. That way, instead of writing Code.create(identifier: result[4]), we could write Code.create(identifier: result.code), using one well-tested import method for all possible sources.
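As a rough sketch of what a single header-driven importer could look like, the snippet below uses Ruby's standard csv library to read rows by header name rather than by position. The keyword set in REQUIRED_HEADERS, the method name parse_scraped_csv, and the sample data are all hypothetical; the real keywords would be whatever the scrapers agree on. Note that a CSV row is indexed as row[:code] rather than result.code, so the exact accessor is an assumption too.

```ruby
require "csv"

# Hypothetical shared keywords; the real set would be agreed on by the scrapers.
REQUIRED_HEADERS = %i[url code title description].freeze

# Parse scraped CSV text into an array of hashes keyed by header name,
# so the column order used by each scraper no longer matters.
def parse_scraped_csv(text)
  # :symbol downcases header names and turns them into symbols ("URL" => :url).
  table = CSV.parse(text, headers: true, header_converters: :symbol)

  # Fail loudly when a scraper omits a keyword instead of silently seeding nils.
  missing = REQUIRED_HEADERS - table.headers
  raise ArgumentError, "missing headers: #{missing.join(', ')}" unless missing.empty?

  table.map { |row| row.to_h.slice(*REQUIRED_HEADERS) }
end

# Two made-up sources using the same keywords in a different column order.
source_a = "url,code,title,description\nhttp://example.com/a,K-1,Loops,Intro to loops\n"
source_b = "title,description,code,url\nLoops,Intro to loops,K-1,http://example.com/a\n"

records_a = parse_scraped_csv(source_a)
records_b = parse_scraped_csv(source_b)
```

In the app, each returned hash could then feed something like Code.create(identifier: record[:code]). Raising on a missing header is one possible design choice: it would surface a gap like the one in #81 at import time instead of after seeding.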