calpoly-csai / csai-scraping

Web scraping for Nimbus

Change output of individual scrapers to JSON #11

Open cameron-toy opened 4 years ago

cameron-toy commented 4 years ago

Currently, each module outputs a CSV string containing only the data. Making that data one field of a JSON string, with errors, timestamps, and other metadata in the other fields, would allow for better logging and error handling.
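As a rough sketch of the idea, a scraper's existing CSV string could be wrapped like this. The function name `wrap_scraper_output` and the field names `data`, `errors`, and `timestamp` are assumptions for illustration, not an agreed format:

```python
import json
from datetime import datetime, timezone


def wrap_scraper_output(csv_data, errors=None):
    """Wrap a scraper's CSV string in a JSON envelope with metadata.

    Hypothetical helper: the key names below are illustrative only.
    """
    envelope = {
        # The CSV string the module already produces, unchanged.
        "data": csv_data,
        # Human-readable error messages collected during the scrape.
        "errors": list(errors or []),
        # UTC time at which the envelope was built, in ISO 8601 format.
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(envelope)
```

A consumer can then call `json.loads` on the result and check `errors` before touching `data`.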

austinsilveria commented 4 years ago

Had a conversation with the data team yesterday, and we will be storing the data in the database through SQLAlchemy object mapper classes. We ran through an example of storing an AudioSampleMetaData object, which can be seen here: https://github.com/calpoly-csai/api/pull/35

Our use case is very similar to what was done in this PR, so I imagine we will be building JSON representations of each scraped object (Course, Club, ...) so that each can be mapped to its respective SQLAlchemy entity (Courses source code).

The logging solution proposed in this issue would integrate well with how we are going to store the data. We could build the wrapped object as follows, then save the List[Course] in bulk through one API call:

CoursesData {
    errors: List[Error]
    timestamp: Timestamp
    data: List[Course]
}
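
A minimal Python sketch of that wrapper, assuming plain dataclasses on the scraper side. The `Course` fields (`dept`, `number`, `title`) and the `ScrapeError` class are hypothetical placeholders; the real fields would follow the SQLAlchemy entity:

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Course:
    # Hypothetical fields; the real ones should mirror the Courses entity.
    dept: str
    number: int
    title: str


@dataclass
class ScrapeError:
    # Hypothetical error record: which scraper failed and why.
    source: str
    message: str


def _utc_now() -> str:
    """Return the current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class CoursesData:
    errors: List[ScrapeError] = field(default_factory=list)
    timestamp: str = field(default_factory=_utc_now)
    data: List[Course] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() converts nested dataclasses and lists of them recursively.
        return json.dumps(asdict(self))
```

For example, `CoursesData(data=[Course("CSC", 101, "Fundamentals of Computer Science")]).to_json()` produces a single JSON string that could be sent as the body of the bulk API call.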