PikaCourse / homiehomie


[Idea] Automate course crawling via worker process #112

Closed William-An closed 3 years ago

William-An commented 3 years ago

Idea: Course info automation

What is this idea related to?

backend

Is your feature request related to a problem? Please describe.

  1. Currently, course crawling requires writing a crawler, deploying it, and uploading the information to the database
  2. This is not ideal because:
    1. Crawling and uploading can take hours
    2. Crawling and uploading involve human error
    3. Crawlers for different schools are nearly identical, with only minor differences

Describe the solution you'd like

  1. We can create a generic interface that calls each school's query API upon user request #85 (call this a course refresh)
    1. The crawler should not hit the school query API every time a user searches for a course; instead it should have a cooldown period
    2. For instance, if a course was refreshed within the last day, the next user access should not trigger another refresh
    3. This interface should run as a worker process, since it accesses third-party services and might take a long time
    4. There are some potential issues with multiple workers when scaling up
      1. Implement a lock? Since the newest data is always the best, a simple last-write-wins approach may suffice
    5. We still need a wrapper function around each school's query API so the main dispatcher can invoke the right one for each school's course refresh (see the sketch after this list)
    6. We will also need unit tests for these wrapper functions to make sure the output they generate matches what we expect
      1. Can test each wrapper function against a small set of expected data
  2. For schools without a query API, like UNC, use a cronjob to update course section information weekly? (see the schedule sketch below)
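
A minimal sketch of what the dispatcher in item 1 could look like, assuming a Python backend. All names here (`SCHOOL_WRAPPERS`, `REFRESH_COOLDOWN`, `register_wrapper`, the stub wrapper, the section fields) are hypothetical placeholders, not existing code in this repo:

```python
# Hypothetical sketch: generic course-refresh dispatcher with a per-course cooldown.
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import Callable

REFRESH_COOLDOWN = timedelta(days=1)  # item 1.2: at most one refresh per course per day
SCHOOL_WRAPPERS: dict[str, Callable[[str], list[dict]]] = {}


def register_wrapper(school: str):
    """Register a school's query-API wrapper so the dispatcher can find it (item 1.5)."""
    def decorator(func: Callable[[str], list[dict]]):
        SCHOOL_WRAPPERS[school] = func
        return func
    return decorator


@register_wrapper("bu")
def fetch_bu_sections(course_code: str) -> list[dict]:
    # Placeholder: the real wrapper would call BU's course query API and
    # normalize the response into our section schema.
    return [{"course": course_code, "crn": "12345", "capacity": 120}]


def refresh_course(school: str, course_code: str,
                   last_refreshed: datetime | None) -> list[dict] | None:
    """Run by a worker process: re-query the school API unless the data is still fresh."""
    now = datetime.now(timezone.utc)
    if last_refreshed is not None and now - last_refreshed < REFRESH_COOLDOWN:
        return None  # cooldown not expired: skip the third-party call
    wrapper = SCHOOL_WRAPPERS.get(school)
    if wrapper is None:
        raise ValueError(f"no query API wrapper registered for {school!r}")
    sections = wrapper(course_code)  # potentially slow, hence the worker process
    # Upsert into the database here; with "newest data wins", a simple
    # last-write strategy is acceptable even with multiple workers (item 1.4.1).
    return sections
```

Adding a new school would then only mean writing one `@register_wrapper("...")` function, which is the main point of item 1.5.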
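
For item 2, if the worker processes end up running on Celery (an assumption, not a decision), the weekly refresh for schools without a query API could be a beat schedule entry; `courses.tasks.crawl_school` is a placeholder task name:

```python
# Hypothetical Celery beat schedule, assuming Celery is chosen as the worker framework.
from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    "weekly-unc-course-refresh": {
        "task": "courses.tasks.crawl_school",          # placeholder task name
        "schedule": crontab(minute=0, hour=3, day_of_week=1),  # every Monday at 03:00
        "args": ("unc",),
    },
}
```

Any other scheduler (plain cron driving a management command, for example) would work just as well; the only requirement is a periodic trigger for schools we cannot refresh on demand.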
William-An commented 3 years ago

Backend query API wrapper for BU
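
A rough shape for that BU wrapper, together with the kind of unit test described in 1.6 (testing against a small set of expected data). The endpoint URL, query parameters, and response fields below are all placeholders, not BU's real API:

```python
# Hypothetical BU wrapper plus a unit test against a small canned expectation.
import unittest
from unittest.mock import patch

import requests

BU_COURSE_API = "https://example.bu.edu/course-api/sections"  # placeholder URL


def fetch_bu_sections(course_code: str) -> list[dict]:
    """Query the (placeholder) BU course API and normalize sections into our schema."""
    resp = requests.get(BU_COURSE_API, params={"course": course_code}, timeout=30)
    resp.raise_for_status()
    return [
        {"course": course_code, "crn": s["crn"], "capacity": s["capacity"]}
        for s in resp.json()["sections"]
    ]


class FetchBUSectionsTest(unittest.TestCase):
    @patch("requests.get")
    def test_normalizes_sections(self, mock_get):
        # Small canned payload standing in for a real BU API response (item 1.6.1).
        mock_get.return_value.raise_for_status.return_value = None
        mock_get.return_value.json.return_value = {
            "sections": [{"crn": "12345", "capacity": 120}]
        }
        self.assertEqual(
            fetch_bu_sections("CS 111"),
            [{"course": "CS 111", "crn": "12345", "capacity": 120}],
        )


if __name__ == "__main__":
    unittest.main()
```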