PikaCourse / homiehomie
[Idea] Automate course crawling via worker process #112
Closed: William-An closed this 3 years ago
William-An commented 3 years ago
Idea: Course info automation
What is this idea related to?
backend
Is your feature request related to a problem? Please describe.
Currently, course crawling requires writing a crawler, deploying it, and uploading the information to the database.
This is not ideal because:
Crawling and uploading can take hours
Human error can creep into crawling and uploading
Crawlers for different schools are nearly identical with only some minor changes
Describe the solution you'd like
We can create a generic interface that calls the relevant school's query API upon user request (#85); we call this a course refresh.
The crawler should not hit the school's query API every time a user searches for a course; instead, it should have a cooldown period.
For instance, if a user accessed the course within the past day, the next user access should not trigger a course refresh.
This interface should run as a worker process, since it needs to access a third-party service and might take a long time.
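A minimal sketch of the cooldown and worker idea, written in Python as an assumption (the issue does not name a language). `on_user_access`, `last_refreshed`, `start_worker`, and the in-memory `queue.Queue` are hypothetical stand-ins: a real deployment would keep refresh timestamps in the database and use a job queue that separate worker processes can read. The one-day cooldown comes from the example in the issue.

```python
import queue
import threading
from datetime import datetime, timedelta

# Assumed cooldown, taken from the "within a day" example in the issue.
REFRESH_COOLDOWN = timedelta(days=1)

# Hypothetical in-memory state; a real backend would persist the timestamps
# and use a job queue shared by worker processes.
last_refreshed = {}     # course id -> datetime when its last refresh was queued
jobs = queue.Queue()    # pending course refreshes, processed in FIFO order

def needs_refresh(course_id, now, cooldown=REFRESH_COOLDOWN):
    """True if the course was never refreshed or its cooldown has elapsed."""
    last = last_refreshed.get(course_id)
    return last is None or now - last >= cooldown

def on_user_access(course_id, now=None):
    """Called when a user views a course; only enqueues work, never calls the school API."""
    now = datetime.now() if now is None else now
    if not needs_refresh(course_id, now):
        return False
    # Record the time when the job is queued, so further accesses within
    # the cooldown do not queue duplicate refreshes.
    last_refreshed[course_id] = now
    jobs.put(course_id)
    return True

def worker(fetch):
    """Background loop: run the (slow) school query for each queued course."""
    while True:
        course_id = jobs.get()
        if course_id is None:   # sentinel value: stop the worker
            break
        fetch(course_id)
        jobs.task_done()

def start_worker(fetch):
    """Run the worker loop on a background thread (stand-in for a worker process)."""
    t = threading.Thread(target=worker, args=(fetch,), daemon=True)
    t.start()
    return t
```

The request path only does a dictionary lookup and a queue insert, so a slow or unavailable school API never blocks the user; the check-then-set in `on_user_access` is not safe across concurrent workers, which is the scaling concern raised next.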
There are potential issues when scaling up to multiple workers:
Should we implement a lock? The newest data is always the best.
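One possible answer to the lock question, sketched in Python under the assumption that workers are threads in a single process: a per-course lock that a worker tries to take without waiting. If another worker already holds it, that worker is already fetching the newest data, so the second one simply skips. `refresh_once` and the lock table are hypothetical; across separate worker processes the same idea would need a shared lock (for example in the database or a cache), which `threading.Lock` does not provide.

```python
import threading
from collections import defaultdict

# Hypothetical per-course locks. threading.Lock only coordinates threads
# inside one process; separate worker processes would need a shared lock.
_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()

def _lock_for(course_id):
    # Guard the defaultdict so two threads asking at once get the same Lock.
    with _locks_guard:
        return _locks[course_id]

def refresh_once(course_id, fetch):
    """Refresh a course unless another worker is already refreshing it."""
    lock = _lock_for(course_id)
    if not lock.acquire(blocking=False):
        return False   # another worker is already fetching the newest data
    try:
        fetch(course_id)
        return True
    finally:
        lock.release()
```

Skipping instead of waiting matches the "newest data is the best" reasoning: a queued second refresh would only repeat the fetch that is already in progress.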
We still need a wrapper function around each school's query API, so the main dispatcher can call the right one when refreshing a given school's courses.
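A sketch of the wrapper-plus-dispatcher shape, again in Python and entirely hypothetical: each school's wrapper is registered under a school id and must return the same normalized `Section` records, so the dispatcher `refresh_course` needs no school-specific logic. The `example_u` wrapper is a stub with made-up field names; it does not reflect any real school's query API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Section:
    """Hypothetical normalized record that every school wrapper must return."""
    course_code: str
    section: str
    instructor: str
    seats_open: int

# school id -> wrapper around that school's query API
WRAPPERS: Dict[str, Callable[[str], List[Section]]] = {}

def register(school: str):
    """Decorator that adds a wrapper to the dispatch table."""
    def decorator(fn: Callable[[str], List[Section]]):
        WRAPPERS[school] = fn
        return fn
    return decorator

@register("example_u")
def fetch_example_u(course_code: str) -> List[Section]:
    # Hypothetical stub: a real wrapper would call the school's query API
    # and translate its response into Section records.
    raw = [{"crs": course_code, "sec": "A1", "prof": "Smith", "open": 12}]
    return [Section(r["crs"], r["sec"], r["prof"], r["open"]) for r in raw]

def refresh_course(school: str, course_code: str) -> List[Section]:
    """Main dispatcher: pick the wrapper registered for this school and run it."""
    try:
        wrapper = WRAPPERS[school]
    except KeyError:
        raise ValueError(f"no query API wrapper registered for {school!r}") from None
    return wrapper(course_code)
```

Adding a school then means writing one wrapper and one `@register(...)` line, which addresses the point that crawlers for different schools differ only in small details.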
We will also need unit tests for these wrapper functions to make sure their output is what we expect.
We can test each wrapper function against a small set of expected outputs.
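A minimal `unittest` sketch of checking a wrapper against a small expected outcome. To keep the test offline, the hypothetical wrapper step `normalize_sections` takes an already-fetched response; `FIXTURE` is an invented sample payload, not real data from any school, and in practice it would be a small recorded API response.

```python
import unittest

def normalize_sections(raw_response):
    """Hypothetical wrapper step: map one school's raw API JSON to common dicts."""
    return [
        {
            "course": item["subject"] + " " + item["number"],
            "section": item["section"],
            "seats_open": int(item["capacity"]) - int(item["enrolled"]),
        }
        for item in raw_response["results"]
    ]

# Small hand-written fixture standing in for a recorded API response.
FIXTURE = {"results": [
    {"subject": "CS", "number": "101", "section": "A1", "capacity": "30", "enrolled": "28"},
]}
EXPECTED = [{"course": "CS 101", "section": "A1", "seats_open": 2}]

class NormalizeSectionsTest(unittest.TestCase):
    def test_matches_expected_output(self):
        self.assertEqual(normalize_sections(FIXTURE), EXPECTED)

    def test_empty_response(self):
        self.assertEqual(normalize_sections({"results": []}), [])
```

Saved as a module, this runs with `python -m unittest <module>`; separating "fetch" from "normalize" is what lets the expected-output test run without touching the school's API.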
For schools without a query API, such as UNC, should we use a cron job to update course section information weekly?
William-An commented 3 years ago
Backend query API wrapper for BU