CredentialEngine / ai-course-crawler

Apache License 2.0

Extract missing courses for course catalog vendor (Smart Catalog) #57

Open rvilsack opened 2 weeks ago

rvilsack commented 2 weeks ago

I'm testing a new course catalog vendor: Smart Catalog.

Their pages have a consistent layout: subject headings link to the courses for that subject. Configuration + extraction succeeded for 2 of the 3 sites tested. Even the successful extractions returned only part of the catalog, with courses from just 1 of the subject links.

See examples below.

Bethel University: https://betheluniversity.smartcatalogiq.com/en/2024-2025/catalog/undergraduate-courses/
Configure + extraction successful: https://master.ai-course-crawler.development.c66.me/extractions/54
Link to Google extract: https://docs.google.com/spreadsheets/d/1h_MGuqiAb_qovUNUYwVxygiZtkdho04Z9Fts8ubMfDM/edit?usp=sharing
Extracted 4 courses from 1 link.

United States University: https://usuniversity.smartcatalogiq.com/en/current/general-catalog/courses/
Configure + extraction not successful: https://master.ai-course-crawler.development.c66.me/catalogues/46

Strayer University: https://strayer.smartcatalogiq.com/en/2024-2025/catalog/courses/
Configure + extraction successful.
Extracted 60 courses (out of 86 listed) from 1 link: https://docs.google.com/spreadsheets/d/1ifFK5OqVQA92hj0zHqaJmKPTuA3MILCLvAlB7pamQEE/edit?usp=sharing

rsaksida commented 1 week ago

Thanks. The navigation structure looks pretty simple - I'll dig into it to figure out why it's not working.
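
For reference, the two-level navigation described in this issue (an index page whose subject headings each link to a course listing) can be sketched roughly as below. This is a minimal illustration and not the crawler's actual code: the `LinkCollector` class, the `subject_links` function, the `example.edu` URL, and the sample HTML are all hypothetical, and the real Smart Catalog markup may differ. It assumes subject pages live under the index page's own path.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def subject_links(index_url, html):
    """Return absolute, de-duplicated URLs that sit below index_url's path.

    Assumption: each subject page is a child of the course index path.
    """
    parser = LinkCollector()
    parser.feed(html)
    base = urlparse(index_url)
    base_path = base.path.rstrip("/") + "/"
    seen, result = set(), []
    for href in parser.hrefs:
        absolute = urljoin(index_url, href)
        parsed = urlparse(absolute)
        is_child = (
            parsed.netloc == base.netloc
            and parsed.path.startswith(base_path)
            and parsed.path.rstrip("/") + "/" != base_path
        )
        if is_child and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


# Hypothetical index page: two subject links, one parent link,
# one external link, and one duplicate.
INDEX_URL = "https://example.edu/en/current/catalog/courses/"
SAMPLE_HTML = """
<ul>
  <li><a href="/en/current/catalog/courses/acc-accounting/">ACC - Accounting</a></li>
  <li><a href="bio-biology/">BIO - Biology</a></li>
  <li><a href="/en/current/catalog/">Catalog home</a></li>
  <li><a href="https://other.example.org/x">External</a></li>
  <li><a href="/en/current/catalog/courses/acc-accounting/">ACC again</a></li>
</ul>
"""

if __name__ == "__main__":
    for url in subject_links(INDEX_URL, SAMPLE_HTML):
        print(url)
```

A full extraction would visit every URL this returns, not only the first one, and extract courses from each subject page.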