CredentialEngine / ai-course-crawler

Apache License 2.0
Texas A&M International University #28

Open rvilsack opened 3 months ago

rvilsack commented 3 months ago

The Texas A&M International University CSV file has the following issues.

AI Course Crawler Extract link: https://master.ai-course-crawler.development.c66.me/datasets/courses/15

Extract file in Google Sheets with comparison to expected: https://docs.google.com/spreadsheets/d/1T-R2UudsM2KdzZtQHX5Xrvo8y8lr4ON6lz6ZJ7L1sME/edit?usp=sharing

The information below applies to the majority of courses in this data set.

ALL ISSUES

EXPECTED: see the highlighted row in the "Comparison to Correct Bulk Upload" tab of this spreadsheet

Using the ARTS 3330 18th & 19th Century Art course as an example (it is the row highlighted in the Google Sheet linked above):

Here is what was listed for this course in the URL provided to the crawler: [Screenshot 2024-08-15 111223]

Here is what was listed for this course in the URL included as InCatalog for ARTS 3330: [Screenshot 2024-08-16 101848]

rsaksida commented 2 months ago

As with other issues, I was able to improve extraction with some changes to recipe detection.

However, even with a correctly configured recipe, the catalogue URL isn't yielding very good results. The catalogue management system is a little tricky: better data about the courses does exist (see, for example, ACC 2301, which I found using the search page), but an index for that data isn't easy to access.

Starting from the link I found by searching the website, I managed to find an index here, but I'm not sure whether it is linked from anywhere on the site.

I added a configuration for the more complete index here and started an extraction, which seems to yield much better results because there's more data to work with.

When that extraction finishes, please check it out - we can keep using this issue.

For the course requisites see #43.