Open rvilsack opened 1 month ago
@rsaksida Re: your slack question
_Can you help me understand how the credits are supposed to be parsed? CRT HRS:4 LEC HRS:4 LAB HRS:1 OTH HRS:0. I assume this means Credit hrs - 4, Lecture hrs - 4, Lab hrs - 1, Other hrs - 0. How would this translate to the 4 columns we're using: Credit Unit Value, Credit Unit Max Value, Credit Unit Type, Credit Unit Type Description?_
All of the MATH courses (rows 547-563 in the download file) were parsed with Credit Unit Type = semester hour, which was not the case for other courses. (Nothing on the MATH URL suggests semester, which is what I thought you'd adjusted the crawler to look for.)
See a contrasting example, for instance row 546:
Here is what the URL for that course displays:
The data is displayed similarly in these two examples, yet row 546 was parsed correctly:
Credit Unit Value = 4
Credit Unit Max Value = (blank)
Credit Unit Type = (blank)
Credit Unit Type Description = This has credit value, but the type cannot be determined
I expected all of the MATH courses (rows 547-563) to be parsed the same way.
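To make the expected behavior concrete, here is a minimal Python sketch of how a line like `CRT HRS:4 LEC HRS:4 LAB HRS:1 OTH HRS:0` could map to the four columns. This is not the crawler's actual code: the function name, the `page_text` parameter, the use of `None` for blank cells, and the rule "only set the type when the page mentions semester" are my assumptions based on the row 546 output above.

```python
import re

# Description text copied from the row 546 output.
UNKNOWN_TYPE_DESC = "This has credit value, but the type cannot be determined"

# Matches pairs such as "CRT HRS:4" or "LAB HRS:1".
HRS_PATTERN = re.compile(r"(CRT|LEC|LAB|OTH)\s+HRS:\s*(\d+(?:\.\d+)?)")


def parse_credit_line(line, page_text=""):
    """Map a 'CRT HRS:… LEC HRS:…' line to the four credit columns.

    Hypothetical helper: only CRT HRS feeds Credit Unit Value; lecture,
    lab, and other hours are ignored. None stands for a blank cell.
    """
    hours = {label: float(value) for label, value in HRS_PATTERN.findall(line)}
    if "CRT" not in hours:
        return None  # no credit hours found on this line

    # Assumption: the type is "semester hour" only if the page says so.
    says_semester = re.search(r"\bsemester\b", page_text, re.IGNORECASE) is not None
    return {
        "Credit Unit Value": hours["CRT"],
        "Credit Unit Max Value": None,
        "Credit Unit Type": "semester hour" if says_semester else None,
        "Credit Unit Type Description": None if says_semester else UNKNOWN_TYPE_DESC,
    }
```

With no mention of "semester" in `page_text`, the MATH rows would come out like row 546: a credit value of 4, a blank type, and the "type cannot be determined" description.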
I tested 2 CourseLeaf catalogs; this was the second round of testing.
South Texas College
URL: https://catalog.southtexascollege.edu/courses/
Link to output file: https://docs.google.com/spreadsheets/d/1p0EZ23zV-I8qPST0MBv9sU6SK7McXV2C/edit?usp=sharing&ouid=115685232190749733039&rtpof=true&sd=true
Number of courses looks good
Data looks good
ISSUE: missing credit values (~3% of records), incorrect credit value type, and a small number of truncated course descriptions
Example: no credit values included in extract, but listed in catalog (https://catalog.southtexascollege.edu/courses/rbtc/)
Example: incorrect credit value type, nothing on page suggests semester (https://catalog.southtexascollege.edu/courses/math/)
These issues seem to be isolated to full sets of courses under a heading (RBTC, MATH, etc.)
There are also a few course descriptions that are truncated:
Here is what appears on the page for these courses:
Delaware Community College
URL: https://catalog.dccc.edu/courses/course-descriptions/
Link to output file: https://docs.google.com/spreadsheets/d/1mr3Aqjr3hw0p5ScvfV-rX90mnmSeQgoV/edit?usp=sharing&ouid=115685232190749733039&rtpof=true&sd=true
Number of courses looks good
Data looks good
ISSUE: missing credit values (17% of records), incorrect credit value type, and truncated course descriptions. I'm not providing screenshots, since the problems are exactly the same as above, but the output file has some examples highlighted.
If there is a pattern to the missing credit values or truncated descriptions, I haven't found it.
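One way to check for a pattern is to compute the share of missing credit values per subject heading in the output file. The sketch below is only a suggestion: the row keys `course_code` and `credit_value` are hypothetical and would need to match the actual column names in the spreadsheet, and it assumes the subject is the first token of the course code.

```python
from collections import defaultdict


def missing_credit_share_by_subject(rows):
    """Return {subject: fraction of rows with no credit value}.

    `rows` is a list of dicts, one per course. The keys "course_code"
    and "credit_value" are assumed names, not the real column headers.
    """
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        # Assumption: the code looks like "<SUBJECT> <number>".
        subject = row["course_code"].split()[0]
        totals[subject] += 1
        if row.get("credit_value") in (None, ""):
            missing[subject] += 1
    return {subject: missing[subject] / totals[subject] for subject in totals}
```

A share of 1.0 for a subject would match the heading-level failures seen at South Texas College (RBTC, MATH); fractional shares spread across many subjects would point to something else.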