Vlad777 / mit-stanford

MIT-Stanford MOOC mashup

Lecture Link links outside of Stanford. #10

Open reztic opened 11 years ago

reztic commented 11 years ago

https://github.com/Vlad777/mit-stanford/blob/master/courseScraperSEE.php

It does not check whether the lecture link points to a page within Stanford. I think this may be a problem later on when we need to handle videos.

My code in https://github.com/Vlad777/mit-stanford/blob/master/STANpages.php does the check. Just do a:

preg_match('/^http:\/\/see\.stanford\.edu/', $courseLink)
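A slightly stricter variant of this check compares the parsed host instead of matching a URL prefix. This is only a sketch: the helper name `isSeeLink` is hypothetical and is not taken from STANpages.php. Unlike the regex, it also accepts `https://` links, and it rejects look-alike hosts such as `see.stanford.edu.example.com`.

```php
<?php
// Hypothetical helper, not part of the repository.
// Returns true only when the link's host is exactly see.stanford.edu.
function isSeeLink(string $courseLink): bool
{
    // parse_url() returns null when the URL has no host component
    // and false when the URL is seriously malformed.
    $host = parse_url($courseLink, PHP_URL_HOST);

    return is_string($host) && strtolower($host) === 'see.stanford.edu';
}
```

A scraper could call `isSeeLink($courseLink)` to decide which scrape routine to use for a page, rather than to skip the page outright.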

Alicedelic commented 11 years ago

Your code was used initially: you can see it was included in courseScraperSEE.php and then commented out, with a comment explaining why.

The reason is that the check caused those pages to be skipped entirely, leaving their records only partially filled: for example, the course title exists but all other values are null. That would cause further errors, or require extra validation, when fetching from the database to fill index.php.

We found it more convenient to still 'walk' those pages and have the scrape functions check whether some content cannot be found, so that it can be set to a default value, e.g. profname to 'SEE Instructor'.
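Roughly, the fallback could look like the sketch below. It is not the code in courseScraperSEE.php: the function name `applyDefaults`, the constant `SEE_DEFAULTS`, and the `description` field are hypothetical; only the `profname` default comes from what we actually use.

```php
<?php
// Hypothetical defaults table. Only 'profname' => 'SEE Instructor'
// reflects a default mentioned in this thread.
const SEE_DEFAULTS = [
    'profname'    => 'SEE Instructor',
    'description' => '', // assumed field, for illustration only
];

// Fill in every field the scrape could not find (missing, null, or blank),
// so that no row reaches the database with null columns.
function applyDefaults(array $scraped, array $defaults = SEE_DEFAULTS): array
{
    foreach ($defaults as $field => $fallback) {
        if (!isset($scraped[$field]) || trim((string) $scraped[$field]) === '') {
            $scraped[$field] = $fallback;
        }
    }

    return $scraped;
}
```

With this shape, a page that only yields a course title still produces a complete record, so index.php does not need extra null checks when it reads from the database.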

Ideally we would write scrape code to handle those pages as well, where possible. I feel that the iTunes pages can be scraped.