Andrew-liu / coursera_spider

A simple spider for crawling Coursera video and PDF links, plus a downloader script

get Coursera video download link via program #1

Open redstoneleo opened 9 years ago

redstoneleo commented 9 years ago

I want to extract Coursera video download links programmatically (mainly with Python) from lecture pages such as these:

https://www.coursera.org/learn/human-computer-interaction/lecture/s4rFQ/the-interaction-design-specialization

https://www.coursera.org/learn/calculus1/lecture/IYGhT/why-is-calculus-going-to-be-so-much-fun

After reading a lot of articles about this, I still cannot find a way to extract the video download link programmatically. Can anyone offer a step-by-step solution for extracting the video download link? Thanks!

P.S. I know about this project, https://github.com/coursera-dl/coursera, but its code is so complex that I gave up on it.

Andrew-liu commented 9 years ago

I think you should read my blog post on crawling Coursera. You should also know XPath or regular expressions, which you can learn about through Google.

http://andrewliu.tk/2014/12/14/Python%E7%88%AC%E8%99%AB-%E4%B8%89-Coursera%E6%8A%93%E7%AB%99%E5%B0%8F%E7%BB%93/
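
A minimal sketch of the regular-expression approach, assuming the links appear as plain `href` attributes in the page source. The sample HTML, `MEDIA_RE`, and `extract_media_links` are made up for illustration; they are not taken from the blog post or from Coursera's actual markup.

```python
import re

# Hypothetical sample page; real Coursera markup will differ.
SAMPLE_HTML = """
<html><body>
  <a href="https://example.com/lectures/01-intro.mp4">Lecture 1 video</a>
  <a href="https://example.com/slides/01-intro.pdf">Lecture 1 slides</a>
  <a href="/about">About</a>
</body></html>
"""

# Capture the value of any href that ends in .mp4 or .pdf.
MEDIA_RE = re.compile(r'href="([^"]+\.(?:mp4|pdf))"', re.IGNORECASE)


def extract_media_links(html):
    """Return every href ending in .mp4 or .pdf, in document order."""
    return MEDIA_RE.findall(html)


links = extract_media_links(SAMPLE_HTML)
for link in links:
    print(link)
```

This pattern only matches double-quoted `href` values that end in the file extension, so URLs with a query string after `.mp4` would need a looser pattern.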

Good luck.

redstoneleo commented 9 years ago

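Thanks a lot! I skimmed your article. It seems your scraping approach doesn't work for the two links I mentioned above, does it? The page source does not contain any mp4 or pdf fields.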

Andrew-liu commented 9 years ago

If you can't find the mp4 or pdf section, check whether the video is loaded by JavaScript, in which case you need a different approach.

If you can't get the download URL, use Chrome's Inspect or View Source tools, or check the relevant function in the source code.
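
One rough way to run that first check, as a sketch only: search the raw page source for any absolute mp4 or pdf URL. If none is found, the media is probably injected by JavaScript after the page loads. `MEDIA_URL_RE`, `has_static_media_links`, and both sample strings are hypothetical and do not reflect Coursera's real pages.

```python
import re

# Any absolute URL ending in .mp4 or .pdf, wherever it appears in the source.
MEDIA_URL_RE = re.compile(r'https?://[^\s"\'<>]+\.(?:mp4|pdf)', re.IGNORECASE)


def has_static_media_links(page_source):
    """Return True if the raw page source contains an mp4 or pdf URL."""
    return MEDIA_URL_RE.search(page_source) is not None


# Hypothetical page that ships its links directly in the HTML.
static_page = '<a href="https://example.com/v/lecture1.mp4">video</a>'
# Hypothetical page that builds its content in the browser.
js_page = '<div id="root"></div><script src="/static/app.js"></script>'

print(has_static_media_links(static_page))
print(has_static_media_links(js_page))
```

When the check fails, the Network tab in Chrome developer tools is the place to look for the request that actually returns the video URL.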

Good luck.

redstoneleo commented 9 years ago

With the help of this project and some analysis in Chrome developer tools, I'm starting to make headway: https://github.com/coursera-dl/coursera