asciid / stepik_scanner

Parser for stepik.org

Scanner avoids some links #2

Closed asciid closed 4 years ago

asciid commented 4 years ago

Very rarely, the tool skips some links during a scan. The problem may be in the way the HTML is parsed, so I will try BeautifulSoup.

I will post a detailed investigation of the issue here.

asciid commented 4 years ago

Stepik is built with Django, and it does not respond clearly to some 404s or to any 403s. It returns a wrapper page, but the server's HTTP status is still 200.

So I needed another way to get the page's real status.

I searched the response text for: <section class="course-promo__head">

But I had not expected a second type of course page class. There are actually two: syllabus and promo.

I decided it was a bad idea to depend on such a fragile way of parsing, so now I take the page's title instead.

With the mislabeled 404s it is: Stepik > 404

With 403s: Stepik

And normally: Course Name -- Stepik

And I no longer need a special function to extract the status. Brilliant!
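For anyone hitting the same problem, the title check described above could look roughly like this. It is only a sketch, not the scanner's actual code: the names `page_title` and `effective_status` are hypothetical, the three title patterns are the ones listed in this comment, and it assumes the `>` in the 404 title may arrive HTML-escaped as `&gt;`.

```python
import html
import re
from typing import Optional

# Hypothetical helper names; only the title patterns come from the issue.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def page_title(page_text: str) -> Optional[str]:
    """Return the unescaped, stripped <title> text, or None if there is none."""
    match = TITLE_RE.search(page_text)
    if match is None:
        return None
    return html.unescape(match.group(1)).strip()


def effective_status(page_text: str) -> Optional[int]:
    """Guess the real status of a page that was served with HTTP 200."""
    title = page_title(page_text)
    if title is None:
        return None  # no title at all: cannot classify
    if title == "Stepik > 404":
        return 404  # wrapper page for a missing course
    if title == "Stepik":
        return 403  # wrapper page for a forbidden course
    return 200  # anything else is treated as a normal course page


if __name__ == "__main__":
    sample = "<html><head><title>Stepik &gt; 404</title></head></html>"
    print(effective_status(sample))
```

A regular expression keeps the sketch dependency-free; with BeautifulSoup the title could be read from the parsed document instead.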


Issue is closed.