When a page redirects, colly follows the redirect automatically. The Request object I then receive in the OnHTML callback appears to be the original request, not the one made after the redirect. Since I want to follow every link on the HTML page, I use the Request object to build absolute URLs. After a redirect this does not work as expected, because the Request object still carries the pre-redirect URL. The example below illustrates the problem:
The example gives "http://127.0.0.1:9999/test". However, when I open "http://127.0.0.1" in Firefox and click the link, I end up at "http://127.0.0.1:9999/r/test".
Is there a better way to mimic the behavior of the browser in this case?