dmi3kno / polite

Be nice on the web
https://dmi3kno.github.io/polite/

nod calls bow with wrong arguments #31

Closed wmay closed 2 years ago

wmay commented 4 years ago

nod calls bow with the wrong arguments when the URL subdomain changes, causing it to throw an error.

Here's a short reproducible example:

library(polite)

url1 = 'https://essd.copernicus.org/articles/search.html'
url2 = 'https://seeker.copernicus.org/search.php?abstract=atmospheric+chemistry&startYear=2008&endYear=2020&paperVersion=final&journal=386&page=1'
# Throws an error: url2 is on a different subdomain than the session created by bow(url1)
bow(url1) %>% nod(url2)

dmi3kno commented 4 years ago

Thank you for your question. nod() is intended to "modify the path" within the host you have already bowed to, not to define a new one. Since the subdomain changes here, you need to bow() to the new host first:

library(polite)
url1 = 'https://seeker.copernicus.org/'
url2 = 'https://seeker.copernicus.org/search.php?abstract=atmospheric+chemistry&startYear=2008&endYear=2020&paperVersion=final&journal=386&page=1'

bow(url1) %>% nod(url2)
#> <polite session> https://seeker.copernicus.org/search.php
#>    User-agent: polite R package
#>    robots.txt: 1 rules are defined for 1 bots
#>   Crawl delay: 5 sec
#>  The path is scrapable for this user-agent

Having said that, I agree that the error message you are getting is not informative and I should improve it. Will close the issue once the error message is improved.

wmay commented 4 years ago

Fair enough. I will adjust my code accordingly.
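
One way to make that adjustment, as a rough sketch rather than anything provided by polite itself: compare the host of the current URL with the host of the next one, and only nod() when they match. The helpers `url_host()` and `bow_or_nod()` below are hypothetical names introduced for illustration. The host comparison is a simple base R string match that assumes plain `scheme://host/path` URLs without embedded credentials.

```r
# Hypothetical helper (not part of polite): extract the host of a URL.
# Assumes a plain "scheme://host/path" URL with no user:password@ part.
url_host <- function(url) {
  no_scheme <- sub("^[A-Za-z][A-Za-z0-9+.-]*://", "", url)
  # Drop everything from the first "/", "?" or "#" onward.
  tolower(sub("[/?#].*$", "", no_scheme))
}

# Hypothetical helper (not part of polite): nod() when the host is unchanged,
# otherwise bow() to the new host so its robots.txt is consulted.
# Note: the fresh bow() uses default settings, not those of `session`.
bow_or_nod <- function(session, current_url, new_url) {
  if (identical(url_host(current_url), url_host(new_url))) {
    polite::nod(session, new_url)
  } else {
    polite::bow(new_url)
  }
}
```

With the URLs from this issue, `url_host(url1)` is "essd.copernicus.org" and `url_host(url2)` is "seeker.copernicus.org", so `bow_or_nod(bow(url1), url1, url2)` would take the bow() branch instead of calling nod() across subdomains.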