Right now we just assume everything on the web is scrapable, which breaks some sites' rules (robots.txt in particular). We're a small open source team, so it's not a big deal yet. This ticket is to create a new service that BookmarkService.java can use to determine whether a URL is scrapable, on top of the checks from the user.
Requirements
[ ] needs #242
[ ] Create a new Java class WebCheckService.java (or a better name).
[ ] Scaffold the following functions (a fuller sketch follows the notes below):
public boolean isScrapable(String url) { }
public String getRobotsTxt() { }
public List<RobotAgent> parseRobots(String robotsTxt) { }
[ ] Add scaffolding for tests
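Since the ticket asks for test scaffolding too, a minimal JUnit 5 sketch might look like the following. WebCheckServiceTest, the case names, and the fixture string are all placeholders, and the first two cases still need the robots.txt fetch stubbed so they stay off the network.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.util.List;
import org.junit.jupiter.api.Test;

// Placeholder scaffold for WebCheckService tests; every name and
// fixture below is a suggestion, assuming JUnit 5.
class WebCheckServiceTest {

    private final WebCheckService service = new WebCheckService();

    @Test
    void allowsPathNotMentionedInRobotsTxt() {
        // TODO: stub getRobotsTxt so this does not hit the network.
        assertTrue(service.isScrapable("https://example.com/public/page"));
    }

    @Test
    void rejectsPathDisallowedForWildcardAgent() {
        // TODO: stub getRobotsTxt to return "User-agent: *\nDisallow: /private/".
        assertFalse(service.isScrapable("https://example.com/private/page"));
    }

    @Test
    void parseRobotsYieldsOneRobotAgentPerUserAgentLine() {
        String robotsTxt = "User-agent: *\nDisallow: /private/";
        List<RobotAgent> agents = service.parseRobots(robotsTxt);
        assertEquals(1, agents.size());
    }
}
```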
I have tried to lay out the scaffold functions as a train of thought for how the service might work; open to more ideas and artistic liberties. BookmarkService calls isScrapable(url) when a user claims a Bookmark is scrapable in the request, i.e. we confirm this -> {
getRobotsTxt();
parseRobots();
// read through the list of RobotAgent.
// ensure that the path we are scraping is public.
return true or false.
}
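For reference, here is one way that flow could hang together. This is a minimal sketch rather than a final design: it assumes a small RobotAgent holder type (which does not exist yet), fetches robots.txt with java.net.http.HttpClient, and only honors Disallow rules for the wildcard User-agent. The method signatures differ slightly from the scaffold above and are just as open to debate.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;

// Sketch only: RobotAgent pairs one "User-agent" value with the
// "Disallow" rules listed under it.
record RobotAgent(String userAgent, List<String> disallowedPaths) { }

class WebCheckService {

    private final HttpClient http = HttpClient.newHttpClient();

    public boolean isScrapable(String url) {
        try {
            URI uri = URI.create(url);
            String robotsTxt = getRobotsTxt(uri.getScheme() + "://" + uri.getHost());
            List<RobotAgent> agents = parseRobots(robotsTxt);
            String path = (uri.getPath() == null || uri.getPath().isEmpty()) ? "/" : uri.getPath();
            // Read through the list of RobotAgent and ensure the path we
            // are scraping is public, checking only the wildcard agent here.
            for (RobotAgent agent : agents) {
                if (!agent.userAgent().equals("*")) continue;
                for (String disallowed : agent.disallowedPaths()) {
                    if (!disallowed.isEmpty() && path.startsWith(disallowed)) {
                        return false;
                    }
                }
            }
            return true;
        } catch (Exception e) {
            // Conservative default: if we cannot check, do not scrape.
            return false;
        }
    }

    public String getRobotsTxt(String origin) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(origin + "/robots.txt")).build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        // Treat a missing robots.txt (e.g. 404) as "no restrictions".
        return response.statusCode() == 200 ? response.body() : "";
    }

    public List<RobotAgent> parseRobots(String robotsTxt) {
        // Naive line-based parse; real robots.txt has more edge cases
        // (grouped user-agents, Allow rules, comments, wildcards).
        List<RobotAgent> agents = new ArrayList<>();
        String currentAgent = null;
        List<String> disallowed = new ArrayList<>();
        for (String line : robotsTxt.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.toLowerCase().startsWith("user-agent:")) {
                if (currentAgent != null) {
                    agents.add(new RobotAgent(currentAgent, disallowed));
                    disallowed = new ArrayList<>();
                }
                currentAgent = trimmed.substring("user-agent:".length()).trim();
            } else if (trimmed.toLowerCase().startsWith("disallow:")) {
                disallowed.add(trimmed.substring("disallow:".length()).trim());
            }
        }
        if (currentAgent != null) {
            agents.add(new RobotAgent(currentAgent, disallowed));
        }
        return agents;
    }
}
```

One design choice worth discussing on this ticket: failures here (bad URL, network error) return false, erring on the side of not scraping; we could instead decide an unreachable robots.txt means "allow".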