asg017 / sqlite-http

A SQLite extension for making HTTP requests purely in SQL
MIT License

Robots.txt parser? #21

Closed asg017 closed 2 years ago

asg017 commented 2 years ago

Not sure if it makes sense in an HTTP library, maybe a separate extension/project...

https://en.wikipedia.org/wiki/Robots_exclusion_standard

https://github.com/google/robotstxt

https://pkg.go.dev/github.com/jimsmart/grobotstxt

select i, line, column, value from robotstxt_comments(readfile('robots.txt'));
select i, line, name from robotstxt_sitemaps(readfile('robots.txt'));
select
  user_agent, -- idk
  type,       -- normalized directive: typos fixed, lowercased
  key,        -- raw directive as written: allow, disallow, etc.
  value
from robotstxt_each(readfile('robots.txt'));

select robotstxt_agent_allowed(
  readfile('robots.txt'),
  'MyRobot',
  'http://example.net/members/index.html'
  'http://example.net/members/index.html'
);
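
To get a feel for how `robotstxt_agent_allowed` could behave before anything is written as a real extension, here is a rough prototype in Python. It is only a sketch under assumptions: it registers a Python function under the proposed name through the standard `sqlite3` module and delegates the matching to `urllib.robotparser`, which may not agree with Google's parser in every edge case. The `connect` helper and the `ROBOTS` sample are made up for illustration, and the robots.txt body is passed as a parameter instead of using `readfile()`.

```python
import sqlite3
from urllib.robotparser import RobotFileParser


def robotstxt_agent_allowed(robots_txt, user_agent, url):
    """Return 1 if user_agent may fetch url under this robots.txt body, else 0."""
    # readfile() yields a BLOB in SQLite, so accept bytes as well as text.
    if isinstance(robots_txt, bytes):
        robots_txt = robots_txt.decode("utf-8", errors="replace")
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return int(parser.can_fetch(user_agent, url))


def connect(path=":memory:"):
    """Hypothetical helper: open a database with the prototype function registered."""
    conn = sqlite3.connect(path)
    # Expose the Python function to SQL under the name proposed in this issue.
    conn.create_function("robotstxt_agent_allowed", 3, robotstxt_agent_allowed)
    return conn


# Illustrative robots.txt body, not taken from any real site.
ROBOTS = "User-agent: *\nDisallow: /members/\n"

if __name__ == "__main__":
    conn = connect()
    row = conn.execute(
        "select robotstxt_agent_allowed(?, ?, ?)",
        (ROBOTS, "MyRobot", "http://example.net/members/index.html"),
    ).fetchone()
    print(row[0])
```

Returning an integer keeps it usable in a `where` clause, the same way other boolean-ish SQLite functions work.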