benjaminestes closed 5 years ago
A solution to this problem suggests a different representation of a robots.txt file. Currently a Robots object holds a bunch of state, and (*Robots) methods test for URL availability under that state. However, the interest in this data lies in its behavior under a certain application, i.e. given a user agent and path, can the agent crawl the path?
What if we choose a procedural representation for Robots? Presumably you still wind up with an internal object that can hold the same state Robots does now. But when you call parse, the return type is func(name, rawurl string) bool, and the behavior is analogous to (*Robots) Test now.
Actually the API would be simplified:
func Locate(rawurl string) string
func From(response int, in io.Reader) func(name, rawurl string) bool
Possibly with a named type for func(name, rawurl) bool that allows for some extra documentation.
OK — how about adding a "default to allow" boolean field to Robots? The field gets set based on the status code. Then, in the absence of a matching group, the default value is used, instead of the current implicit default of "true". This only changes behavior if the response code is 5xx.
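A rough sketch of that field, for illustration only. The Robots type here is a pared-down stand-in, the names defaultAllow and newRobots are hypothetical, and the mapping (5xx means disallow by default, anything else means allow) is an assumption drawn from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// Robots is a pared-down stand-in for the library's type.
type Robots struct {
	groups       map[string][]string // lower-cased agent -> disallowed path prefixes
	defaultAllow bool                // hypothetical field: used when no group matches
}

// newRobots sets defaultAllow from the status code of the robots.txt
// response. Assumption: only a 5xx response flips the default to false.
func newRobots(status int, groups map[string][]string) *Robots {
	return &Robots{
		groups:       groups,
		defaultAllow: status < 500 || status >= 600,
	}
}

// Test reports whether the agent may crawl path. When no group matches
// the agent, it falls back to defaultAllow instead of an implicit true.
func (r *Robots) Test(name, path string) bool {
	prefixes, ok := r.groups[strings.ToLower(name)]
	if !ok {
		return r.defaultAllow
	}
	for _, p := range prefixes {
		if strings.HasPrefix(path, p) {
			return false
		}
	}
	return true
}

func main() {
	for _, status := range []int{200, 404, 503} {
		r := newRobots(status, nil)
		fmt.Println(status, r.Test("examplebot", "/page"))
	}
}
```

With no matching group, only the 503 case reports false; a matching group still decides on its own rules regardless of the default.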
Closed by 053d199d1a
The documentation currently promises that the client only needs to check that it is using the robots.txt file whose scope includes the URL of interest. However, the status code of the request for a robots.txt file also affects the implied rules.
Having the client make sure to use the correct robots.txt file is the right choice. Interpreting the status code, however, is lifting I believe this library should do. The choice is between adding another function and amending From to take an additional status code argument.
In practice, a robots.txt file is always tied to a request for that file. Therefore, it seems reasonable to amend the From function to also take a status code. When a status code is not available (such as in testing), the desired behavior can be simulated by having the client supply its own.
I don't see an upside to having From and e.g. FromStatus.