geneontology / go-fastapi

https://api.geneontology.org/

LWP requests not allowed - response code 403 - error code: 1010 #96

Closed kimrutherford closed 2 weeks ago

kimrutherford commented 3 weeks ago

Hi.

This is a bit odd. I have an easy work-around but I'm reporting it in case it confuses others.

The API returns 403 if the user-agent header contains libwww-perl.

This curl command works:

curl -i -X 'GET' \
  'https://api.geneontology.org/api/go-cam/662af8fa00000408' \
  -H 'user-agent: evil' -H 'accept: application/json'

But this doesn't:

curl -i -X 'GET' \
  'https://api.geneontology.org/api/go-cam/662af8fa00000408' \
  -H 'user-agent: libwww-perl/6.72' -H 'accept: application/json'

Output:

HTTP/2 403 
date: Mon, 17 Jun 2024 06:34:31 GMT
content-type: text/plain; charset=UTF-8
content-length: 16
x-frame-options: SAMEORIGIN
referrer-policy: same-origin
cache-control: private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
expires: Thu, 01 Jan 1970 00:00:01 GMT
server: cloudflare
cf-ray: 895103ad8c6379d0-SYD

error code: 1010
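
To compare several user-agent strings without reading the full header dump each time, the two curl commands above can be wrapped in a small helper that prints only the HTTP status code. This is a minimal sketch: the function name `status_for_agent` is made up for illustration, and it assumes a POSIX shell with curl on the PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of go-fastapi): print only the HTTP status
# code the server returns for a given User-Agent string.
#   $1: User-Agent value to send
#   $2: URL to request
status_for_agent() {
    # -s silences progress output, -o /dev/null discards the body,
    # -w '%{http_code}\n' prints just the numeric status code.
    curl -s -o /dev/null -w '%{http_code}\n' \
        -H "user-agent: $1" -H 'accept: application/json' "$2"
}
```

For example, `status_for_agent 'libwww-perl/6.72' 'https://api.geneontology.org/api/go-cam/662af8fa00000408'` sends the same request as the failing command above, and swapping in `'evil'` sends the working one.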

For completeness, this is the script that confused me:

#!/usr/bin/env perl

use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new();

my $request = HTTP::Request->new(GET => "https://api.geneontology.org/api/go-cam/662af8fa00000408");
$request->header( "accept" => "application/json" );
my $response = $ua->request($request);

print $response->status_line(), "\n", $response->content(), "\n";

Output:

403 Forbidden
error code: 1010

The work-around:

$request->header( "user-agent" => "evil" );

kltm commented 2 weeks ago

@kimrutherford Hm, good news and bad news. The good news is that this is a "feature" of Cloudflare that has a toggle switch (a la https://meta.stackexchange.com/questions/261741/cloudflare-error-1010-banned-access-based-on-your-browsers-signature). The bad news is that I'm not sure we should turn it off to support this use case--we've had trouble with bots in the past and apparently this has been a problematic user agent (from the POV of Cloudflare).

I think we have adjusted the filter to allow things generally through for api.geneontology.org, while still preserving the checks on the other geneontology.org sites. Could you give this a try and let us know if it's working for you now? (We may have to revisit if bot traffic increases, etc.)

Thank you to @sierra-moxon for checking into this.

kimrutherford commented 2 weeks ago

> The bad news is that I'm not sure we should turn it off to support this use case--we've had trouble with bots in the past and apparently this has been a problematic user agent (from the POV of Cloudflare).

I think it's OK to leave it turned on as long as it's documented that the user-agent needs to be set to something else in client code. Although, to be honest, I think bots are going to be setting the user-agent to innocent-sounding things too.

> Could you give this a try and let us know if it's working for you now?

It works now, thanks.

> Thank you to @sierra-moxon for checking into this.

Thanks!