ESIPFed / sweet

Official repository for Semantic Web for Earth and Environmental Terminology (SWEET) Ontologies

error 403 downloading SWEET using Protege or BioPortal #150

Closed graybeal closed 2 years ago

graybeal commented 5 years ago

Neither Protege nor BioPortal could open SWEET version 3.3.0 (http://sweetontology.net/sweetAll), getting 403 errors when attempting to follow the redirects.

Initial troubleshooting suggested OWLAPI was a principal component of concern, since other tools could open SWEET OK.

graybeal commented 5 years ago

Initial troubleshooting log over Slack:

lewismc 1:55 PM
Hi. I wrote to the esip-cor mailing list with my most recent query. The HTTPD server we have running on COR seems to be returning a 403 to clients. This was demonstrated within the Protege issue we discussed on the SemTech mailing list. It is not happening when I'm attempting to use SWEET IRIs within the sweet-ontology-mapper. I'm investigating COR and going to look into the HTTPD logs.

carueda 1:58 PM
I did a few tests using HTTPie and didn't see any returning 403. If you can provide a specific URL I can retry with that. I hope to have a bit more time to look into COR issues related to this next week. (edited)

graybeal 2:02 PM
(unrelated to previous thread) I'm not sure who sees the tickets for MMI, so I'll just reference one here for people to be aware of: https://github.com/mmisw/orr-ont/issues/72 (edited)

lewismc 2:03 PM
Yes, I've also tried using command line clients and all is good. It's when we try to access it through owlapi in particular. The server log for a GET using owlapi looks as follows:

131.191.26.8 - - [17/Jul/2019:21:02:13 +0000] "GET /human HTTP/1.1" 403 1182 "-" "Java/1.8.0_191"

That's of course a GET on http://sweetontology.net/human returning 403. When I pasted the above link, the slackbot which monitors this channel was even successfully served with a 200, as follows:

108.162.246.182 - - [17/Jul/2019:21:04:10 +0000] "GET /ont/api/v0/ont?iri=http://sweetontology.net/human HTTP/1.1" 200 905 "-" "Slackbot-LinkExpanding 1.0 (+https://api.slack.com/robots)"
54.172.45.74 - - [17/Jul/2019:21:04:10 +0000] "GET /human HTTP/1.1" 200 905 "-" "Slackbot-LinkExpanding 1.0 (+https://api.slack.com/robots)"
54.172.45.74 - - [17/Jul/2019:21:04:10 +0000] "GET /favicon.ico HTTP/1.1" 400 226 "-" "Slackbot 1.0 (+https://api.slack.com/robots)"

I've caught the following in the error log:

[Wed Jul 17 20:57:28.407680 2019] [mpm_prefork:notice] [pid 30147] AH00169: caught SIGTERM, shutting down
[Wed Jul 17 20:57:28.552428 2019] [suexec:notice] [pid 18385] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Wed Jul 17 20:57:28.567029 2019] [http2:warn] [pid 18386] AH10034: The mpm module (prefork.c) is not supported by mod_http2. The mpm determines how things are processed in your server. HTTP/2 has more demands in this regard and the currently selected mpm will just not do. This is an advisory warning. Your server will continue to work, but the HTTP/2 protocol will be inactive.
[Wed Jul 17 20:57:28.567044 2019] [http2:warn] [pid 18386] AH02951: mod_ssl does not seem to be enabled
[Wed Jul 17 20:57:28.567526 2019] [lbmethod_heartbeat:notice] [pid 18386] AH02282: No slotmem from mod_heartmonitor
[Wed Jul 17 20:57:28.569751 2019] [mpm_prefork:notice] [pid 18386] AH00163: Apache/2.4.39 (Amazon) configured -- resuming normal operations
[Wed Jul 17 20:57:28.569765 2019] [core:notice] [pid 18386] AH00094: Command line: '/usr/sbin/httpd'

carueda 2:15 PM
Hmm, I see... strange. So perhaps owlapi is setting some headers (in particular Authorization) along with the request, thus making the COR backend think it needs to authorize the request and hence failing with 403 if the auth header is invalid. Just a quick hypothesis for now. (edited)

lewismc 2:16 PM
Yes, I wonder if we are also interpreting things correctly within the HTTPD configuration. For example, which config is being read currently?

./conf.d/sweetontology.conf
./conf.d/sweetontology.conf__attempting_redirect_based_on_html_accept_header
./conf/httpd.conf

carueda 2:50 PM
Only the ones ending with .conf

carueda 3:00 PM
httpd.conf is the more general config and sweetontology.conf is for the sweetontology domain(s).

lewismc 3:24 PM
Yes. I'm trying to fix the AML dependency which is used in sweet-alignment-manager. Once I've done that I'll write back here.

graybeal commented 5 years ago

We at BioPortal are reporting this to the OWL API developer.

cmungall commented 5 years ago

Do you have a link to the issue? See previous comments on this issue. You have gotten this working with rdf tools but that doesn't demonstrate there is no issue with the owl.

cmungall commented 5 years ago

This is kind of odd, is the sweet server configured in an odd way?

The Cookie line here is suspicious:


$ wget http://sweetontology.net/human
--2019-07-23 08:19:20--  http://sweetontology.net/human
Resolving sweetontology.net... 34.216.150.176
Connecting to sweetontology.net|34.216.150.176|:80... connected.
HTTP request sent, awaiting response... 200 OK
Cookie coming from sweetontology.net attempted to set domain to esipfed.org
Length: unspecified [text/turtle]
Saving to: `human'

    [ <=>                 ] 5,354       --.-K/s   in 0s

2019-07-23 08:19:20 (56.7 MB/s) - `human' saved [5354]

carueda commented 5 years ago

@cmungall Good observation about the domain set by cookie. Just for reference, the full set of headers as shown by httpie:

$ http -h get http://sweetontology.net/human
HTTP/1.1 200
Access-Control-Allow-Origin: *
CF-RAY: 4faec40d09b1bb22-SEA
Connection: Keep-Alive
Content-Encoding: gzip
Content-Type: text/turtle;charset=UTF-8
Date: Tue, 23 Jul 2019 15:42:42 GMT
Keep-Alive: timeout=5, max=100
Server: cloudflare
Set-Cookie: __cfduid=de565a112b9a3e60fe6c13b183b64d6e61563896562; expires=Wed, 22-Jul-20 15:42:42 GMT; path=/; domain=.esipfed.org; HttpOnly
Transfer-Encoding: chunked
X-Content-Type-Options: nosniff
X-Frame-Options: sameorigin
X-XSS-Protection: 1; mode=block

I'll see how to adjust the apache config so it sets no cookie at all or it sets .sweetontology.net as the domain.

carueda commented 5 years ago

@cmungall I just adjusted the apache setting so that it no longer sets any cookies. The same request/response (headers only) looks like this now (no Set-Cookie):

$ http -h get http://sweetontology.net/human
HTTP/1.1 200
Access-Control-Allow-Origin: *
CF-RAY: 4faecfa3fd6a29e1-SEA
Connection: Keep-Alive
Content-Encoding: gzip
Content-Type: text/turtle;charset=UTF-8
Date: Tue, 23 Jul 2019 15:50:37 GMT
Keep-Alive: timeout=5, max=100
Server: cloudflare
Transfer-Encoding: chunked
X-Content-Type-Options: nosniff
X-Frame-Options: sameorigin
X-XSS-Protection: 1; mode=block

carueda commented 5 years ago

I'm looking at the 403 error reported by Protégé (5.2.0).

(Note: I previously mentioned "proxy-pass" as the basic mechanism for sweetontology.net/* resolution, but this is actually based on RewriteRule; see this wiki.)

Looking at /var/log/httpd/access_log on the server while the load is attempted in Protégé there's certainly a 403 reported from the server:

134.89.112.195 - - [23/Jul/2019:16:09:01 +0000] "GET /sweetAll HTTP/1.1" 403 1180 "-" "Java/1.8.0_40"

(It would be useful to see all the headers set by Protégé in this request, but the apache logging on the server is not set up for that, afaict.)

However, while loading via httpie (e.g., http get http://sweetontology.net/sweetAll Accept:text/turtle):

162.158.106.51 - - [23/Jul/2019:16:10:10 +0000] "GET /ont/api/v0/ont?iri=http://sweetontology.net/sweetAll HTTP/1.1" 200 2489 "-" "HTTPie/0.9.9"
134.89.112.195 - - [23/Jul/2019:16:10:10 +0000] "GET /sweetAll HTTP/1.1" 200 2489 "-" "HTTPie/0.9.9"

Somehow the request from Protégé is not triggering the RewriteRule on the server!

carueda commented 5 years ago

TL;DR: Just diagnosed the problem: Cloudflare(*) is the piece that is responding with 403 when the request is made with the header User-Agent:Java/1.8.0_40 (set by Protégé).

So, some quick suggestions:

(*) Just learned that such a system was in place for the COR server.


Details

To see the request headers, which is key to continue this investigation, I just enabled mod_log_forensic on the server.

This is what's logged in /var/log/httpd/forensic_log when making the request from Protégé:

+17448:5d374f89:1|GET /sweetAll HTTP/1.1|Accept:application/rdf+xml, application/xml; q=0.5, text/xml; q=0.3, */*; q=0.2|Accept-Encoding:xz,gzip,deflate|User-Agent:Java/1.8.0_40|Host:sweetontology.net|Connection:keep-alive
-17448:5d374f89:1

The only "interesting" header is User-Agent:Java/1.8.0_40 and, indeed, it triggers the 403:

$ http http://sweetontology.net/sweetAll 'User-Agent:Java/1.8.0_40'
HTTP/1.1 403 Forbidden
CF-RAY: 4fafbb209eec2a6b-SEA
Cache-Control: max-age=10
Connection: Keep-Alive
Content-Encoding: gzip
Content-Type: text/html; charset=UTF-8
Date: Tue, 23 Jul 2019 18:31:22 GMT
Expires: Tue, 23 Jul 2019 18:31:32 GMT
Keep-Alive: timeout=5, max=100
Server: cloudflare
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-Content-Type-Options: nosniff
X-Frame-Options: sameorigin
X-XSS-Protection: 1; mode=block

<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
<!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
<!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
<head>
<title>Access denied | cor.esipfed.org used Cloudflare to restrict access</title>
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />
<meta name="robots" content="noindex, nofollow" />
<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1" />
<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" type="text/css" media="screen,projection" />
<!--[if lt IE 9]><link rel="stylesheet" id='cf_styles-ie-css' href="/cdn-cgi/styles/cf.errors.ie.css" type="text/css" media="screen,projection" /><![endif]-->
<style type="text/css">body{margin:0;padding:0}</style>

<!--[if gte IE 10]><!--><script type="text/javascript" src="/cdn-cgi/scripts/zepto.min.js"></script><!--<![endif]-->
<!--[if gte IE 10]><!--><script type="text/javascript" src="/cdn-cgi/scripts/cf.common.js"></script><!--<![endif]-->

</head>
<body>
  <div id="cf-wrapper">
    <div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-translate="enable_cookies">Please enable cookies.</div>
    <div id="cf-error-details" class="cf-error-details-wrapper">
      <div class="cf-wrapper cf-header cf-error-overview">
        <h1>
          <span class="cf-error-type" data-translate="error">Error</span>
          <span class="cf-error-code">1010</span>
          <small class="heading-ray-id">Ray ID: 4fafbb209eec2a6b &bull; 2019-07-23 18:31:22 UTC</small>
        </h1>
        <h2 class="cf-subheadline">Access denied</h2>
      </div><!-- /.header -->

      <section></section><!-- spacer -->

      <div class="cf-section cf-wrapper">
        <div class="cf-columns two">
          <div class="cf-column">
            <h2 data-translate="what_happened">What happened?</h2>
            <p>The owner of this website (cor.esipfed.org) has banned your access based on your browser's signature (4fafbb209eec2a6b-ua21).</p>
          </div>

        </div>
      </div><!-- /.section -->

      <div class="cf-error-footer cf-wrapper">
  <p>
    <span class="cf-footer-item">Cloudflare Ray ID: <strong>4fafbb209eec2a6b</strong></span>
    <span class="cf-footer-separator">&bull;</span>
    <span class="cf-footer-item"><span>Your IP</span>: 34.216.150.176</span>
    <span class="cf-footer-separator">&bull;</span>
    <span class="cf-footer-item"><span>Performance &amp; security by</span> <a href="https://www.cloudflare.com/5xx-error-landing?utm_source=error_footer" id="brand_link" target="_blank">Cloudflare</a></span>

  </p>
</div><!-- /.error-footer -->

    </div><!-- /#cf-error-details -->
  </div><!-- /#cf-wrapper -->

  <script type="text/javascript">
  window._cf_translation = {};

</script>

</body>
</html>
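
The httpie check above can also be reproduced from Java itself. What follows is only a minimal sketch, not code from this thread: the class name `UserAgentProbe` and the method `statusFor` are hypothetical, and it uses nothing beyond the JDK's `HttpURLConnection`. It sends a GET with an explicitly chosen User-Agent and returns the HTTP status, so the same URL can be tried with `Java/1.8.0_40` and with another agent string.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Sketch (hypothetical helper): request a URL with a chosen User-Agent. */
final class UserAgentProbe {

    /**
     * Sends a GET to the given URL with the given User-Agent header and
     * returns the HTTP status code (for example 200 or 403).
     */
    static int statusFor(String url, String userAgent) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(10_000);
            conn.setReadTimeout(10_000);
            // An explicitly set header replaces the JVM default "Java/<version>".
            conn.setRequestProperty("User-Agent", userAgent);
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String url = args.length > 0 ? args[0] : "http://sweetontology.net/sweetAll";
        // The two agent strings are the ones that appear in the server logs above.
        for (String agent : new String[] {"Java/1.8.0_40", "HTTPie/0.9.9"}) {
            System.out.println(agent + " -> " + statusFor(url, agent));
        }
    }
}
```

The result depends entirely on how the server and any proxy in front of it are configured at the time of the request, so no particular output is implied here.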

graybeal commented 5 years ago

I will pursue this with Matthew (Protege) and Jennifer (BioPortal). We can perhaps test it easily enough on the BioPortal side, will check.

jvendetti commented 5 years ago

In BioPortal we use OWL API code that looks roughly like the following to load ontologies:

File file = new File("sweetAll.ttl");
FileDocumentSource fileDocumentSource = new FileDocumentSource(file);
OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
OWLOntology ontology = manager.loadOntologyFromOntologyDocument(fileDocumentSource);

I could be wrong, but I don't believe the OWL API has any public methods that would allow us to specify headers for the loading of imports. I'll investigate this a little more when I have time.

Per the suggestion from @carueda about a more recent version of Java - I executed the above snippet of code in my local development environment against Java 8 (what we currently use in BioPortal) and Java 12.

Running against Java 8, I get the 403 responses:

org.semanticweb.owlapi.model.UnloadableImportException: Could not load imported ontology: <http://sweetontology.net/human> Cause: Server returned HTTP response code: 403 for URL: http://sweetontology.net/human

... however, running against Java 12 eliminates the occurrence of the 403s.

graybeal commented 5 years ago

Aha! Very telling. So, likely a default setting of Java 8, in how it tries to open content off the web. There must be a default configuration in Java that can be modified to change the header settings.
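
For what it's worth, the JDK does expose one such default: the `http.agent` system property, which the built-in `HttpURLConnection` reads when it is first initialized. The sketch below is an illustration under stated assumptions, not a confirmed fix: the class `DefaultUserAgent` and the agent string `sweet-client/0.1` are made up for the example, and the JDK appends its own ` Java/<version>` suffix to whatever value is set, so a filter that matches on that suffix might still reject the request.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Sketch (hypothetical helper): change the JVM-wide default User-Agent. */
final class DefaultUserAgent {

    /**
     * Sets the default agent for java.net HTTP connections. This must run
     * before the first HTTP connection is opened, because the JDK reads the
     * property only once. It can also be passed as -Dhttp.agent=... at startup.
     */
    static void configure(String agent) {
        System.setProperty("http.agent", agent);
    }

    /** Performs a plain GET without touching any headers, as a library would. */
    static int plainGet(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        configure("sweet-client/0.1"); // hypothetical agent string
        System.out.println(plainGet("http://sweetontology.net/sweetAll"));
    }
}
```

Because the property is global, it would also apply to connections opened by libraries that offer no way to set headers, which is why it is shown here instead of a per-request header.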

cmungall commented 5 years ago

I guess you could have the java client lie about its version, but it seems the problem is on the server configuration side. Even if you have a bioportal-specific hack, you want this to work for everyone.

Why does cloudflare decide to reject this? Seems totally arbitrary. IMO server behavior should be more transparent and predictable.

Someone else has complained about this but they got no response: https://community.cloudflare.com/t/cloudflare-blocks-java-http-client/73621

thanks for the excellent sleuthing Carlos!

cmungall commented 5 years ago

You may also be interested in the obo purl system, it could be easily adapted for sweet:

https://github.com/OBOFoundry/purl.obolibrary.org/

It can run on a tiny amazon server costing virtually nothing, and you could just have it redirect to raw github URLs (or S3 or anything else)

carueda commented 5 years ago

You may also be interested in the obo purl system, it could be easily adapted for sweet

Interesting suggestion @cmungall (do you want to capture it in a separate issue?). I'm not sure about its concrete capabilities, but I'm wondering whether it can handle different ontology representations automatically, or whether they would have to be pre-generated and then resolved via apache/nginx or similar rules...

Why does cloudflare decide to reject this? Seems totally arbitrary. IMO

Agree, totally arbitrary.

carueda commented 5 years ago

@graybeal @cmungall @lewismc So, shall we close this as it's not a COR issue?

graybeal commented 5 years ago

I entered a ticket in the Protege repository. (Somehow I don't think I'll get far with a ticket in the cloudflare repository, but maybe @abburgess has some sway.)

BioPortal has identified a fix to this issue (upgrading to Java 11, see https://github.com/ncbo/bioportal-project/issues/127), but has to finish upgrading SOLR and various other bits before the fix will be in production.

I prefer keeping tickets open until the problem is resolved. Even if it isn't the fault of the COR, the user sees it in COR. I'll defer to the judgment of the community here….

cmungall commented 5 years ago

This is a server issue. Not a protege one, not an owlapi one. The server should not block these calls.

graybeal commented 5 years ago

I agree, but from an end user perspective, the fact that the Cloudflare server is at fault is of no relevance, as it may never get fixed. The issue remains whether clients that use OWL API should try to find a workaround.

cmungall commented 5 years ago

I don't understand. It is of massive user relevance as many users will be using the owlapi.

Are you compelled to use cloudflare?

graybeal commented 5 years ago

The entire hosting organization (ESIP) has just changed to CloudFlare, it was a significant transition. So I'll presume to say "yes".

I should say there is still an 'open topic' with CloudFlare about the issue, but my impression is that they are not showing a lot of initiative to address it. Like you I would expect the impact to go far, even well beyond OWL API.

esip-lab commented 5 years ago

Re: cloudflare. We're in a bit of a holding pattern, as the person who manages this for us has been out sick. I reached out earlier today to check in and will give you an update soon.

cmungall commented 5 years ago

sweetontology.net is a separate domain, so it shouldn't be hard to make an exception? E.g. you could put the ontology files in an S3 bucket and have the URL redirect to this.

lewismc commented 5 years ago

you could put the ontology files in an S3 bucket and have the url redirect to this.

We could, yes. This would, however, break our watchdog script, so the SWEET source and the linked data representation would become out of sync. I don't have the cycles to hack together code to address this.

The Cloudflare move has screwed us and I would rather invest any time I have fixing that issue tbh.

graybeal commented 5 years ago

From earlier in this thread, @cmungall said:

I don't understand. It is of massive user relevance as many users will be using the owlapi.

I misspoke; the problem is of significant relevance. I was trying to say that users won't care whose issue it is: for them it's a user-facing issue. So we have to consider fixing it on our end whether or not cloudflare is the problem.

I think at this point we're waiting for @abburgess's update from her conversation with cloudflare. But I consider you to be the biggest driver of action on this ticket, as it affects "your" ontology. You are one of the most active developers and probably the most affected users.

esip-lab commented 5 years ago

Can you jump in on this David?

Annie Burgess, PhD

Lab Director | Earth Science Information Partners (ESIP)

dbassendine commented 5 years ago

Hi everyone,

Apologies for the delay here, I am just getting back up to speed after a long recovery period.

I haven't looked through the issue in detail, but if it is caused by cloudflare filtering, a first quick fix to try would be to disable the cloudflare proxying for the cor.esipfed.org subdomain. So cloudflare would provide the DNS registration only, and requests to cor.esipfed.org should not run through their servers.

I have now done this, and it should take effect immediately - could you retest and see if this resolves the issue?

Thanks, David
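
One way to check whether replies still pass through Cloudflare after a change like this is to look for the markers visible in the responses quoted earlier in the thread, namely `Server: cloudflare` and a `CF-RAY` header. The following is a rough sketch under that assumption; the class `CloudflareCheck` and the method `looksProxied` are hypothetical names, and the absence of both headers is only a hint, not proof, that the proxy is off.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Sketch (hypothetical helper): spot Cloudflare markers in response headers. */
final class CloudflareCheck {

    /**
     * Returns true if the headers contain a CF-RAY header or a Server header
     * whose value mentions "cloudflare". Header names are compared without
     * regard to case.
     */
    static boolean looksProxied(Map<String, List<String>> headers) {
        for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
            String name = entry.getKey();
            if (name == null) {
                continue; // HttpURLConnection keeps the status line under a null key
            }
            if (name.equalsIgnoreCase("CF-RAY")) {
                return true;
            }
            if (name.equalsIgnoreCase("Server")) {
                for (String value : entry.getValue()) {
                    if (value != null
                            && value.toLowerCase(Locale.ROOT).contains("cloudflare")) {
                        return true;
                    }
                }
            }
        }
        return false;
    }
}
```

The map shape matches what `HttpURLConnection.getHeaderFields()` returns, so a caller could pass that result in directly after making a request.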

carueda commented 5 years ago

Thanks @dbassendine, using Protégé (5.2.0), http://sweetontology.net/sweetAll now loads fine:

image

cmungall commented 5 years ago

Awesome, thanks so much!

lewismc commented 5 years ago

This also fixed the sweet-alignment-manager

dbassendine commented 5 years ago

Great! Sorry for the disruption caused by the transition over to Cloudflare.

cmungall commented 4 years ago

The problem has resurfaced

image

also with robot:

$ robot  merge -I http://sweetontology.net/sweetAll -o download/sweet.ttl
org.semanticweb.owlapi.io.OWLOntologyCreationIOException: Server returned HTTP response code: 403 for URL: http://sweetontology.net/sweetAll
Use the -vvv option to show the stack trace.
Use the --help option to see usage information.

I can confirm this is the same weird discrimination against Java/1.8.0_40

$ curl -vvv -H "User-Agent:Java/1.8.0_40"   http://sweetontology.net/sweetAll 
*   Trying 104.27.158.188...
* TCP_NODELAY set
* Connected to sweetontology.net (104.27.158.188) port 80 (#0)
> GET /sweetAll HTTP/1.1
> Host: sweetontology.net
> Accept: */*
> User-Agent:Java/1.8.0_40
> 
< HTTP/1.1 403 Forbidden
< Date: Mon, 05 Oct 2020 02:53:02 GMT
< Content-Type: text/plain; charset=UTF-8
< Content-Length: 16
< Connection: keep-alive
< Set-Cookie: __cfduid=d96784c3fc857ee41e8f517eee57ebff71601866382; expires=Wed, 04-Nov-20 02:53:02 GMT; path=/; domain=.sweetontology.net; HttpOnly; SameSite=Lax
< X-Frame-Options: SAMEORIGIN
< Cache-Control: private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
< Expires: Thu, 01 Jan 1970 00:00:01 GMT
< cf-request-id: 05984655c3000028273b9f8200000001
< Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report?lkg-colo=4&lkg-time=1601866383"}],"group":"cf-nel","max_age":604800}
< NEL: {"report_to":"cf-nel","max_age":604800}
< Server: cloudflare
< CF-RAY: 5dd3d99c6f4f2827-SJC
< 
* Connection #0 to host sweetontology.net left intact

carueda commented 4 years ago

I understand the hosting for sweetontology was changed about a month ago, maybe @lewismc can elaborate.

lewismc commented 3 years ago

@cmungall I can reproduce. This is annoying! @carueda Yes this is a variable which changed recently. I'm going to debug with Annie Burgess.

lewismc commented 3 years ago

I did a bit of experimenting. User-Agent header values above Java/1.9 all work. Java/1.9 and below do not... The reasoning for this is that Cloudflare strips any header that contains dots (.) from origin web server responses. For example, the origin web server header test.header:data is removed by Cloudflare’s proxy. This, however, still doesn't explain why one request is working and another request is not. I think we should open a customer helpdesk ticket with Cloudflare to figure this out. I'm going to work with Annie to do that.

lewismc commented 3 years ago

@dbassendine can you please repeat the following steps:

I haven't looked through the issue in detail, but if it is caused by cloudflare filtering, a first quick fix to try would be to disable the cloudflare proxying for the cor.esipfed.org subdomain. So cloudflare would provide the DNS registration only, and requests to cor.esipfed.org should not run through their servers.

Thank you

angelozerr commented 3 years ago

Just for your information, I encountered the same problem in another project. It seems to be a user agent problem. See my experiment results at https://github.com/redhat-developer/vscode-xml/issues/429#issuecomment-784875083

rrovetto commented 3 years ago

Per the recent Slack thread (@graybeal), attached is a WebVOWL error from the GitHub ESIP SWEET link (screenshot: WebVOWL_ErrorMessage).

When testing it manually, inserting the URL on the WebVOWL site, the message reads "Error: Received empty graph" (screenshot: WebVOWL_ErrorMessage_2).

@dbassendine - After your first fix, do you see how it can be fixed again?

brandonnodnarb commented 3 years ago

Confirmed this is still an issue.
@dbassendine per your previous response (on 4th Oct 2019);

I haven't looked through the issue in detail, but if it is caused by cloudflare filtering, a first quick fix to try would be to disable the cloudflare proxying for the cor.esipfed.org subdomain. So cloudflare would provide the DNS registration only, and requests to cor.esipfed.org should not run through their servers.

Could you please either verify the cloudflare proxying is disabled or verify that you (or someone) has disabled it again. Thank you.

esip-lab commented 2 years ago

Hi all - proxying is already disabled. Screen Shot 2021-10-19 at 10 23 14 AM

carueda commented 2 years ago

@esip-lab Annie: could you recheck on this given that access from protege remains unsuccessful and the fact that the amazon instance is now different?

+cc: @rrovetto , @graybeal , @brandonnodnarb .

esip-lab commented 2 years ago

I'm not sure what you want me to check here.

graybeal commented 2 years ago

Annie, we're convinced that although we've checked with CloudFlare before and they denied it, they are causing the issue with resolution (because Protege w/Java 8 doesn't resolve the sub-ontologies correctly while BioPortal w/Java 11 does, for one example—that exactly matches the previous problem that CloudFlare fixed for a year). Brandon's comment cites an easy test to confirm whether CloudFlare is the blocker.

esip-lab commented 2 years ago

Okay, I have turned OFF the proxy through Cloudflare based on Brandon's comment.

graybeal commented 2 years ago

Thanks! To my surprise, Protege still has a problem opening https://sweetontology.net/sweetAll. Oh, but opening http://sweetontology.net/sweetAll now works, pulling in all ontologies.

And, opening https://raw.githubusercontent.com/ESIPFed/sweet/master/src/sweetAll.ttl in Protege successfully pulls in all the other ontologies. (I'm not absolutely sure whether we tried this before, so we may have to try turning the proxy back on to see if it still works.)

This is the error message for https:

jgraybeal@BMIR-TPM ~ % curl -vvv -H "User-Agent:Java/1.8.0_40" https://sweetontology.net/sweetAll
*   Trying 34.208.49.64...
* TCP_NODELAY set
* Connection failed
* connect to 34.208.49.64 port 443 failed: Operation timed out

So before we turn proxying back on and retest, does anyone want to explore or opine about the SSL not resolving? I think we should be handling https requests also.
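
The curl timeout above suggests that nothing is accepting connections on port 443, independent of any User-Agent handling. A quick way to compare the two ports is a plain TCP connect with a timeout. This is just a sketch; `PortCheck` and `canConnect` are hypothetical names and only JDK socket classes are used.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Sketch (hypothetical helper): test whether a TCP port accepts connections. */
final class PortCheck {

    /**
     * Tries to open a TCP connection to host:port and gives up after
     * timeoutMillis. Returns false on refusal, timeout, or any other
     * I/O failure (including an unresolvable host name).
     */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "sweetontology.net";
        // 80 is plain HTTP, 443 is HTTPS.
        for (int port : new int[] {80, 443}) {
            System.out.println(host + ":" + port + " reachable: "
                    + canConnect(host, port, 5_000));
        }
    }
}
```

A successful connect on 443 would still not mean TLS is configured correctly; it only separates "port closed or filtered" from problems further up the stack.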

carueda commented 2 years ago

AFAICT, HTTPS access has never been set up https://github.com/ESIPFed/cor/issues/37

(note that sweetontology.net and cor.esipfed.org share the same host server.)

rrovetto commented 2 years ago

Following John, I tested the raw git link in the past and I don't recall it opening. Now testing both links again on my end...

I recommend we also list links to open all stored versions of SWEET. The releases page has zip files, but direct links would be more beneficial to the public, as well as more user-friendly.

carueda commented 2 years ago

Just don't try with https as it hasn't been set up (see my previous comment).

But http://sweetontology.net/ resolves just fine from my side, that is:

make sense?

brandonnodnarb commented 2 years ago

Thanks for all your efforts on this issue @carueda, @graybeal, @rrovetto, and @esip-lab. I have tested http://sweetontology.net/sweetAll using the Open from URI... functionality in Protégé 5.5.0 and it appears to resolve properly AND load all imported ontologies. Using http://sweetontology.net/sweetAll.ttl works in the same manner.

I can also confirm that a browser renders http://sweetontology.net/ as https://github.com/ESIPFed/sweet/blob/master/README.md which matches the current set up/configuration.

As such, I will close this issue.

Please lodge any other issues mentioned in the thread separately.