geneontology / biolink-api

API for linked biological knowledge
http://api.geneontology.org/api/

Unblock queries from api.geneontology.org to golr-aux.geneontology.io #11

Open lpalbou opened 3 years ago

lpalbou commented 3 years ago

@kltm it seems the golr-aux server has started blocking requests from api.geneontology.org (3.209.185.147). I double-checked by launching another instance of the GO API on a different server with a different IP, and it works fine. This is affecting all API users and Ribbons, including the Alliance.

For testing purposes, I am leaving the other server online (it's a spot instance to keep costs down, so it can be terminated by AWS at any time); here is an example query that works: http://3.236.124.247/api/bioentity/function/GO%3A0044598?start=0&rows=100 The same query won't run on http://api.geneontology.org/api/bioentity/function/GO%3A0044598?start=0&rows=100
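
If it helps to script the comparison, here is a minimal sketch using python-requests; the hosts and parameters are exactly the ones in the URLs above, and the output handling is only illustrative:

```python
# Minimal comparison of the two deployments; only the host differs.
import requests

HOSTS = [
    "http://3.236.124.247",         # temporary test instance (works)
    "http://api.geneontology.org",  # production instance (currently blocked)
]

for host in HOSTS:
    url = f"{host}/api/bioentity/function/GO%3A0044598"
    try:
        resp = requests.get(url, params={"start": 0, "rows": 100}, timeout=30)
        print(host, resp.status_code)
    except requests.RequestException as exc:
        print(host, "request failed:", exc)
```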

zfinannee commented 3 years ago

The GO Ribbons are all down on the alliancegenome.org site, so this is extremely high priority for us. Any ETA on when it will be fixed would be very appreciated. Thanks so much!

cmungall commented 3 years ago

Apologies for the interruption in service, will be up again shortly

zfinannee commented 3 years ago

Thank you!! ;)

kltm commented 3 years ago

We're now warming up the fallback server to take over while we try and figure out what is going on.

kltm commented 3 years ago

Noting: golr-aux.geneontology.io should now be switched over to a "backup" server in AWS (54.156.227.50). The above URL now seems to be working.

We are still trying to figure out the underlying cause of getting cut off from the main server as no changes have been made to the system in some time.

zfinannee commented 3 years ago

Thank you, Seth. The GO Ribbons on alliancegenome.org pages I looked at are working now.

kltm commented 3 years ago

It's all very odd from this end. I've contacted LBL IT/security to see if they might have done something; there seems to be no other expression of this issue except for the one IP address.

lpalbou commented 3 years ago

I can easily change the IP of the GO API since it's an EIP, and you could then update the DNS record. However, chances are the same issue would recur, so it's best to fix it at the source. If it's not GOlr-aux firewall policies, then agreed, it must be at LBL IT. As a note, I believe the GO API is getting more traction and therefore more usage; maybe that's what triggered some kind of rule, or LBL just added a new one without us knowing.

lpalbou commented 3 years ago

Additional note: https://github.com/geneontology/biolink-api/issues/7 would help mitigate those issues, but it may end up as a won't-do since I believe the GO API is due for a refactoring.

kltm commented 3 years ago

@lpalbou I've heard back from LBL security with a little more information:

I searched data from Jan 6 2021, 00:00:11 to date. 3.209.185.147 has made about 2,460,289 HTTP GET requests.

Mar 8 17:09:27 is the only time it was flagged for SQL injection.

I'd have easily called this a false positive, but you know the request has this:

-> "and 1=1&rows=1000" 

http://131.243.192.30/solr/select?q=*:*&qf=&fq=document_category:\"ontology_class\"&fl=annotation_class,annotation_class_label,description,source&wt=json&indent=on&fq=subset:goslim_agr and 1=1&rows=1000

1=1 is likely what triggered the SQL injection flag!

While I've unblocked 3.209.185.147, we still might want to check what/why we saw a 1=1, or we can call this a one-off and let's see if it happens again! I'd leave it to your cycles.

I think it would be useful to understand where the 1=1 comes from, to confirm with security that this is fine and not an injection attempt (naturally not against an SQL server, but it's good to understand what they're tuned for in case something like this comes up in the future). The filter was lifted a little while ago, but I'll wait until we have finished with LBL security to our satisfaction before switching back.
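
To make the question concrete, here is a rough sketch of how a filter built by dropping a caller-supplied subset value straight into fq would carry such a probe through to GOlr unchanged. This is not the actual biolink-api code; the function, parameter names, and host are assumptions for illustration:

```python
# Hypothetical sketch -- NOT the actual biolink-api code -- of an unsanitized
# pass-through: the caller-supplied subset value lands in the Solr fq verbatim.
import requests

GOLR_SELECT = "http://golr-aux.geneontology.io/solr/select"  # GOlr endpoint from this thread

def slim_terms(subset):
    params = {
        "q": "*:*",
        "qf": "",
        "fq": ['document_category:"ontology_class"', f"subset:{subset}"],
        "fl": "annotation_class,annotation_class_label,description,source",
        "wt": "json",
        "indent": "on",
        "rows": 1000,
    }
    return requests.get(GOLR_SELECT, params=params, timeout=30)

# A value like this reproduces the flagged request almost exactly:
resp = slim_terms("goslim_agr and 1=1")
print(resp.url)  # the "and 1=1" rides through inside the fq parameter
```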

(As a side note, the stated volume works out to around 1.5k-2k hits per hour from the API. Is this due to the Alliance site getting crawled, maybe? Or perhaps the popularity that was mentioned!)

kltm commented 3 years ago

Yeah, the more I think about this, the more it seems like there was an SQL injection attempt that got passed through the API unchanged (https://security.stackexchange.com/questions/8761/sql-injection-with-and-1-1). While not directly dangerous for the moment, it might mean that inputs are not being sanitized? Or, since it's just a passthrough to Solr, maybe it's fine to ignore for now.
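
If we do decide to sanitize, one possible shape is to whitelist the subset value before it reaches the Solr query. This is purely a sketch under that assumption, not existing biolink-api code:

```python
# Hypothetical whitelist for a caller-supplied subset value -- a sketch,
# not the existing biolink-api code.
import re

SUBSET_PATTERN = re.compile(r"^[A-Za-z0-9_\-]+$")  # goslim_agr, goslim_generic, ...

def validate_subset(subset: str) -> str:
    """Reject anything that is not a plain subset identifier."""
    if not SUBSET_PATTERN.fullmatch(subset):
        raise ValueError(f"invalid subset value: {subset!r}")
    return subset

for value in ("goslim_agr", "goslim_agr and 1=1"):
    try:
        print("accepted:", validate_subset(value))
    except ValueError as exc:
        print("rejected:", exc)
```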

kltm commented 3 years ago

Looking at the logs on the fallback machine, I've got two more hits with 1=1, within a second of each other:

3.209.185.147 - - [09/Mar/2021:20:52:51 +0000] "GET /solr/select?q=*:*&qf=&fq=document_category:%22ontology_class%22&fl=annotation_class,annotation_class_label,description,source&wt=json&indent=on&fq=subset:goslim_agr%20AND%201=1&rows=1000 HTTP/1.1" 400 598 "-" "python-requests/2.22.0"

If nothing else, that would seem to settle that this was not a one-off, but somebody doing a little probing.
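
For anyone else watching logs, this is roughly the kind of scan I mean; the log path is a placeholder and the format assumes combined-log lines like the one quoted above:

```python
# Count "1=1" probes per client IP in an access log; the path is a placeholder
# and the format assumes combined-log lines like the one above.
from collections import Counter

LOG_PATH = "/var/log/apache2/access.log"  # placeholder
hits = Counter()

with open(LOG_PATH) as log:
    for line in log:
        if "1=1" in line:
            hits[line.split()[0]] += 1  # first field is the client IP

for ip, count in hits.most_common():
    print(f"{count:5d}  {ip}")
```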

lpalbou commented 3 years ago

Couple of notes:

  1. I am turning off the test server that was here to illustrate the issue

  2. it's possible that someone is attempting SQL injection, but: (a) it seems odd that the person kept the same limiting query (fq, fl), unless it's a hacking bot; (b) as you mentioned, it's not SQL but Solr; (c) if I remember correctly, GOLr is a read-only instance; and (d) there is no private information in this db, so the risk seems minimal. If the API had an auth system and could perform write operations, I would be more concerned, but that's not the case. Furthermore, whatever query the GO API sends to GOLr can be sent directly to GOLr too, so the vulnerability (if any) is not really at the level of the API

  3. the API has an auto-relaunch mechanism from a fresh AMI; it would be easy to do the same with GOLr. If we do, the system will always recover by automatically relaunching a fresh db, whatever attack someone wants to try. AWS also protects against basic DDoS attacks

  4. unfortunately, I don't keep the logs (due to the auto-relaunch from a fresh AMI). I could flush them to an S3 bucket on the USC account, but we discussed a year+ ago that sharing an EC2 role would allow the server to write logs directly into a shared, dedicated GO log bucket.

  5. The API has gained some traction; however, about 600 requests/hour (out of your 1.5k-2k/hour) are health checks from different geographic areas to test that the API is working. That's also why we have had no downtime of the API: if it fails 3 health checks (1 min 30 s), it automatically relaunches from a fresh AMI (a rough sketch of such a probe follows this list)
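
As referenced in point 5, here is a rough sketch of what one of those health probes amounts to conceptually; it is illustrative only, not the actual monitoring configuration, and the endpoint and thresholds simply mirror the numbers above:

```python
# Illustrative health probe, not the actual GO monitoring setup: hit the API
# every 30 s and flag a relaunch after three consecutive failures
# (3 x 30 s = the 1 min 30 s window mentioned above).
import time
import requests

HEALTH_URL = "http://api.geneontology.org/api/bioentity/function/GO%3A0044598?rows=1"
FAILURES_BEFORE_RELAUNCH = 3
INTERVAL_S = 30

def healthy() -> bool:
    try:
        return requests.get(HEALTH_URL, timeout=10).status_code == 200
    except requests.RequestException:
        return False

failures = 0
while True:
    failures = 0 if healthy() else failures + 1
    if failures >= FAILURES_BEFORE_RELAUNCH:
        print("API unhealthy: trigger the relaunch from a fresh AMI here")
        break
    time.sleep(INTERVAL_S)
```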

kltm commented 3 years ago

Spinning sanitizing into own issue (#12).