berkeley-dsep-infra / datahub

JupyterHubs for use by Berkeley enrolled students
https://docs.datahub.berkeley.edu
BSD 3-Clause "New" or "Revised" License

Can't access a specific API from Datahub #3256

Status: Closed (closed by ajlyons 2 years ago)

ajlyons commented 2 years ago

Bug description

I'm trying to retrieve data from the Cal-Adapt API, but can't even connect to it. The following curl command times out in terminal (as does the equivalent R command).

curl https://api.cal-adapt.org/api/

Other APIs seem to work ok (e.g., curl https://reqbin.com/echo). Any insights why this might be happening? Maybe a port or firewall issue?

Environment & setup

How to reproduce

Run the following in terminal from a VM on Datahub (times out). Run the same command on your local machine (works).

curl https://api.cal-adapt.org/api/
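
For anyone reproducing this from a notebook rather than the terminal, here is a minimal Python sketch of the same check. It uses only the standard library and tries to tell a timeout apart from a DNS failure, a refused connection, or an HTTP error status, since each points to a different cause. The function names `classify_failure` and `probe` are made up for this illustration and are not part of any Datahub or Cal-Adapt tooling.

```python
import socket
import urllib.error
import urllib.request


def classify_failure(exc: BaseException) -> str:
    """Map an exception raised by urlopen to a coarse failure category."""
    # HTTPError is a subclass of URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return f"http_error_{exc.code}"
    # urlopen usually wraps the low-level socket error in URLError.reason.
    reason = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(reason, socket.gaierror):
        return "dns_failure"  # the hostname could not be resolved
    if isinstance(reason, (TimeoutError, socket.timeout)):
        return "timeout"  # no answer in time, e.g. packets silently dropped
    if isinstance(reason, ConnectionRefusedError):
        return "refused"  # the host answered but rejected the connection
    return "other"


def probe(url: str, timeout: float = 10.0) -> str:
    """Try to fetch url and return 'ok_<status>' or a failure category."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"ok_{resp.status}"
    except Exception as exc:
        return classify_failure(exc)
```

Calling `probe("https://api.cal-adapt.org/api/")` from a Datahub server and again from a local machine gives a like-for-like comparison. A `timeout` result on Datahub only would be consistent with traffic being dropped somewhere along the path rather than with a DNS or TLS problem.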

ryanlovett commented 2 years ago

@balajialg I've no idea about this one. That endpoint is also blocked from the datahub file server, and from directly on a node, outside of the single user environment. We don't seem to have any outbound network firewall rules that would block this. I also tried from a different Google Cloud project without success.

It looks like that service is running on Azure. I wonder if api.cal-adapt.org is blocking traffic from Google Cloud? The same applies to cal-adapt.org.

balajialg commented 2 years ago

@ryanlovett Thanks a lot for looking into this! Looks like an interesting issue to investigate! Based on your question, I tried sending curl requests to some of the RTL services like BOA and Course Capture (which I assumed were hosted on AWS, but I might be wrong) and received some responses. Definitely worth exploring further.

@ajlyons Is this a blocker for you? We can investigate this issue further, but we want to understand its importance and urgency.

ajlyons commented 2 years ago

Hi Ryan and Balaji,

Many thanks for looking into the issue (https://github.com/berkeley-dsep-infra/datahub/issues/3256#issuecomment-1036940329) of the Cal-Adapt API being blocked on Datahub. Great detective work.

This is not urgent, but I'm looping in Brian Galey, who administers Cal-Adapt for the Berkeley GIF (http://gif.berkeley.edu/). It would be good to move toward a solution because it's a great API, and more and more people are using it now that there's an R package (https://ucanr-igis.github.io/caladaptr/) and a Python library (https://ucanr-igis.github.io/caladapt-py/) that make it easier to access. Many thanks for your help.

Best, Andy


Andy Lyons, Program Coordinator, Informatics and GIS Statewide Program (IGIS), UC Division of Agriculture & Natural Resources, http://igis.ucanr.edu/

bkg commented 2 years ago

Thanks for looking into this and sorry about the connection issues. Yes, we needed to block a number of hosts originating from Google Cloud that were hammering our server and impacting response times for all API clients.

Is there a stable range of IPs for Datahub on Google Cloud that we could allow through or are they randomly assigned from the larger pool?
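
To illustrate what "allowing a range through" would mean on the API side, here is a minimal sketch using Python's standard `ipaddress` module. Everything specific in it is an assumption: the CIDR ranges are reserved documentation ranges (RFC 5737), not real Datahub or Google Cloud addresses, and `is_allowed` is a hypothetical helper, not Cal-Adapt's actual filtering code.

```python
import ipaddress

# Hypothetical allowlist. These are RFC 5737 documentation ranges used as
# placeholders; they are NOT real Datahub or Google Cloud addresses.
ALLOWED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.16/28"),  # covers .16 through .31
]


def is_allowed(client_ip: str, allowed=ALLOWED_RANGES) -> bool:
    """Return True if client_ip falls inside any allowlisted CIDR range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in allowed)
```

A check like this only helps if the client's egress addresses stay inside a known range, which is the reason for asking whether Datahub has one.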

balajialg commented 2 years ago

Thanks for the information, @bkg!

@felder @ryanlovett Any thoughts related to the Datahub IP?

balajialg commented 2 years ago

@yuvipanda Do you have any suggestions on the way forward?

yuvipanda commented 2 years ago

@bkg ah, we autoscale heavily and hence there is no static pool of public IP ranges we can provide :(

ryanlovett commented 2 years ago

@bkg Is there a way that people can be permitted to connect by using a token, overriding an IP block?

bkg commented 2 years ago

@yuvipanda thanks for the reply. Understood, I figured that was probably the case.

@ryanlovett unfortunately not, token based auth isn't currently available but would be a helpful addition here.
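
For context, the kind of check being discussed could be sketched roughly as below: a valid token lets a request through even when its source IP is in a blocked range. This is purely illustrative. As noted above, token-based auth is not currently available in the Cal-Adapt API, and the blocked range, the token value, and the function names are all invented for this example.

```python
import hmac
import ipaddress

# Hypothetical values for illustration only.
BLOCKED_RANGES = [ipaddress.ip_network("192.0.2.0/24")]  # RFC 5737 placeholder
VALID_TOKENS = {"example-token-not-real"}


def token_is_valid(token, valid_tokens=VALID_TOKENS) -> bool:
    """Check a presented token against the known tokens."""
    if not token:
        return False
    # compare_digest compares each candidate in constant time.
    return any(
        hmac.compare_digest(token.encode(), known.encode())
        for known in valid_tokens
    )


def should_serve(client_ip: str, token=None) -> bool:
    """Serve the request if the token is valid or the IP is not blocked."""
    if token_is_valid(token):
        return True  # a valid token overrides the IP block
    addr = ipaddress.ip_address(client_ip)
    return not any(addr in net for net in BLOCKED_RANGES)
```

With this ordering, clients on autoscaled infrastructure with no fixed addresses could still be identified individually, while anonymous traffic from the blocked ranges stays blocked.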

balajialg commented 2 years ago

@ajlyons Unfortunately, we are not able to accommodate this request at this juncture. It will require more detailed internal discussion on our end to figure out the way forward.

@yuvipanda: @felder and I are curious, why is this not feasible at this juncture?