jwfuller opened 3 years ago
@CUSeif and I tried this using the Windows client and got a TCP/IP error, 'Could not resolve Hostname'. I dropped the IP rules from the load balancer and we got the same error. Further debugging is needed. The web client might help macOS users test this (but it is not a secure connection, which troubles me).
I think this has been resolved with this ticket, no?
Not completely. culibraries/folio-ansible#26 resolved pulling records into FOLIO from OCLC; we should also be able to push to FOLIO by setting it as an export target in OCLC, which is not working yet.
Tested this with the updated hostname sandbox-folio.colorado.edu. @steveellis, I need to confirm the LogonID/API key for FOLIO authorization. Currently the TCP/IP connection is being refused.
Edit: updated to correct URL
@CUSeif I think a refused connection indicates a problem upstream of authentication. Does the client run on your machine or in a browser?
This is through the Connexion client attempting to access sandbox-folio.colorado.edu
It looks like what we're configuring here is an edge module called edge-connexion.
Background: edge modules expose a special endpoint, usually different from the okapi endpoint, that lets "edge" stuff (third-party services that want to connect to FOLIO) do its thing.
@jwfuller my belief is that in k8s, configuring most edge modules involves making a special ingress for each one. Although I'm not sure that is what we're missing here, because it looks like, unlike say edge-dematic or the other edge modules, edge-connexion doesn't have such a requirement.
So if it doesn't require a special ingress, it's listening on port 9000 on the okapi endpoint. @CUSeif it seems like you're not hitting okapi. Maybe try doing your request (on port 9000) against this url instead: https://okapi-iris.cublcta.com/.
If my hunch is right and we don't need a special ingress, our pain doesn't likely stop there, however. In addition there may need to be:

- an institutional user in FOLIO with the right permissions (e.g. copycat.all)

The edge modules are a well-known PITA because of all of this. That said, there is documentation of this in the READMEs of each module (I recently contributed some on this very topic!). Each edge module is based on edge-common, although each one has its own quirks.
@steveellis I think you are correct that we need another ingress rule. I think we are only allowing 80 and 443 from outside the cluster. Jeremy tried to telnet to port 9000 and was unable to connect that way either.
Note: We have fully transitioned to the colorado.edu domain, so we should be using URLs like sandbox-okapi.colorado.edu rather than the test domain.
@jwfuller researched this a bit more. I think we may end up needing to create an ingress such as sandbox-edge-connexion.colorado.edu -- i.e. one subdomain for each edge module we want to enable. I no longer think edge-connexion is special; it will likely require this too.
If that seems like a hassle, an alternative might be to create one endpoint such as sandbox-edge.colorado.edu which would handle all edge modules (assuming we're going to add more than one), and then give each edge module its own port. It seems fairly easy to map a port to a service in Rancher, and each edge module has a configurable port.
So I guess the question is which is more hassle: separate domains or separate ports?
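For the separate-domains option, each edge module would get its own ingress, roughly like this (a sketch only; the service name and port are assumptions, and this pattern only helps edge modules that speak HTTP, which, as it turns out later in this thread, connexion does not):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: edge-connexion
spec:
  rules:
    - host: sandbox-edge-connexion.colorado.edu  # one subdomain per edge module
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: edge-connexion  # assumed service name
                port:
                  number: 8081        # assumed module port; each module's port is configurable
```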
Once we work this out, I think I know how to do the rest of the configuration. It should be something like this, which makes a little more sense when you compare these values to this, although those instructions assume we're using the ephemeral.properties file, which isn't what you're supposed to do in production.
I'm pretty sure the idea that the okapi endpoint can do this work even if we open port 9000 isn't right, since okapi requests flow through to "normal" module routes, whereas the edge modules sit outside of that request chain.
@steveellis and @jwfuller, if you add the annotation nginx.ingress.kubernetes.io/ssl-redirect: "true" to the main cluster load balancer ingress, this will allow the 443 redirect, plus it gives you the ability to add the SSL/TLS cert hostname within the ingress. @jwfuller, we initially had trouble with the ingress bouncing back to 80, but with the annotation the ssl-redirect to 443 works. This may not be the exact problem you are trying to solve but could help.
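A minimal sketch of that annotation on an ingress (the ingress name, backend service, and TLS secret are assumptions; 9130 is okapi's default port):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: okapi                                        # assumed ingress name
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true" # bounce plain HTTP to 443
spec:
  tls:
    - hosts:
        - sandbox-okapi.colorado.edu                 # SSL/TLS cert hostname
      secretName: okapi-tls                          # assumed cert secret
  rules:
    - host: sandbox-okapi.colorado.edu
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: okapi                          # assumed service name
                port:
                  number: 9130
```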
@steveellis, in Honeysuckle I used the okapi URL with added paths. Example: a new ingress with path /rtac whose service is edge-rtac, with SSL/TLS cert hostname sandbox-okapi.colorado.edu.
The username and password for the token are stored in Vault: https://test-libapps.colorado.edu/ui/vault/secrets/folio-iu-nZ56F3LeAa/show/culibraries
FYI: you can make the path anything you want, e.g. cubl-rtac/. This would make sure you are not going to clobber an existing path that is needed within Okapi.
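Roughly what that Honeysuckle-era setup would look like as an ingress (the edge-rtac port is an assumption):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: edge-rtac
spec:
  rules:
    - host: sandbox-okapi.colorado.edu  # reuse the okapi hostname
      http:
        paths:
          - path: /rtac                 # any path not already used by okapi
            pathType: Prefix
            backend:
              service:
                name: edge-rtac         # point at the service, not the workload
                port:
                  number: 8081          # assumed edge-rtac port
```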
@steveellis, you have to add an institutional user in FOLIO that corresponds to the credentials in the edge Vault store or AwsParamStore.
@mbstacy Thanks for taking a look at this. Interesting! Ok, let's see if I understand: we add a path on the okapi ingress that points at the edge module's workload?
@steveellis yes, that is correct, except point to the service, not the workload. Also, just make sure that the path is not used by okapi; you could use /rtac if nothing has changed between Honeysuckle and Iris. You just need to let users know which path you are using for the edge API.
@steveellis, we used the Vault store when we deployed the new Iris release (@jwfuller, unless that changed). The Vault domain and URI should be the same as above (https://test-libapps.colorado.edu/ui/vault/secrets/folio-iu-nZ56F3LeAa/show/culibraries), with "show" removed from the URI. You just need to add the user in FOLIO. I think I shared the Vault login token with you; you can also view the token in a file in the pod. Let me know if you have any questions or if you want to change to AwsParamStore. I have not implemented production Vault but should soon.
Actually, we need to change the configmap https://lib-rancher.colorado.edu/p/c-9q8nc:p-9c2qf/secrets/folio-iris:edge-vault. The Vault address pointed to the same cluster before; it needs to be changed to the actual Vault URL. Restart the pod to make sure the URL env var is correct!
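For illustration, the change being described is roughly this (the key name is a guess; check the actual configmap for the real key):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: edge-vault
  namespace: folio-iris                            # assumed namespace
data:
  # was an in-cluster address before; point it at the actual Vault URL
  VAULT_URL: "https://test-libapps.colorado.edu"   # hypothetical key name
```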
@jwfuller and @mbstacy, in the case of connexion we need a domain without a path, because that is what clients expect. Either way, we need to expose a port other than 80 or 443, because connexion isn't HTTP. Clients also expect this port.
So I think the task is to open up a new port (say 9000) to handle this traffic. But how do we do this?
Here's what I have learned so far:
Ingresses with "custom ports" (non 80/443) aren't something our ingress controller can do (this is the nginx-nginx-controller visible in System). This is why, when you add a new ingress via the UI, there is no field for "port"; it assumes you only want 80 or 443.
(In nginx.conf in the controller I can see the variables that set the port for edge.cublcta.com. We could change these, but whatever we change them to likely won't stick if the controller is redeployed.)
However, what we could try is making the connexion service into a Layer-4 external load balancer. This appears to be something AWS will support. The steps would be:

1. Remove the edge.cublcta.com ingress that we created. (It's not going to work.)
2. Change the service type to Load Balancer (there's a dropdown for this in the Rancher service config) and add port 9000. I'm not entirely sure what IP addresses to enter here, but these should be available in the AWS console somewhere for our existing load balancer.
The idea is that any traffic on an existing domain *.cublcta.com:9000 will flow through to this service.
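As a sketch, the service change would look something like this (namespace, labels, and target port are assumptions):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: edge-connexion
  namespace: folio-iris      # assumed namespace
spec:
  type: LoadBalancer         # was ClusterIP; asks the cloud for an external LB
  selector:
    app: edge-connexion      # assumed pod label
  ports:
    - name: connexion
      protocol: TCP
      port: 9000             # the port Connexion clients expect
      targetPort: 9000       # assumed container port
```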
I can give all of this a try tomorrow, but I wanted to get your thoughts on it before I start messing with things. I don't think there's a huge risk of breaking something, since 1) we're just adding a new port at the AWS level, and 2) we're only changing a service for something that isn't working yet anyway (edge-connexion).
@steveellis I think you are on the right track.
The load balancer folio-cubl in the screenshots above is a layer 7 application load balancer, which means it can only handle HTTP/S protocols. I think you are right that we need a layer 4 network load balancer, which can handle TCP/UDP/TLS protocols.
I think we can set up a layer 4 LB in AWS rather than Rancher and hopefully avoid any static IP configuration inside Kubernetes.
@jwfuller and @steveellis, FYI AWS is no longer supporting the Classic LB (the type Rancher creates):
EC2-Classic network is enabled for one or more of your Classic Load Balancers in the selected Region. AWS will be retiring the EC2-Classic network on August 15, 2022. To avoid interruptions to your workloads, we recommend that you migrate instances and other AWS resources (running on EC2-Classic) to a VPC prior to August 15, 2022. For details on important dates and resources to help you migrate, see the Amazon VPC FAQs.
On October 30, 2021, AWS will turn off EC2-Classic in Regions that have no active EC2-Classic resources. AWS EC2-Classic resources include: Amazon EC2 Instances, Amazon Relational Database, AWS Elastic Beanstalk, Amazon Redshift, AWS Data Pipeline, Amazon EMR, AWS OpsWorks.
@steveellis, you could try to set up a service of type NodePort and direct traffic to the port (target group) instead of through the default ingress. https://kubernetes.io/docs/concepts/services-networking/service/#nodeport
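A rough sketch of that NodePort idea (selector and ports are assumptions; 30767 is the node port that ends up in use below):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: edge-connexion
spec:
  type: NodePort
  selector:
    app: edge-connexion  # assumed pod label
  ports:
    - protocol: TCP
      port: 9000         # service port inside the cluster
      targetPort: 9000   # assumed container port
      nodePort: 30767    # opened on every node; point the target group here
```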
Here's a summary of where this is at. Basically, I've been able to make it work only if the port is exposed to all traffic (0.0.0.0/0) in the security group. However, I'd like to lock that down, since this config opens the port on the cluster to all outside traffic for each node at each node's IP. 0.0.0.0/0 is currently not open.
Here are the steps I've taken:

- Port 30767 is now open on the cluster. I can see that it is open by using netcat to probe a node's port from the command line inside a pod.
- Created a load balancer called edge-connexion with a target group called edge-connexion.
- Added rules to the cluster's security group sg-03fd508b3c9f2a211 to try to restrict cluster access to the load balancer's public and private IP addresses.

Here are the issues I see:
- 0.0.0.0/0 is too open. It should be possible to lock it down to the load balancer.
- In the edge-connexion load balancer's traffic, lots of other IP addresses that don't appear to be in the VPC's CIDR range or to belong to the public or private IPs are in there. Where do these IPs come from? The fact that other IPs are in scope would explain why locking it down to the load balancer's IPs isn't working. Most of these addresses are of the 192.168 variety, suggesting they are coming from inside AWS.
- Adding either 0.0.0.0/0 or the load balancer's private IPs to the inbound rules allows the health checks on the target to succeed.

The private IP addresses of the LB are:
192.168.72.18
192.168.215.72
192.168.153.172
The public IP addresses of the LB are:
edge-connexion-603b3b0d343e6047.elb.us-west-2.amazonaws.com. 60 IN A 35.82.250.59
edge-connexion-603b3b0d343e6047.elb.us-west-2.amazonaws.com. 60 IN A 52.42.209.179
edge-connexion-603b3b0d343e6047.elb.us-west-2.amazonaws.com. 60 IN A 52.89.73.90
There is a Route 53 endpoint called edge.cublcta.com that points to the load balancer. This endpoint works when 0.0.0.0/0 is used, so I'm assuming it is configured correctly.
I haven't tried adding IP addresses for Route 53. There aren't that many. These are:
52.95.110.0/24
205.251.192.0/21
63.246.114.0/23
We can request support from AWS through the support contract that the university has through a third party called DLT. The process is to email support@DLT.com and cc oit-cloud-broker@lists.colorado.edu.
We can reach out to the FOLIO community to see if anyone has any experience with this, although it really is more of an AWS question than a FOLIO question.
@jwfuller, @mbstacy and @CUSeif some updates on this:
I tried going the DLT route, and had a nice conversation with their support engineers but they didn't have any ideas.
I reached out on #sysops and was able to talk a bit about this with John Malconian and Jason Root. It seemed promising to switch the service to a LoadBalancer rather than NodePort. By itself that didn't change anything. However, adding the annotation service.beta.kubernetes.io/aws-load-balancer-type: nlb had an effect: it added a handful of other inbound rules to our security group, and created another security group for the load balancer. And yay, things started working, right?
Well, yes they did, except the reason they started working was that the annotation also added the dreaded 0.0.0.0/0 to our inbound rules, if you can believe it. Maybe this was some special new less-permissive 0.0.0.0/0, but no, it was the same old permissive one, letting me connect directly to an IP on the nodes.
It does seem like this is what we're supposed to be doing, but it is still quite broken. I wonder how many people do this and think they're done?
I removed the entry for 0.0.0.0/0.
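For reference, the change being described is roughly this (only the annotation and service type are confirmed above; the rest is assumed):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: edge-connexion
  annotations:
    # tells the AWS cloud provider to provision a network LB (layer 4)
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  type: LoadBalancer
  selector:
    app: edge-connexion  # assumed pod label
  ports:
    - protocol: TCP
      port: 9000
      targetPort: 9000   # assumed container port
```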
Looking at this again today, it would appear that adding the annotation mentioned above did a handful of things:

- created the load balancers and target group shown below
- created another security group for the load balancer
- added a 0.0.0.0/0 CIDR allowing anyone to hit port 30767 using a cluster IP

It would appear that this annotation wants to do much of what I thought we would need to do manually (creating the LB and target groups, for example).
For now I have paused the edge-connexion deployment (verifying that it is no longer able to add these rules) and removed the permissive rules from our security group. I would like to find some documentation for this annotation.
These are the load balancers it created (they are still there):
This is the target group it created:
This should allow a user to overlay a record that has an older OCLC number with a record that has the newer OCLC number, but we will need to test.
FOLIO documentation indicates that push from OCLC to FOLIO is possible; we should configure OCLC to do this and test.
This might help us further debug culibraries/folio-ansible#26