jwfuller opened 3 years ago
@CUSeif and I tried this using the Windows client and got a TCP/IP error, 'Could not resolve Hostname'. I dropped the IP rules from the load balancer and we got the same error. Further debugging is needed. The web client might help macOS users test this (but it is not a secure connection, which troubles me).
I think this has been resolved with this ticket, no?
Not completely. culibraries/folio-ansible#26 resolved pulling records into FOLIO from OCLC; we should also be able to push to FOLIO by setting it as an export target in OCLC, which is not working yet.
Tested this with the updated hostname sandbox-folio.colorado.edu. @steveellis, I need to confirm the LogonID/API key for FOLIO authorization. Currently the TCP/IP connection is being refused.
Edit: updated to correct URL
@CUSeif I think a refused connection indicates a problem upstream of authentication. Does the client run on your machine or in a browser?
This is through the Connexion client attempting to access sandbox-folio.colorado.edu
It looks like what we're configuring here is an edge module called edge-connexion.
Background: edge modules expose a special endpoint, usually different from the okapi endpoint, that lets "edge" stuff (third-party services that want to connect to FOLIO) do its thing.
@jwfuller my belief is that in k8s, configuring most edge modules involves making a special ingress for each one. Although I'm not sure that is what we're missing here, because it looks like, unlike say edge-dematic or the other edge modules, edge-connexion doesn't have such a requirement.
So if it doesn't require a special ingress, it's listening on port 9000 on the okapi endpoint. @CUSeif it seems like you're not hitting okapi. Maybe try doing your request (on port 9000) against this url instead: https://okapi-iris.cublcta.com/.
If my hunch is right and we don't need a special ingress, our pain doesn't likely stop there, however. In addition there may need to be:

- an institutional user in FOLIO with the right permissions (e.g. copycat.all)

The edge modules are a well-known PITA because of all of this. That said, there is documentation of this in the READMEs of each module (I recently contributed some on this very topic!). Each edge module is based on edge-common, although each one has its own quirks.
@steveellis I think you are correct that we need another ingress rule. I think we are only allowing 80 and 443 from outside the cluster. Jeremy tried to telnet to port 9000 and was unable to connect that way either.
Note: We have fully transitioned to the colorado.edu domain, so we should be using URLs like sandbox-okapi.colorado.edu rather than the test domain.
@jwfuller researched this a bit more. I think we may end up needing to create an ingress such as sandbox-edge-connexion.colorado.edu -- i.e. one subdomain for each edge module we want to enable. I no longer think edge-connexion is special; it will likely require this too.
If that seems like a hassle, an alternative might be to create one endpoint such as sandbox-edge.colorado.edu which would handle all edge modules (assuming we're going to add more than one), and then give each edge module its own port. It seems fairly easy to map a port to a service in Rancher, and each edge module has a configurable port.
So I guess the question is which is more hassle: separate domains or separate ports?
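For the separate-domains option, each edge module would get its own ingress, roughly like this (a sketch only; the service name and port are assumptions, and this pattern only helps edge modules that speak HTTP, which, as it turns out later in this thread, connexion does not):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: edge-connexion
spec:
  rules:
    - host: sandbox-edge-connexion.colorado.edu  # one subdomain per edge module
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: edge-connexion  # assumed service name
                port:
                  number: 8081        # assumed module port; each module's port is configurable
```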
Once we work this out, I think I know how to do the rest of the configuration. It should be something like this, which makes a little more sense when you compare these values to this, although those instructions assume we're using the ephemeral.properties file, which isn't what you're supposed to do in production.
I'm pretty sure the idea that the okapi endpoint can do this work even if we open port 9000 isn't right, since okapi requests flow through to "normal" module routes, whereas the edge modules sit outside of that request chain.
@steveellis and @jwfuller, if you add the annotation nginx.ingress.kubernetes.io/ssl-redirect: "true" to the main cluster load balancer ingress, this will allow the 443 redirect, plus it gives you the ability to add the SSL/TLS cert hostname within the ingress. @jwfuller, we initially had trouble with the ingress bouncing back to 80, but with the annotation the ssl-redirect to 443 works. This may not be the exact problem you are trying to solve but could help.
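A minimal sketch of that annotation on an ingress (the ingress name, backend service, and TLS secret are assumptions; 9130 is okapi's default port):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: okapi                                        # assumed ingress name
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true" # bounce plain HTTP to 443
spec:
  tls:
    - hosts:
        - sandbox-okapi.colorado.edu                 # SSL/TLS cert hostname
      secretName: okapi-tls                          # assumed cert secret
  rules:
    - host: sandbox-okapi.colorado.edu
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: okapi                          # assumed service name
                port:
                  number: 9130
```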
@steveellis, in Honeysuckle I used the okapi URL with added paths. Example: a new ingress with path /rtac whose service is edge-rtac, with SSL/TLS cert hostname sandbox-okapi.colorado.edu.
The username and password for the token are stored in Vault: https://test-libapps.colorado.edu/ui/vault/secrets/folio-iu-nZ56F3LeAa/show/culibraries
FYI: you can make the path anything you want, e.g. cubl-rtac/. This would make sure you are not going to clobber an existing path that is needed within Okapi.
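Roughly what that Honeysuckle-era setup would look like as an ingress (the edge-rtac port is an assumption):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: edge-rtac
spec:
  rules:
    - host: sandbox-okapi.colorado.edu  # reuse the okapi hostname
      http:
        paths:
          - path: /rtac                 # any path not already used by okapi
            pathType: Prefix
            backend:
              service:
                name: edge-rtac         # point at the service, not the workload
                port:
                  number: 8081          # assumed edge-rtac port
```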
@steveellis, you have to add an institutional user in FOLIO that corresponds to the credentials in the edge Vault store or AwsParamStore.
@mbstacy Thanks for taking a look at this. Interesting! Ok, let's see if I understand: we add a path on the okapi ingress that points at the edge module's workload?
@steveellis yes, that is correct, except point to the service, not the workload. Also, just make sure that the path is not used by okapi; you could use /rtac if nothing has changed between Honeysuckle and Iris. You just need to let users know which path you are using for the edge API.
@steveellis, we used the Vault store when we deployed the new Iris release (@jwfuller, unless that changed). The Vault domain and URI should be the same as above (https://test-libapps.colorado.edu/ui/vault/secrets/folio-iu-nZ56F3LeAa/show/culibraries), with "show" removed from the URI. You just need to add the user in FOLIO. I think I shared the Vault login token with you; you can also view the token in a file in the pod. Let me know if you have any questions or if you want to change to AwsParamStore. I have not implemented production Vault but should soon.
Actually, we need to change the configmap https://lib-rancher.colorado.edu/p/c-9q8nc:p-9c2qf/secrets/folio-iris:edge-vault. The Vault address pointed to the same cluster before; it needs to be changed to the actual Vault URL. Restart the pod to make sure the URL env var is correct!
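For illustration, the change being described is roughly this (the key name is a guess; check the actual configmap for the real key):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: edge-vault
  namespace: folio-iris                            # assumed namespace
data:
  # was an in-cluster address before; point it at the actual Vault URL
  VAULT_URL: "https://test-libapps.colorado.edu"   # hypothetical key name
```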
@jwfuller and @mbstacy, in the case of connexion we need a domain without a path, because that is what clients expect. Either way, we need to expose a port other than 80 or 443, because connexion isn't HTTP. Clients also expect this port.
So I think the task is to open up a new port (say 9000) to handle this traffic. But how do we do this?
Here's what I have learned so far:
Ingresses with "custom ports" (non 80/443) aren't something our ingress controller can do (this is the nginx-nginx-controller visible in System). This is why, when you add a new ingress via the UI, there is no field for "port"; it assumes you only want 80 or 443.
(In nginx.conf in the controller I can see the variables that set the port for edge.cublcta.com. We could change these, but whatever we change them to likely won't stick if the controller is redeployed.)
However, what we could try is making the connexion service into a Layer-4 external load balancer. This appears to be something AWS will support. The steps would be:

1. Remove the edge.cublcta.com ingress that we created. (It's not going to work.)
2. Change the service type to Load Balancer (there's a dropdown for this in the Rancher service config) and add port 9000. I'm not entirely sure what IP addresses to enter here, but these should be available in the AWS console somewhere for our existing load balancer.
The idea is that any traffic on an existing domain *.cublcta.com:9000 will flow through to this service.
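As a sketch, the service change would look something like this (namespace, labels, and target port are assumptions):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: edge-connexion
  namespace: folio-iris      # assumed namespace
spec:
  type: LoadBalancer         # was ClusterIP; asks the cloud for an external LB
  selector:
    app: edge-connexion      # assumed pod label
  ports:
    - name: connexion
      protocol: TCP
      port: 9000             # the port Connexion clients expect
      targetPort: 9000       # assumed container port
```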
I can give all of this a try tomorrow, but I wanted to get your thoughts on it before I start messing with things. I don't think there's a huge risk of breaking something, since 1) we're just adding a new port at the AWS level, and 2) we're only changing a service for something that isn't working yet anyway (edge-connexion).
@steveellis I think you are on the right track.
The load balancer folio-cubl in the screenshots above is a layer 7 application load balancer, which means it can only handle HTTP/S protocols. I think you are right that we need a layer 4 network load balancer, which can handle TCP/UDP/TLS protocols.
I think we can set up a layer 4 LB in AWS rather than Rancher and hopefully avoid any static IP configuration inside Kubernetes.
@jwfuller and @steveellis, FYI AWS is no longer supporting the Classic LB (the type Rancher creates):
EC2-Classic network is enabled for one or more of your Classic Load Balancers in the selected Region. AWS will be retiring the EC2-Classic network on August 15, 2022. To avoid interruptions to your workloads, we recommend that you migrate instances and other AWS resources (running on EC2-Classic) to a VPC prior to August 15, 2022. For details on important dates and resources to help you migrate, see the Amazon VPC FAQs.
On October 30, 2021, AWS will turn off EC2-Classic in Regions that have no active EC2-Classic resources. AWS EC2-Classic resources include: Amazon EC2 Instances, Amazon Relational Database, AWS Elastic Beanstalk, Amazon Redshift, AWS Data Pipeline, Amazon EMR, AWS OpsWorks.
@steveellis, you could try to set up a service of type NodePort and direct traffic to the port (target group) instead of through the default ingress. https://kubernetes.io/docs/concepts/services-networking/service/#nodeport
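A rough sketch of that NodePort idea (selector and ports are assumptions; 30767 is the node port that ends up in use below):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: edge-connexion
spec:
  type: NodePort
  selector:
    app: edge-connexion  # assumed pod label
  ports:
    - protocol: TCP
      port: 9000         # service port inside the cluster
      targetPort: 9000   # assumed container port
      nodePort: 30767    # opened on every node; point the target group here
```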
Here's a summary of where this is at. Basically, I've been able to make it work only if the port is exposed to all traffic (0.0.0.0/0) in the security group. However, I'd like to lock that down, since this config opens the port on the cluster to all outside traffic for each node at each node's IP. 0.0.0.0/0 is currently not open.
Here are the steps I've taken:

- Port 30767 is now open on the cluster. I can see that it is open by using netcat to probe a node's port from the command line inside a pod.
- Created a load balancer called edge-connexion with a target group called edge-connexion.
- Added rules to the cluster's security group sg-03fd508b3c9f2a211 to try to restrict cluster access to the load balancer's public and private IP addresses.

Here are the issues I see:
- 0.0.0.0/0 is too open. It should be possible to lock it down to the load balancer.
- In the edge-connexion load balancer's traffic, lots of other IP addresses that don't appear to be in the VPC's CIDR range or to belong to the public or private IPs are in there. Where do these IPs come from? The fact that other IPs are in scope would explain why locking it down to the load balancer's IPs isn't working. Most of these addresses are of the 192.168 variety, suggesting they are coming from inside AWS.
- Adding either 0.0.0.0/0 or the load balancer's private IPs to the inbound rules allows the health checks on the target to succeed.

The private IP addresses of the LB are:
192.168.72.18
192.168.215.72
192.168.153.172
The public IP addresses of the LB are:
edge-connexion-603b3b0d343e6047.elb.us-west-2.amazonaws.com. 60 IN A 35.82.250.59
edge-connexion-603b3b0d343e6047.elb.us-west-2.amazonaws.com. 60 IN A 52.42.209.179
edge-connexion-603b3b0d343e6047.elb.us-west-2.amazonaws.com. 60 IN A 52.89.73.90
There is a Route 53 endpoint called edge.cublcta.com that points to the load balancer. This endpoint works when 0.0.0.0/0 is used, so I'm assuming it is configured correctly.
I haven't tried adding IP addresses for Route 53. There aren't that many. These are:
52.95.110.0/24
205.251.192.0/21
63.246.114.0/23
We can request support from AWS through the support contract that the university has through a third party called DLT. The process is to email support@DLT.com and cc oit-cloud-broker@lists.colorado.edu.
We can reach out to the FOLIO community to see if anyone has any experience with this, although it really is more of an AWS question than a FOLIO question.
@jwfuller, @mbstacy and @CUSeif some updates on this:
I tried going the DLT route, and had a nice conversation with their support engineers but they didn't have any ideas.
I reached out on #sysops and was able to talk a bit about this with John Malconian and Jason Root. It seemed promising to switch the service to a LoadBalancer rather than NodePort. By itself that didn't change anything. However, adding the annotation service.beta.kubernetes.io/aws-load-balancer-type: nlb had an effect: it added a handful of other inbound rules to our security group, and created another security group for the load balancer. And yay, things started working, right?
Well, yes they did, except the reason they started working was that the annotation also added the dreaded 0.0.0.0/0 to our inbound rules, if you can believe it. Maybe this was some special new less-permissive 0.0.0.0/0, but no, it was the same old permissive one, letting me connect directly to an IP on the nodes.
It does seem like this is what we're supposed to be doing, but it is still quite broken. I wonder how many people do this and think they're done?
I removed the entry for 0.0.0.0/0.
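For reference, the change being described is roughly this (only the annotation and service type are confirmed above; the rest is assumed):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: edge-connexion
  annotations:
    # tells the AWS cloud provider to provision a network LB (layer 4)
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  type: LoadBalancer
  selector:
    app: edge-connexion  # assumed pod label
  ports:
    - protocol: TCP
      port: 9000
      targetPort: 9000   # assumed container port
```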
Looking at this again today, it would appear that adding the annotation mentioned above did a handful of things:

- created the load balancers and target group shown below
- created another security group for the load balancer
- added a 0.0.0.0/0 CIDR allowing anyone to hit port 30767 using a cluster IP

It would appear that this annotation wants to do much of what I thought we would need to do manually (creating the LB and target groups, for example).
For now I have paused the edge-connexion deployment (verifying that it is no longer able to add these rules) and removed the permissive rules from our security group. I would like to find some documentation for this annotation.
These are the load balancers it created (they are still there):
This is the target group it created:
This should allow a user to overlay a record that has an older OCLC number with a record that has the newer OCLC number, but we will need to test.
FOLIO documentation indicates that push from OCLC to FOLIO is possible; we should configure OCLC to do this and test.
This might help us further debug culibraries/folio-ansible#26