breedfides / airflow-etl


Deployment of AirFlow #2

Closed: gannebamm closed this issue 7 months ago

gannebamm commented 9 months ago

Status quo

Currently, we only have dev instances of AirFlow running. These instances are not easily accessible: you have to connect to the VM itself. As a result, the frontend team is unable to test functionality such as triggering DAG runs.

Expected behaviour

The frontend team should be able to test triggering DAGs on our AirFlow instance. Therefore, an accessible instance shall be deployed on de.NBI. We have an OpenStack project called 'BreedFides ETL pipeline for data trustee implementation test as OpenStack'. It is necessary to follow these guidelines of de.NBI:

gannebamm commented 9 months ago

The old simple VM was discarded, and a new one was created. This issue is on hold until CEBITEC has answered our questions regarding hosting a web service on de.NBI.

gannebamm commented 9 months ago

We discussed this with CEBITEC and concluded that a simple VM is not a good choice for running a public web server. Therefore, an OpenStack project was created. It is called 'BreedFides ETL pipeline for data trustee implementation test as OpenStack' in de.NBI, and you, @brightemahpixida, should have admin privileges there. Please give consent for Bielefeld's AAI on your de.NBI profile. Please create the OpenStack components needed to get a web server running, such as a network, router, VM and object storage container.

brightemahpixida commented 9 months ago

Thanks @gannebamm - i got the notification earlier, i'll have a look 👍

brightemahpixida commented 9 months ago

Hi @gannebamm, I'm having trouble logging in to the OpenStack dashboard. I get this 401 response each time I try to log in:

{"error":{"code":401,"message":"The request you have made requires authentication.","title":"Unauthorized"}}

On the login page, I authenticate using the ELIXIR_OIDC option. Or am I doing something wrong?

brightemahpixida commented 9 months ago

Hi @gannebamm - i was able to log into the dashboard this morning, Thanks for fixing it 👍

gannebamm commented 9 months ago

Great. Please follow the official documentation to deploy the service. You are listed as admin and should have enough privileges to do so. Otherwise please do not hesitate to contact me here.

brightemahpixida commented 9 months ago

Sure thing, i'll keep that in mind 👍

brightemahpixida commented 9 months ago

Hi @gannebamm, I'm making some progress on this topic. I believe the instance flavor we should use is de.NBI mini. If so, we need to raise the RAM quota of our project from 7 GB to 7.91 GB, as the current quota is 7 GB while the flavor requires 7.91 GB.

[Screenshot 2023-11-23 162427]
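The quota arithmetic behind this request can be sketched in a few lines. This is only an illustration: the function name is hypothetical, and the figures are the ones quoted in this thread (a 7 GB quota, 7.91 GB per de.NBI mini instance, and the 15.82 GB granted later for two VMs).

```python
from decimal import Decimal


def ram_shortfall_gb(quota_gb: str, flavor_ram_gb: str, instances: int = 1) -> Decimal:
    """Return how many GB of RAM quota are missing (0 if the quota suffices).

    Hypothetical helper; Decimal avoids float rounding on values like 7.91.
    """
    needed = Decimal(flavor_ram_gb) * instances
    return max(needed - Decimal(quota_gb), Decimal("0"))


# One de.NBI mini instance against the original 7 GB quota.
print(ram_shortfall_gb("7", "7.91"))  # 0.91

# Two instances against the 15.82 GB quota granted later in the thread.
print(ram_shortfall_gb("15.82", "7.91", instances=2) == 0)  # True
```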

gannebamm commented 9 months ago

I have requested a resource modification

brightemahpixida commented 9 months ago

Great Thank you 👍

gannebamm commented 9 months ago

@brightemahpixida Resources are ready:

Requested resources:

VMs: 2 VMs
Cores: 8 cores
RAM: 15.82 GB
Storage Limit: 0 GB
Volume Counter: 0
Object Storage: 80 GB
Flavors: de.NBI mini: 2

brightemahpixida commented 9 months ago

@gannebamm Thanks i got the notification a couple of minutes ago

brightemahpixida commented 9 months ago

Hi @gannebamm - we have the server up and running now (the network, router, VM and security group have been created); I just need to install a couple more dependencies before deploying Airflow : )

I'll keep you updated on this

brightemahpixida commented 9 months ago

@gannebamm - I have a question regarding access to the server. Apart from my current work on it, who else requires root (shell) access?

Additionally, we will probably need to register a domain name (maybe something like breedfides-airflow.bi.denbi.de) with Cebitec, considering our plan to expose the deployed Airflow URL.

What do you think?

brightemahpixida commented 9 months ago

Hi @gannebamm, if it's not possible to get a domain name from Cebitec for our Airflow web server, we could fall back to SSH tunneling to view the UI.

Also, as an update: I was able to install the dependencies as mentioned last Friday, and we now have Airflow running on the server. The screenshot below shows the UI, which I can access via SSH tunneling (note that the URL indicates it is running on localhost, 0.0.0.0:8080).

[Screenshot: Airflow UI]
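For readers unfamiliar with SSH tunneling, the sketch below assembles a local port-forwarding command of the kind that would expose the remote Airflow UI on a local port. It is an assumption about how such a tunnel could be set up, not the exact command used here: the key path is a placeholder, 203.0.113.10 is a documentation-only address standing in for the floating IP, and the function name is hypothetical. Only the `ubuntu` user and port 8080 come from this thread.

```python
import shlex


def tunnel_command(host: str, user: str = "ubuntu",
                   key_path: str = "/path/to/private_key",
                   local_port: int = 8080, remote_port: int = 8080) -> list[str]:
    """Build an ssh argv that forwards local_port to remote_port on the server.

    -N: do not run a remote command, only forward ports.
    -L: local port forwarding (local_port -> localhost:remote_port on the server).
    """
    return [
        "ssh", "-i", key_path, "-N",
        "-L", f"{local_port}:localhost:{remote_port}",
        f"{user}@{host}",
    ]


# Placeholder host; substitute the instance's floating IP.
command = tunnel_command("203.0.113.10")
print(shlex.join(command))
# ssh -i /path/to/private_key -N -L 8080:localhost:8080 ubuntu@203.0.113.10
```

While such a tunnel is open, the UI would be reachable in a local browser at http://localhost:8080.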

gannebamm commented 9 months ago

Hi @brightemahpixida, I am currently at the de.NBI user meeting and will ask the team how to apply for a domain name. We will not need an SSH tunnel.

brightemahpixida commented 9 months ago

@gannebamm Great thanks 👍

gannebamm commented 9 months ago

I have merged both PRs. @brightemahpixida what is the state of affairs for the deployment?

I would like to have a reverse proxy with a stable IP and the domain name, with the Airflow VM behind it as a second service. Let's Encrypt shall be used for the SSL certificate.

brightemahpixida commented 9 months ago

@gannebamm Thanks. Earlier today I was working on a couple of blockers in the Airflow DAG. I can see the reverse-proxy server has already been created by Cebitec (including the domain IP). We already have Airflow running on the instance I created a couple of weeks ago, and I'm now looking into how to configure that instance to work behind the reverse-proxy server.

For clarity: to create the SSL certificate, I will probably need an official email address to associate it with.

gannebamm commented 9 months ago

Please use my email address, as I am the PI for the project.

brightemahpixida commented 9 months ago

Oh ok

brightemahpixida commented 9 months ago

Hi @gannebamm - could you ask Cebitec how we can SSH into the reverse-proxy instance they created last week? I'm having some difficulty with this. Usually, when I want to SSH into an instance, I first associate the floating IP (i.e. 129.70.51.90) with the instance and then connect using that IP (the terminal command for this is ssh -i /private_key/ ubuntu@129.70.51.90).

If I try the same thing with the reverse-proxy server, I get a Permission denied (publickey) error.

brightemahpixida commented 9 months ago

I also tried accessing the server via the de.NBI console using the default username/password (i.e. ubuntu and blank), but I still get an unauthorized error.

gannebamm commented 9 months ago

When creating a VM, you will be given the option to add public keys to the instance automatically. The default is denbi_by_perun. The issue is that this key is fetched from Perun (SSO service) and is bound to your own de.NBI account. Therefore, my denbi_by_perun differs from your denbi_by_perun. I have added your public key and my public key as explicit keys. See this screenshot: [image]

I have added your public key to the authorized_keys of the reverse proxy. Therefore, you should now be able to connect. Please add my public key to the airflow instance, too.
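Adding a colleague's key to an instance comes down to appending one line to the target user's authorized_keys file. The sketch below shows one way to do that without creating duplicate entries. It is an illustration under assumptions: the helper name is hypothetical, the key string is a placeholder rather than a real key, and the demo writes to a temporary directory instead of a real ~/.ssh.

```python
import tempfile
from pathlib import Path


def add_authorized_key(authorized_keys: Path, public_key: str) -> bool:
    """Append public_key to the file unless it is already listed.

    Returns True if the file was changed, False if the key was present.
    """
    public_key = public_key.strip()
    text = authorized_keys.read_text() if authorized_keys.exists() else ""
    if public_key in (line.strip() for line in text.splitlines()):
        return False  # key already present, nothing to do
    authorized_keys.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    with authorized_keys.open("a") as handle:
        if text and not text.endswith("\n"):
            handle.write("\n")  # keep one key per line
        handle.write(public_key + "\n")
    authorized_keys.chmod(0o600)  # readable and writable by the owner only
    return True


# Demo against a throwaway directory; on a server the path would be
# ~/.ssh/authorized_keys of the login user (ubuntu in this thread).
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / ".ssh" / "authorized_keys"
    key = "ssh-ed25519 AAAAplaceholderkey colleague@example.org"  # not a real key
    first = add_authorized_key(path, key)
    second = add_authorized_key(path, key)
    print(first, second)  # True False
```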

To connect, I use VSCode Remote Connections with the following config file:

Host de.NBI-BreedFides-RevProxy
  HostName 129.70.51.90
  User ubuntu
  IdentityFile ~/.ssh/YOUR-PRIVATE-KEY.ppk

brightemahpixida commented 9 months ago

Great! Thank you, i can connect to the reverse-proxy instance now - i'll make sure to add your key to the other instance

brightemahpixida commented 9 months ago

i've added your key to the airflow instance

brightemahpixida commented 9 months ago

@gannebamm - I have the Airflow instance running on the domain now. I still need to work out the security on the servers (i.e. both the Airflow instance and the reverse proxy) and make some other minor adjustments.

[Screenshot]

brightemahpixida commented 9 months ago

Hi @gannebamm - I've successfully deployed the Airflow setup, and you can now access the user interface via the following domain: https://breedfides-airflow.bi.denbi.de/home.

As per your request, the Airflow instance is operating behind the reverse-proxy server. I've implemented additional layers of authentication. The initial layer involves basic authentication on the reverse-proxy server. Consequently, to access the Airflow UI, you'll encounter a browser prompt to input authentication details. The second authentication layer is within the Airflow UI itself. I'll provide the authentication details for both layers in a subsequent email.
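As a rough illustration of the first layer, a non-browser client would have to send an HTTP Basic Authorization header to get past the reverse proxy. The sketch below only builds such a request and does not send it. The credentials are placeholders, the function name is hypothetical, and the second layer (the Airflow login) is not handled here.

```python
import base64
import urllib.request

AIRFLOW_URL = "https://breedfides-airflow.bi.denbi.de/home"  # URL from this thread


def basic_auth_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Build a request carrying an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


# Placeholder credentials; the real ones are shared out of band.
request = basic_auth_request(AIRFLOW_URL, "user", "pass")
print(request.get_header("Authorization"))  # Basic dXNlcjpwYXNz

# To actually send it: urllib.request.urlopen(request, timeout=10)
```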

The chosen reverse-proxy software is Caddy, which, in my opinion, is well-suited due to its seamless integration with SSL certificates provided by Let's Encrypt.

Please be aware that to access the Airflow site, you'll need to associate the floating IP with the reverse proxy server. Currently, this association is in place. The Caddy configuration file is located at /home/ubuntu/caddy_config/Caddyfile on the reverse proxy server named BreedFides-ReverseProxy.

Hopefully we can get some sort of security audit from Cebitec to ensure everything is in order and secure.

gannebamm commented 7 months ago

I was able to log in and see AirFlow is running fine.