Assets are already part of the image and symlinked inside the image. They will be empty in the volume; they'll have resources in the containers because they are symlinked from a location inside the container.
Can you try a curl request with the Host header set to a site name?
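For example, something along these lines from a machine that can reach the service (the site name and NodePort here are placeholders, not values from your setup):
curl -sS -H "Host: erp.example.com" http://<node-ip>:<node-port>/api/method/ping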
All my setups are working, and even the tests are working. Can you create a failing test somewhere, or share access to a failing setup?
Sorry for my late answer; I wrote this before going to bed.
All my setups are working, and even the tests are working. Can you create a failing test somewhere, or share access to a failing setup?
I'm pretty sure this is somehow related to my system configuration. If all else fails I think I could hand you a snapshot of the VM the cluster runs on.
On that note, I forgot to mention that I'm using CRI-O instead of containerd for my container runtime. After checking the volume definitions in the backing Dockerfile, that could definitely be an issue (I'm not knowledgeable enough on that matter, though; it's really hard to find specific incompatibilities between Docker and the CRI interface).
Assets are already part of the image and symlinked inside the image. They will be empty in the volume; they'll have resources in the containers because they are symlinked from a location inside the container.
I assumed this is how it's supposed to work. However, as you can see in pt. 4, the assets/ folder is simply not present after setting up.
I've since had a look at the frappe_docker code and the container definition - I assume bench init populates the assets directory, which is held by an internal volume in the container?
I'll try comparing with a local minikube/docker installation and fiddle with cri-o a little bit. Depending on what I find I might throw a PR your direction to make note of any incompatibilities in the installation instructions.
I'm using CRI-O instead of containerd
Recently someone mentioned this offline. They were successful in running this helm chart on a self-hosted, CRI-O-based cluster. Their storage classes were something custom that allowed RWX; I didn't ask what they were.
Can you try an in-cluster NFS server using nfs-ganesha-server-and-external-provisioner, like the tests do?
I managed to get this NFS server configuration running to use with nfs-subdir-external-provisioner, setup details here: https://github.com/frappe/frappe/wiki/Setup-NFS-Server
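Roughly what that looks like (the namespace, release name, and size here are illustrative, not prescriptive):
helm repo add nfs-ganesha-server-and-external-provisioner https://kubernetes-sigs.github.io/nfs-ganesha-server-and-external-provisioner/
helm install -n nfs --create-namespace in-cluster nfs-ganesha-server-and-external-provisioner/nfs-server-provisioner --set persistence.enabled=true --set persistence.size=8Gi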
Another question: did the configure job succeed without any errors in the pod logs? Look for volume permission or any other errors. Is there a .build file under the root of the sites volume? touch sites/.build to create it.
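A quick way to check from any pod that mounts the sites volume (the pod name is just a placeholder):
kubectl -n erpnext exec -it <erpnext-worker-pod> -- ls -la /home/frappe/frappe-bench/sites/.build
kubectl -n erpnext exec -it <erpnext-worker-pod> -- touch /home/frappe/frappe-bench/sites/.build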
4. In fact, there is no assets folder at all!
This is where I feel things go wrong. The assets directory exists in the container as well as in the volume. The only difference is that inside the container it's populated with symlinked assets, while inside the volume it is present but empty.
Can you try an in-cluster NFS server using nfs-ganesha-server-and-external-provisioner, like the tests do?
I can try, but my setup for the external nfs is almost identical and the mounts themselves look proper, so I don't think it's an issue with the storage driver itself.
Did the configure job succeed without any errors in the pod logs?
Yes, the configure job succeeds.
During my own testing I was able to trim it down to a much smaller test-case:
apiVersion: v1
kind: Pod
metadata:
  name: frappe-playground
spec:
  containers:
    - name: frappe
      image: frappe/erpnext
      ports:
        - name: http
          containerPort: 8080
          protocol: TCP
Checking this container, the filesystem structure looks proper:
kubectl apply -n erpnext -f frappe-minimal.yaml
pod/frappe-playground created
kubectl -n erpnext exec -it frappe-playground -- /bin/bash
frappe@frappe-playground:~/frappe-bench$ ls
apps config env logs patches.txt sites
frappe@frappe-playground:~/frappe-bench$ ls sites/
apps.json apps.txt assets common_site_config.json
frappe@frappe-playground:~/frappe-bench$ ls sites/assets/
assets-rtl.json assets.json css erpnext frappe js
frappe@frappe-playground:~/frappe-bench$
However, if I add a simple volume mount, the data is gone, seemingly overridden by the mounted volume.
apiVersion: v1
kind: Pod
metadata:
  name: frappe-playground
spec:
  containers:
    - name: frappe
      image: frappe/erpnext
      ports:
        - name: http
          containerPort: 8080
          protocol: TCP
      volumeMounts:
        - name: sites-dir
          mountPath: /home/frappe/frappe-bench/sites
  volumes:
    - name: sites-dir
      emptyDir: {}
kubectl apply -n erpnext -f frappe-minimal.yaml
pod/frappe-playground created
kubectl -n erpnext exec -it frappe-playground -- /bin/bash
frappe@frappe-playground:~/frappe-bench$ ls sites/
frappe@frappe-playground:~/frappe-bench$
Note that I'm just using emptyDir here as the storage backend, so NFS shouldn't be the issue.
The sites volume becoming empty as soon as you mount something over it is expected.
What's not expected is the assets becoming empty. assets is a different volume, and since no volume driver creates it, the container runtime should create and use an unnamed volume with the assets available in it. (Assumption.)
This nesting of volumes creates problems; that's why there are two mounts. The second mount ensures the assets directory is separate from the parent sites mount.
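To illustrate the nesting, in Dockerfile terms it is roughly this (a sketch of the idea, not a verbatim quote of the frappe_docker Dockerfile):
# sites is one image volume, sites/assets a separate one, so a mount over
# sites/ does not (on Docker/containerd) hide the assets baked into the image
VOLUME [ "/home/frappe/frappe-bench/sites", "/home/frappe/frappe-bench/sites/assets" ]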
I realize that's what is intended, but CRI-O does not handle image volumes the way Docker/containerd do.
It's not all that well documented, but from what I've gathered, CRI-O has an image_volumes configuration option under [crio.image] that controls how image volumes are handled:
- mkdir: A directory is created inside the container root filesystem for the volumes.
- bind: A directory is created inside container state directory and bind mounted into the container for the volumes.
- ignore: All volumes are just ignored and no action is taken. (default: mkdir)
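For anyone following along, this is a host-level CRI-O setting; a minimal sketch (the drop-in file name is an assumption, editing crio.conf directly works too), followed by restarting the crio service:
# /etc/crio/crio.conf.d/01-image-volumes.conf
[crio.image]
image_volumes = "bind"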
I have tried setting image_volumes="bind", which restores the intended volume layout... kind of:
kube@kube-control:~$ kubectl apply -f frappe-minimal.yaml -n erpnext
pod/frappe-playground created
kube@kube-control:~$ kubectl exec -n erpnext -it frappe-playground -- /bin/bash
frappe@frappe-playground:~/frappe-bench$ ls
apps config env logs patches.txt sites
frappe@frappe-playground:~/frappe-bench$ ls sites/
assets
frappe@frappe-playground:~/frappe-bench$ ls -lah sites/
total 8.0K
drwxr-xr-x 3 root root 60 Jul 16 09:01 .
drwxr-xr-x 1 frappe frappe 4.0K Jul 16 00:10 ..
drwxr-xr-x 2 root root 40 Jul 16 09:01 assets
Big caveat here: the volumes are mounted as root, so right now I don't have any write access to the mounted volumes... Sure, I could run the container as root, but I'd like to avoid that.
One common solution to the volume-mounting problem I've found so far is that people just use an init container to copy the existing files into the volume (while mounting the volume on a different folder, just for the init step).
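A rough sketch of that approach, with the volume mounted at an alternate path so the image's own sites/assets stays visible during the copy (names and paths here are illustrative, not taken from the chart):
initContainers:
  - name: populate-assets
    image: frappe/erpnext
    command: ["bash", "-c"]
    args:
      - |
        mkdir -p /mnt/sites/assets
        cp -a /home/frappe/frappe-bench/sites/assets/. /mnt/sites/assets/
    volumeMounts:
      - name: sites-dir
        mountPath: /mnt/sites   # deliberately not mounted over sites/ for this step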
the volumes are mounted as root
It doesn't matter if it's assets; you don't need write access to it.
For sites and logs there is an initContainer to fix the volume; it chowns files to 1000:1000.
Found a related blog article: https://medium.com/cri-o/cri-o-configurable-image-volume-support-dda7b54f4bda
image_volumes="bind"
Can you try "mkdir" to see if it makes any difference? We just need it for the symlink.
Can you try "mkdir" to see if it makes any difference? We just need it for the symlink.
"mkdir" is the default and behaves as described previously. (I've had it explicitly set to mkdir for a while now, since trying the alternative.)
For sites and logs there is an initContainer to fix the volume; it chowns files to 1000:1000.
This failed for me (with "operation not permitted", as far as I remember) when I deployed the chart with image_volumes="bind". I'll run it again in a bit and give you the full logs.
I'll run it again in a bit and give you the full logs.
OK, it didn't fail, it just... had no apparent effect: the owner of the volume is still root, and subsequent calls in the config job failed with "permission denied". There was also no log output from the init container.
I would think that this could be due to the storage driver not permitting it, except that previously the permission was set without an issue and I didn't change anything related to the NFS server or its storage class.
My guess is that the overlayfs in between (due to the bind mount) is getting in the way this time, but I need to hook into the running container to check that.
Will whip something up in the afternoon. Will probably also try with an in-cluster NFS like you suggested previously.
OK, those are... interesting results. I put some sleeps into the config job containers (both the init and the configure container) to hook into them and check the FS ownership during execution. Here are the results:
The init container, after the chown:
❯ kubectl exec -n erpnext --stdin --tty frappe-bench-erpnext-conf-bench-20230716155227-bmzws -c frappe-bench-ownership -- /bin/bash
root@frappe-bench-erpnext-conf-bench-20230716155227-bmzws:/home/frappe/frappe-bench# ls -lah
total 24K
drwxr-xr-x 7 frappe frappe 4.0K Jul 10 14:12 .
drwxr-xr-x 1 frappe frappe 4.0K Jul 10 14:15 ..
drwxr-xr-x 4 frappe frappe 4.0K Jul 10 14:14 apps
drwxr-xr-x 3 frappe frappe 4.0K Jul 10 14:12 config
drwxr-xr-x 6 frappe frappe 4.0K Jul 10 14:13 env
drwxr-xr-x 2 frappe frappe 40 Jul 16 13:52 logs
-rw-r--r-- 1 frappe frappe 346 Jul 10 14:12 patches.txt
drwxr-xr-x 3 frappe frappe 60 Jul 16 13:52 sites
The configure container, before the first actual line of code:
❯ kubectl exec -n erpnext --stdin --tty frappe-bench-erpnext-conf-bench-20230716155227-bmzws -- /bin/bash
Defaulted container "configure" out of: configure, frappe-bench-ownership (init)
frappe@frappe-bench-erpnext-conf-bench-20230716155227-bmzws:~/frappe-bench$ ls -lah
total 24K
drwxr-xr-x 7 frappe frappe 4.0K Jul 10 14:12 .
drwxr-xr-x 1 frappe frappe 4.0K Jul 10 14:15 ..
drwxr-xr-x 4 frappe frappe 4.0K Jul 10 14:14 apps
drwxr-xr-x 3 frappe frappe 4.0K Jul 10 14:12 config
drwxr-xr-x 6 frappe frappe 4.0K Jul 10 14:13 env
drwxr-xr-x 2 root root 40 Jul 16 13:53 logs
-rw-r--r-- 1 frappe frappe 346 Jul 10 14:12 patches.txt
drwxr-xr-x 3 root root 60 Jul 16 13:53 sites
Now to the interesting part.
I also tried to produce a stripped-down version of the problem:
kind: Pod
apiVersion: v1
metadata:
  name: volume-editor
spec:
  volumes:
    - name: sites-dir
      persistentVolumeClaim:
        claimName: frappe-bench-erpnext
  initContainers:
    - name: frappe-bench-ownership
      image: frappe/erpnext
      command: ['sh', '-c']
      args:
        - chown -R 1000:1000 /data
      securityContext:
        runAsUser: 0
      volumeMounts:
        - name: sites-dir
          mountPath: /data
  containers:
    - name: sleeper
      image: frappe/erpnext
      command: ['sleep', 'infinity']
      volumeMounts:
        - name: sites-dir
          mountPath: /data
Which actually sets the permissions correctly!
kube@kube-control:~$ kubectl exec -it -n erpnext volume-editor -- /bin/sh
Defaulted container "sleeper" out of: sleeper, frappe-bench-ownership (init)
$ bash
frappe@volume-editor:~/frappe-bench$ ls /data
frappe@volume-editor:~/frappe-bench$ ls -lah /data/
total 8.0K
drwxrwxrwx 2 frappe frappe 4.0K Jul 16 09:27 .
dr-xr-xr-x 1 root root 4.0K Jul 16 12:26 ..
No idea what's going on here.
I thought about this issue in general, and while I enjoy tinkering with it, I think a solution that depends on a system setting (image_volumes="bind" in CRI-O is a system-wide setting) is less than ideal.
As for myself, I will likely either switch to using containerd or add an additional init container to the config job that will copy over the initial data into the mounted volume.
As for the latter idea, if that works out, would you accept a PR integrating that into the config job template, behind a flag parameter? (I suppose a separate job would be fine too.)
I have a proposal: we have an entrypoint script for nginx anyway. We can check whether assets exists, or create the directory and the symlinks.
Here: https://github.com/frappe/frappe_docker/blob/main/resources/nginx-entrypoint.sh
Can you try it in a custom image? If it works, we'll make the change.
Edit: other containers may also need assets (rendering PDFs with CSS, sending email, rendering Jinja2 templates). We can make it into an entrypoint script that can be optionally overridden for such directory creation and symlinking.
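Something like this is what I mean, as a rough sketch only (the glob assumes the usual apps/<app>/<app>/public layout; the real change would go into nginx-entrypoint.sh):
# create the assets dir if a mounted volume hides the one from the image
[ -d sites/assets ] || mkdir -p sites/assets
# re-create the per-app symlinks
for public in apps/*/*/public; do
  app=$(basename "$(dirname "$public")")
  ln -sfn "$PWD/$public" "sites/assets/$app"
done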
FYI, I've basically run out of time work-wise and simply switched over to using containerd. Given the problems I've had here and the generally poor documentation of CRI-O, this seems like the more reasonable solution to me for the time being.
With all other pieces of my setup staying the same, ERPNext now works without an issue :)
As for your suggestion:
First of all, you're the maintainer here, so you don't have to make any proposals to me :sweat_smile:. That being said, if you want my opinion on the matter, I wouldn't try to solve a Kubernetes issue by including a workaround in the container, especially when a reasonable solution like an init job is perfectly workable.
That being said, if you want to tackle this at the container level I would go all the way and restructure the file system layout, such that the assets folder is no longer a subdirectory of a volume. NGINX specifically shouldn't have a problem with that, since you have a specific routing rule for assets already, but obviously I can't speak to other containers and their dependency on the assets folder.
First of all, you're the maintainer here, so you don't have to make any proposals to me . That being said, if you want my opinion on the matter, I wouldn't try to solve a Kubernetes issue by including a workaround in the container, especially when a reasonable solution like an init job is perfectly workable.
Okay! I'll leave it as it is right now.
That being said, if you want to tackle this at the container level I would go all the way and restructure the file system layout, such that the assets folder is no longer a subdirectory of a volume. NGINX specifically shouldn't have a problem with that, since you have a specific routing rule for assets already, but obviously I can't speak to other containers and their dependency on the assets folder.
Yes! I'd prefer that too.
I think the bench command and the Frappe framework assume the directory structure of sites and sites/assets.
We can manage the directories, routes, and nginx config in containers. The framework will still need the above structure.
All the custom apps also follow this structure, enforced by the framework.
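Roughly, the layout the framework expects (simplified; most files omitted):
frappe-bench/
├── apps/                 # app source code, each app with its public/ assets
└── sites/
    ├── common_site_config.json
    ├── assets/           # built/symlinked assets, served under /assets
    │   ├── frappe/
    │   └── erpnext/
    └── <site-name>/      # one directory per site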
To summarize
image_volumes="bind"
Faced this recently.
Setup consists of a NAS-backed RWX storage class and CRI-O with image_volumes=mkdir (default).
Added ENTRYPOINT ["entrypoint.sh"] in the image. Script as follows:
#!/bin/bash
# Create assets directory if not found
[ -d "${PWD}/sites/assets" ] || mkdir -p "${PWD}/sites/assets"
# Copy assets*.json from image to assets volume if updated
cp -uf /opt/frappe/assets/*.json "${PWD}/sites/assets/" 2>/dev/null
# Symlink public directories of app(s) to assets
find apps -type d -name public | while read -r line; do
app_name=$(echo "${line}" | awk -F / '{print $3}')
assets_source=${PWD}/${line}
assets_dest=${PWD}/sites/assets/${app_name}
ln -sf "${assets_source}" "${assets_dest}";
done
exec "$@"
I was having the same journey on a cluster with cri-o. Things I tried:
- mkdir /home/frappe/frappe-bench/sites/assets in the frappe-bench-ownership init container
- bench update in the gunicorn pod
- entrypoint.sh - no effect
- bench update in the nginx pod
So there still seems to be an issue with the first round of assets generation. And I am wondering what happens to the app assets after the next upgrade.
Changing the setting image_volumes="bind" in cri-o is not an option, as this is a system-wide setting and would probably affect several other deployments on the same cluster.
I solved it by adding bench build to the nginx deployment, so it runs every time the pod starts:
lifecycle:
  postStart:
    exec:
      command: ["/bin/sh", "-c", "bench build"]
(I use plain kubectl manifests, which I derived from the helm chart.)
TL;DR of this thread
- CRI-O handles image volumes (the ones defined in the Dockerfile!) differently from containerd by default, which leads to the app assets not being properly mounted on top of the sites volume.
- You can configure CRI-O to use bind-mounting (almost like containerd does) by setting image_volumes = "bind" in /etc/crio/crio.conf (instead of the default mkdir). You might still have permission problems because the mounted volume cannot properly be reassigned ownership, but I haven't tested that enough to know whether this is the storage engine's or CRI-O's fault. YMMV.
- Some people instead work around this issue by copying the existing files from the image into the volume in an initContainer (by mounting the volume somewhere else just for that execution). I haven't gone down that route though, so again, YMMV.
Description of the issue
I have been deploying this helm chart according to the installation instructions at https://github.com/frappe/helm/blob/main/erpnext/README.md and I've had a pretty rocky experience. I'm not entirely sure that I didn't overlook something very obvious, however after multiple skims through the instructions, the chart and googling for people having similar issues I have no idea what I could have missed. As such, this is part bug report and part request for guidance.
TL;DR: After following the installation instructions for deploying the chart and adding a site, I am greeted with an internal server error when trying to access it through the browser (step-by-step walkthrough of my experience below). I've had to resort to kubectl exec -it ... into the running nginx pod to get the site to a working state, which I am fairly certain is not how it's supposed to work.
Context information (for bug reports)
For what it's worth, I've deployed the helm chart to a fresh, in-house single-node kubernetes cluster with CRI-O as container runtime. It's a fresh installation on top of debian 12, using calico for networking and a way too permissive (think chmod 777) NFS for persistent volumes via https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner.
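For reference, the provisioner was installed roughly like this (exact flags may have differed; the server address and export path are placeholders):
helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner --set nfs.server=<nfs-server-ip> --set nfs.path=/exported/path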
Steps to reproduce the issue
custom-values.yaml
NOTES: NodePort mapping for simplicity, since we have an existing reverse proxy and I haven't reconciled that with using an ingress. I've checked that the requests are properly forwarded to where they should be, so I doubt this is an issue regarding the troubles I've had.
create-job-custom-values.yaml
erp.example.com:
This is where it gets dicey. Trying to access the new site only results in an internal server error with the following logs:
The error comes from loading the assets.json file somewhere around here: https://github.com/frappe/frappe/blob/fefd9ac2e2190d37d3669390a2d6285506a2646c/frappe/utils/__init__.py#L964C1-L985 However, the file it's trying to load here doesn't exist at this point in time on my volume. In fact, there is no assets folder at all!
If the assets don't exist, might as well try and make them so. So I exec into one of the worker pods to manually run some bench commands.
I realize this is certainly not the right approach, but I'm tinkering here, so bear with me. I just want to get to a workable state so I can track back what I'm missing afterwards.
Doing that creates the assets folder and the missing assets.json inside.
However, while I can now load the page, the included CSS still can't be fetched, resulting in 404s for those resources.
The hash matches. However, when checking the nginx container, the same path actually resolves to different files!
After some flailing I finally find out why that is: The assets from the individual apps are actually symlinked in from outside the volume!
So, after running bench build --force in the nginx pod specifically, I finally have a working page as a result.
Now, reading the bench CLI docs would have certainly saved me some headache there, since it has an option to exactly NOT do that (marked as deprecated, though), but all of that does make me wonder:
Expected result
How is this supposed to work exactly? Not only have I found no other issues or Google results touching on the troubles I've had, but also, looking at the helm chart, I don't really understand how the assets are supposed to be put into their proper place at all.
Since the documentation doesn't mention anything regarding having to take care of app assets, I would expect that after adding the site it just works™, but for now I can't see how it would work.
What exactly have I missed here?