feast-dev / feast

The Open Source Feature Store for Machine Learning
https://feast.dev
Apache License 2.0

Feature Server image won't start in an OpenShift cluster #4095

Closed (tchughesiv closed 6 months ago)

tchughesiv commented 6 months ago

Expected Behavior

The Python Feature Server image should run in an OpenShift cluster without issue.

Current Behavior

The container fails to start and throws this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/pathlib.py", line 1288, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/.cache/fissix/21.11.13'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/pathlib.py", line 1288, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/.cache/fissix'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/feast", line 5, in <module>
    from feast.cli import cli
  File "/usr/local/lib/python3.8/site-packages/feast/cli.py", line 44, in <module>
    from feast.repo_upgrade import RepoUpgrader
  File "/usr/local/lib/python3.8/site-packages/feast/repo_upgrade.py", line 5, in <module>
    from bowler import Query
  File "/usr/local/lib/python3.8/site-packages/bowler/__init__.py", line 13, in <module>
    from .imr import FunctionArgument, FunctionSpec
  File "/usr/local/lib/python3.8/site-packages/bowler/imr.py", line 12, in <module>
    from fissix.fixer_util import LParen, Name
  File "/usr/local/lib/python3.8/site-packages/fissix/fixer_util.py", line 7, in <module>
    from .pygram import python_symbols as syms
  File "/usr/local/lib/python3.8/site-packages/fissix/pygram.py", line 30, in <module>
    python_grammar = driver.load_packaged_grammar("fissix", _GRAMMAR_FILE)
  File "/usr/local/lib/python3.8/site-packages/fissix/pgen2/driver.py", line 153, in load_packaged_grammar
    return load_grammar(grammar_source)
  File "/usr/local/lib/python3.8/site-packages/fissix/__init__.py", line 36, in load_grammar
    gp = _generate_pickle_name(gt) if gp is None else gp
  File "/usr/local/lib/python3.8/site-packages/fissix/__init__.py", line 28, in _generate_pickle_name
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
  File "/usr/local/lib/python3.8/pathlib.py", line 1292, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/usr/local/lib/python3.8/pathlib.py", line 1292, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/usr/local/lib/python3.8/pathlib.py", line 1288, in mkdir
    self._accessor.mkdir(self, mode)
PermissionError: [Errno 13] Permission denied: '/.cache'

Steps to reproduce

Against an OpenShift cluster:

$ helm install feast-feature-server feast-charts/feast-feature-server --set feature_store_yaml_base64=$(base64 -i feature_store.yaml)

Specifications

Possible Solution

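The OpenShift restricted SCC, which is applied to namespaces by default, forces containers to run with a random UID and the root GID (0). That random UID has no permission to create /.cache, which is where fissix tries to create its cache directory (see the PermissionError above). The solution is to pre-create the /.cache directory with the proper permissions during the image build.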
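
As a rough sketch of what that build step could look like (not the actual change made to the Feast image), the commands below create the directory, hand it to the root group, and give the group the same permissions as the owner, so any UID running with GID 0 can write to it. The function name `prepare_cache_dir` and its arguments are hypothetical and exist only for illustration.

```shell
# Sketch only: prepare a cache directory for a container that runs with
# an arbitrary UID but the root group (GID 0), as under the restricted SCC.
# prepare_cache_dir is a hypothetical helper name, not part of Feast.
prepare_cache_dir() {
    cache_dir="$1"        # directory to pre-create, e.g. /.cache
    cache_gid="${2:-0}"   # owning group; defaults to the root group

    mkdir -p "$cache_dir"                 # create the directory and any parents
    chgrp -R "$cache_gid" "$cache_dir"    # hand ownership to the chosen group
    chmod -R g=u "$cache_dir"             # group gets the same permissions as the owner
}

# During an image build (running as root) this would be invoked as:
#   prepare_cache_dir /.cache
```

In a Dockerfile the same three commands would typically be chained in a single RUN instruction, assuming the build runs as root at that point.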

tokoko commented 6 months ago

That is probably the simplest fix for now, but I think we should consider switching all images to run as non-root users anyway, both as a general good practice and to avoid any other headaches with OpenShift.