feast-dev / feast

The Open Source Feature Store for Machine Learning
https://feast.dev
Apache License 2.0
5.62k stars 1k forks source link

feat: Force absolute paths in file based registries #4774

Open boliri opened 1 day ago

boliri commented 1 day ago

What this PR does / why we need it:

While I was working on https://github.com/feast-dev/feast/pull/4772, I discovered an issue related to the Feast CLI and how FileRegistryStore instances are populated when running a CLI command using the -f argument. Assume a really simple setup made with the feast init command, i.e. the Feature Store is backed by file stores both in the registry and in the online store.

When the registry is set to file in the feature_store.yaml, RegistryConfig's path attribute is filled with the same exact value provided in the YAML, and at some point, the Registry is instantiated. This is not the same for the FileRegistryStore that is instantiated later, however - in that class, paths from the RegistryConfig are processed to turn them into absolute paths in the event they are relative:

class FileRegistryStore(RegistryStore):
    def __init__(self, registry_config: RegistryConfig, repo_path: Path):
        registry_path = Path(registry_config.path)
        if registry_path.is_absolute():
            self._filepath = registry_path
        else:
            self._filepath = repo_path.joinpath(registry_path)

    [...]

The problem here is that the tool used to build the CLI injects the current working directory in its context's CHDIR key, and that CHDIR is passed down the successive calls as the repo_path until the FileRegistryStore is finally built, which results in a malformed absolute path. Consequently, things like the UI cannot load (the landing page just claims there was an issue while loading the project list due to a malformed feature_store.yaml, but that's actually misleading).

Besides these issues, there's another one related to the env var FEATURE_STORE_YAML_BASE64. When the YAML is base-64 encoded and passed to the CLI with a file-based registry configured using a relative path, the final path built in FileRegistryStore takes as prefix a subfolder of /tmp, hence assuming that the file registry is stored in that subfolder (which is impossible, as it is created on the fly just to store the decoded env var as an actual file somewhere). Once again, the FileRegistryStore ends up in a weird state.

To mitigate these issues, I propose enforcing the usage of absolute paths while configuring the feature_store.yaml, be it a file or an env var. The onus is on the user then, and I am aware this can be a breaking change for people currently using file-based registries on their Feature Store setups, but the odds of using registries like these in production environments are very low IMHO.

Aside from that constraint, I also updated the logic behind the feast init command to replace the default values of the feature_store.yaml template with absolute paths built according to the dir from which the command was issued.

Which issue(s) this PR fixes:

Misc

boliri commented 1 day ago

Just noticed some unit tests broke after these changes, will take a look at them when I have some time.

dmartinol commented 1 day ago

Hi @boliri, I think this change goes against a recent update to use relative file paths in the generated template #4624.
This has been designed to improve code portability, which absolute paths seem to be against.

Wrt using encoded YAML in the env var, I understand the severity of the issue that you mention and we can think of enforcing the use of absolute paths there. In any case, we expect to have a limited number of setups where the env var is requested to the user. In fact, with the advent of the new Feast Go Operator #4561, the variable will be entirely managed by the Kubernetes controller which will create the correct absolute path for you. Helm chart users will be instead requested to encode a YAML file with the correct absolute path(s).

boliri commented 23 hours ago

Hi @dmartinol! Thanks for the quick reply.

Didn't think about portability to be honest, but it definitely makes sense.

About the env var holding encoded YAMLs, if you don't foresee a high number of setups using it then I think we can safely drop this PR.

Let me know if you want me to do that, or feel free to close it. Thanks again!