cue-lang / cue

The home of the CUE language! Validate and define text-based and dynamic configuration
https://cuelang.org
Apache License 2.0
5.03k stars 287 forks source link

cue/load: support loading from io/fs.FS #607

Open cueckoo opened 3 years ago

cueckoo commented 3 years ago

Originally opened by @verdverm in https://github.com/cuelang/cue/issues/607

Is your feature request related to a problem? Please describe.

I would like to embed a module into the overlay. This would be useful for shipping binaries with the schema embedded, especially with go 1.16 supporting file embeds natively.

This currently does not work due to how the overlay and cue.mod/module.cue is handled in cue/load/...

Another use case would be loading all inputs from the Overlay. This is useful in a server context where one wants to avoid disk usage. This seems to not work because of how the args to load.Instances works. It seems one cannot specify files from the Overlay there.

Am I missing a configuration setting?

Describe the solution you'd like

The ability to do the above. Guidance and input on implementation.

Additional context

I've traced the first use case to some TODOs in the code. The first is here: https://github.com/cuelang/cue/blob/master/cue/load/config.go#L498

The processing in this area generates absolute paths. The fileSystem in fs.go https://github.com/cuelang/cue/blob/master/cue/load/fs.go#L206

With Go 1.16, there is a new FS interface, maybe that should be used? If so, maybe that means this feature is delayed until 1.16 is more widely adopted?


Example that is not working as desired or expected:

package main

import (
        "bytes"
        "fmt"
        "os"

        "cuelang.org/go/cue"
        "cuelang.org/go/cue/load"
)

func check(err error) {
        if err != nil {
                fmt.Println(err)
                os.Exit(1)
        }
}

var stdin string = `
package a

hello: "world"

foo: A
`

func main() {
        var b bytes.Buffer
        var r   cue.Runtime

        fmt.Fprintln(&b, stdin)

        addOverlay()
        config.Package = "*"
        config.Stdin = &b
        config.ModuleRoot = "./"
        config.Dir = "./"
        fmt.Println("Overlay:", config.Overlay)

        bis := load.Instances([]string{"-"}, &config)
        for _, bi := range bis {
                check(bi.Err)

                I, err := r.Build(bi)
                check(err)

                v := I.Value()
                fmt.Printf("%!v(MISSING)\n", v)
        }
}

func addOverlay() {
        for fn, src := range files {
                config.Overlay[fn] = load.FromString(src)
        }
}

var config load.Config = load.Config {
        Overlay: map[string]load.Source{},
}

var files = map[string]string{
        "cue.mod/module.cue": `
module: "hof.io/foo/bar"
`,

        "a.cue": `
package a

A: "A"
`,

        "b/b.cue": `
package b

B: "B"
`,
}
cueckoo commented 3 years ago

Original reply by @mpvl in https://github.com/cuelang/cue/issues/607#issuecomment-741038336

I'm not entirely sure I understand what you're asking. But load should read all files to its internal fs which uses Overlay. So that includes module files.

It would be useful to use embedding and fs. But we aim to support at least 2 versions back, so that will take a while. Either way, these are implementation details, and overlays should work for all files. It could be that it requires a directory that is overlayed to exist, but also that could be tweaked. I'm not sure what you desired or expected here, so I can't tell what is wrong easily.

cueckoo commented 3 years ago

Original reply by @verdverm in https://github.com/cuelang/cue/issues/607#issuecomment-741162675

I'm seeing issues with the overlay, I think because of the makeAbs function, though not confirmed yet.

I've added some printing, working on a test file and setup, will push something to gerrit later

Overlay: map[a.cue:
package a

A: "A"
 b/b.cue:
package b

B: "B"
 cue.mod/module.cue:
module: "cuelang.org/foo/bar"
]
config.fs.init - start
overlay: a.cue
overlay: b/b.cue
overlay: cue.mod/module.cue
config.fs.init - initd
config.fs.complete - cue.mod dirname cue.mod
config.fs.complete - cue.mod error stat: stat /home/tony/cue/gerrit/cue/load/testdata/overlay/cue.mod: no such file or directory
module root not defined%!!(MISSING)(EXTRA string=.)
exit status 1

I've tried various settings for config.ModuleRoot and config.Dir and they end in various errors or unexpected output.

cueckoo commented 3 years ago

Original reply by @verdverm in https://github.com/cuelang/cue/issues/607#issuecomment-741291770

I think I found the issue:

  1. https://github.com/cuelang/cue/blob/master/cue/load/config.go#L483
  2. https://github.com/cuelang/cue/blob/master/cue/load/fs.go#L64
  3. https://github.com/cuelang/cue/blob/master/cue/load/fs.go#L122

Now, since almost all fileSystem functions use fs.makeAbs on their first line, there is almost no way to get into the overlay, which should only have non-absolute paths.

...unless the users should be placing / in front of every filename key they add as a Overlay source. If this is the case, we ought to

But, given the comment here: https://github.com/cuelang/cue/blob/master/cue/load/config.go#L263 ... a users would assume that the file would be give as a relative path so that the Overlay replaces the relative file on disk. This is how I read it anyhow


config settings

        config.Package = "a"
        config.Stdin = &b
        config.ModuleRoot = "./"
                                // config.Module = "cuelang.org/foo/bar"
        config.Dir = ""

go run output

Overlay: map[a.cue:
package a

A: "A"
 b/b.cue:
package b

B: "B"
 cue.mod/module.cue:
module: "cuelang.org/foo/bar"
]
config.fs.init - start
fs.init load.Config{Context:(*build.Context)(nil), loader:(*load.loader)(nil), ModuleRoot:"./", Module:"", Package:"a", Dir:"/home/tony/cue/gerrit/cue/load/testdata/overlay", Tags:[]string(nil), AllCUEFiles:false, BuildTags:[]string(nil), releaseTags:[]string{"cue0.1"}, Tests:false, Tools:false, filesMode:false, DataFiles:false, StdRoot:"", ParseFile:(func(string, interface {}) (*ast.File, error))(nil), Overlay:map[string]load.Source{"a.cue":"\npackage a\n\nA: \"A\"\n", "b/b.cue":"\npackage b\n\nB: \"B\"\n", "cue.mod/module.cue":"\nmodule: \"cuelang.org/foo/bar\"\n"}, Stdin:(*bytes.Buffer)( 0xc0002dfce0), fileSystem:load.fileSystem{overlayDirs:map[string]map[string]*load.overlayFile(nil), cwd:""}, loadFunc:(build.LoadFunc)(nil)}
processing "cue.mod/module.cue" from overlay
processing "a.cue" from overlay
processing "b/b.cue" from overlay
config.fs.init - initd
config.fs.complete - cue.mod dirname cue.mod
fs.stat - input cue.mod
fs.makeAbs cwd:(/home/tony/cue/gerrit/cue/load/testdata/overlay) "cue.mod" --(n/a)--> "/home/tony/cue/gerrit/cue/load/testdata/overlay/cue.mod"
fs.stat - abs /home/tony/cue/gerrit/cue/load/testdata/overlay/cue.mod
fs.getOverlay /home/tony/cue/gerrit/cue/load/testdata/overlay/cue.mod
fs.getOverlay - split /home/tony/cue/gerrit/cue/load/testdata/overlay/ cue.mod
fs.getOverlay - not found /home/tony/cue/gerrit/cue/load/testdata/overlay/cue.mod
config.fs.complete - cue.mod error stat: stat /home/tony/cue/gerrit/cue/load/testdata/overlay/cue.mod: no such file or directory
module root not defined "."
exit status 1
cueckoo commented 3 years ago

Original reply by @tonyhb in https://github.com/cuelang/cue/issues/607#issuecomment-782782262

Using Overlay with 1.16's FS to embed my packages in my binary and ran into this. Also notice that there are a few filesystem requests when loading packages even if they're in overlay within a Config.

cuelang.org/go/cue/load/import.go's GenPath will attempt to load data from the filesystem - even if all of the Cue files are within the in-memory overlay buffer.

This breaks a very specific flow: I'm compiling a cue validation binary to WASM, with all cue files in memory via Overaly. However, because import.go always accesses the disk modules break and cannot be used in a webassembly build.

cueckoo commented 3 years ago

Original reply by @verdverm in https://github.com/cuelang/cue/issues/607#issuecomment-782808533

There are other places where Cue uses the local FS. In particular, this was a recent bugfix: https://github.com/cuelang/cue/commit/a6169255715d7fcd71ab70ac53902be22df5f2a0

Cue has a WASM playground, maybe that could help as a reference. The repo for that part is here: https://github.com/cue-sh/playground

cueckoo commented 3 years ago

Original reply by @tonyhb in https://github.com/cuelang/cue/issues/607#issuecomment-782907585

Yes, but in particular using overlay in a wasm environment causes an an error here: https://github.com/cuelang/cue/pull/786/files#diff-67213f39b84a013e7db04664cd3ad943b7a237d97b9c6e862d10806ac27def28R418-R420 :)

There are several places where a filesystem is still expected even after that bugfix. The PR above fixes the ones that I've ran into in the latest beta. It (should) also enable module imports from overlays by fixing areas of the filesystem code that didn't inspect overlays as it probably should.

cueckoo commented 3 years ago

Original reply by @myitcv in https://github.com/cuelang/cue/issues/607#issuecomment-797878959

Building on my comment in https://github.com/cuelang/cue/pull/786#issuecomment-797476388, like @mpvl I'm neither entirely clear what's being asked for, nor do I understand what bug is being reported in this issue. That said, thanks @verdverm and @tonyhb for looking into this, because I think there is a gap here that is nicely filled by io/fs.FS.

The docs for Overlay currently state the following:

    // Overlay provides a mapping of absolute file paths to file contents.
    // If the file with the given path already exists, the parser will use the
    // alternative file contents provided by the map.
    //
    // If the value must be of type string, []byte, io.Reader, or *ast.File.

(I've just mailed https://cue-review.googlesource.com/c/cue/+/9021 to remove that last out-of-date paragraph)

Put another way, an overlay, in its current form, exists in order to complement what is on disk, rather than wholesale replace its contents.

Therefore I'm unclear that the following expectation is correct:

Read from the overlay directory in these circumstances; we shouldn't be hitting the disk.

Indeed it's not possible AFAIU to have an Overlay in its current form either complement what is on disk and act as a substitute for it - we would need another "switch" to say what mode we want.

But as I mentioned I think such a "switch" is unnecessary - we can instead add an io/fs.FS-typed field to cue/load.Config: if provided, we walk that file system, otherwise we behave as we do today.

We should as, part of this change, decide what to do with Overlay. Because in truth with an io/fs.FS, Overlay becomes redundant. As such we should look to make a breaking change pre v1.0.0 to remove it when we can.

We could safely make this change today, guarded by build tags - go1.16 and !go1.16 - and then simply drop the !go1.16 code once we no longer support < go1.16. The overhead of maintaining "dual" versions of cue/load.Config and some associated code would be time-bounded, and would be cleanly achieved by having separate files for the two.

Thoughts on that as a way forward?

If so, we would like help implementing this. The only gentle request being that we make this change via Gerrit (because discussion about code, review etc is much easier and more productive).

ElliotSwart commented 2 years ago

I was unable to find any docs in this repo for official go version support, but if the CI/CD testing matrix reflects reality it looks like 1.16.x versions are now the earliest supported, which would allow this change to be made without the added complexity of supporting earlier filesystem APIs https://github.com/cue-lang/cue/blob/28f8d476e3057eb5f8c5001f7e2af6f8e0894690/.github/workflows/test.yml#L35-L37

I am currently in the initial stages of building a cue based utility that needs to embed the go library, which required me to delve a little to deep into the cue/load package and was forced to look for workarounds even for the most initial proof of concept. When looking for examples in production, I stumbled upon the following in the grafana codebase where some serious concerns were expressed about the current loading behavior, which would likely be fixed by the suggestions above. https://github.com/grafana/grafana/blob/6baf38e84b78ca958529e3a3b79a9773097565e4/pkg/schema/load/generic.go#L22-L39

I’ve submitted a potential update via Gerrit: https://review.gerrithub.io/c/cue-lang/cue/+/533307. I tested on both MacOS and Windows.

ElliotSwart commented 2 years ago

@myitcv, would the pull request above represents a reasonable first step per your comment? Use of the Overlay is so deeply embedded in tests throughout the project, so removing it didn't seem advisable for a first effort (if only to preserve the assurances of the original tests). It does allow use of embed.FS which is a requirement of my project and would allow the use of an in memory filesystem. OverlayFS, which was my refactoring of much of the existing code, could be trivially repurposed for this use; however there is likely a simpler implementation if backward compatibility does not need to be preserved.

verdverm commented 2 years ago

Does this work for output? Such that an import or export would write to the fs.FS?

ElliotSwart commented 2 years ago

@verdverm The io/fs.FS interface is read only which prevents using it for export. However from what I can tell (though I'm relatively new to the codebase) virtualizing the writes is much less critical. Most cue library functions output go values/pointers (rather than the names of created files) which means that virtualized file output isn't core to the internal workings of cue / the ability to embed it as a library. File output generally occurs closer to the edge of the cue libraries.

In general, reading is where a lot of cue's complexity (and potential security holes) are because of import resolution. As part of loading a cue module, your entire filesystem hierarchy can be traversed. This isn't really a problem when using it from the cli, where the executing users permissions fully describe the security context you are operating in. For tools where you want to load tenant specific cue files from a single service, you want to be able to control what files get read in, regardless of any other factors. Prior to this patch, os file reads occur in internal functions, often as a fallback for when appropriate arguments are not provided. This can lead to unpredictable / unsecure behavior.

From a usability standpoint, allowing other filesystems gives the library user the freedom to use the full cue module loading rules (rather than needing to re-implement it manually) while giving significantly more flexibility as to how those files are provided.

verdverm commented 2 years ago

Does this break current usage?

Module support will likely mean packages from a User's homedir will need to be accessible. This would be similar to how Go has a shared module cache typically outside of the current directory. Would it be better to only use fs.FS when it is specified? Otherwise use the current implementation, rather than using OSFS when neither is specified?

ElliotSwart commented 2 years ago

@verdverm So for this to be a good fix (and not require two largely parallel implementations), the fs.FS interface needs to be used everywhere since a lot of the reads are in internal functions and not just at the edge. However it is certainly possible to prevent regressions in these default file systems, as any function in the interface can be overridden.

Both OverlayFS and OSFS are meant to preserve existing behavior, however it is possible there are tests only in unity that would expose incompatibilities. Cue loader's current support for symlinks is rather unclear, they are mostly mentioned in dead code. The implementation of directory walking in io/fs claims to only support symlinks at the root of the walk. However, the only difference I can find currently is that the current code uses ioutil.ReadDir instead of the default io/fs ReadDir. Regardless we should probably determine a test case or two to prevent regression if symlink traversal is important. If we can agree what constitutes a regression, happy to ensure we don't have one!

As for breaking API changes, I didn't remove any existing tests or make any real changes to them (other than having them use safer (non-depreciated) methods when appropriate. So while I'm confident I didn't remove any tested public APIs, there may have been an untested / internally unused one I accidentally removed. Could you let me know which function you saw removed?

I think I was unclear about module support, I mean simply the ability to resolve imports relative to a cue.mod file, such that a library developer can write an independently testable cue program and runnable cue program, and load it from a go program from a constrained directory.

verdverm commented 2 years ago

I think I was confused about the breaking API change (reading on mobile)

For module support, I'm referring to the future plans and how we should think about this might impact that, as part of any changes now.

Here's a reproducer for the symlink issue

exec mkdir -p zelda/cue.mod/pkg/link.io
exec tree
cd zelda/cue.mod/pkg/link.io
exec ln -s ../../../../link/ ./link
cd ../../../
exec tree
exec cue eval
cmp stdout ../golden.stdout

-- link/cue.mod/module.cue --
module: "link.io/link"

-- link/repro.cue --
package link

Link: "a past"

-- zelda/cue.mod/module.cue --
module: "zelda.io/zelda"

-- zelda/repro.cue --
package zelda

import "link.io/link"

Zelda: "back to \(link.Link)"

-- golden.stdout --
Zelda: "back to a past"
ElliotSwart commented 2 years ago

Thanks so much! I'll fix that up.

Really all the changes here are just what is necessary to handle all import / decoder reads from a virtual file system. I don't think it will make much of a difference in how loading of user installed modules would be handled. Likely you would want to give the library user the option (but not the requirement) to specify different filesystems for the user modules and the code they are trying to execute; even though OSFS could handle both for the command line.

verdverm commented 2 years ago

@ElliotSwart another question which came up but I haven't given much thought to...

It looks like a number of functions were introduced / deprecated only in name to support alternative behavior despite the API not changing

yassinm commented 2 years ago

@ElliotSwart i was hoping for someone to take this colossal work :-) Thank you so much for this !

How do you want to cooperate on this ? I can try to test if this will work for my particular use case and provide feedback. How do you prefer to receive feedback in gerrit or in here ? My personal preference is github :-)

@myitcv can this PR (assuming it checks all the requirements ) potentially merge after 1.18 is released ? My assumption is that it cannot be before that ?

ElliotSwart commented 2 years ago

@yassinm Currently there are comments being added to the gerrit link above, with lots of housekeeping stuff since I'm new to the codebase, gerrit, etc. Happy to let you know once we get to the later stages.

I'm not sure the standard procedure for how people contribute tests, but I love tests so certainly wouldn't want to discourage you!

ElliotSwart commented 2 years ago

@verdverm @yassinm just uploaded the next patch set. @verdverm, your test script now works (at least on my machine). https://review.gerrithub.io/c/cue-lang/cue/+/533307

kajogo777 commented 1 year ago

@ElliotSwart great work! I've been wanting to use this as well, How can I help to make more progress on this?