bazelbuild / starlark

Starlark Language
Apache License 2.0
2.44k stars 160 forks source link

Expose File.is_directory #6

Open brandjon opened 6 years ago

brandjon commented 6 years ago

File.is_directory has been around for a while but is undocumented (and therefore unsupported). Yet directories (internally, TreeArtifacts) have increasing support in the Starlark action API. In particular, if they are passed to ctx.actions.args objects they are now (pending an upcoming release) automatically expanded to their contents.

Allowing users to easily inspect whether a File is in fact a directory makes sense, given that they are expected to behave differently in such situations. (At the same time, most code shouldn't care whether something's a directory or not.)

brandjon commented 6 years ago

I believe this change is small enough that it can bypass the doc review process, if there aren't objections.

laurentlb commented 6 years ago

This looks reasonable. Are there alternatives to consider? For example, what if we had a separate Directory type?

brandjon commented 6 years ago

Notwithstanding my fixed-width formatting of the term above, "TreeArtifact" isn't actually a separate class, it's represented by Artifact. So a separate type might be a little harder to implement due to the annotations.

But annotations aside, the fact that it's the same class reflects a design decision that TreeArtifacts are just a kind of Artifact. It might complicate documentation to take everywhere we allow File and say "File or Directory". The flip side is we sometimes have to say "must be a File that is not a directory".

+@tomlu for more domain knowledge.

tomlu commented 6 years ago

What we are trying to do is say that certain File objects are actually multiple files. "Directory" isn't quite true from blaze's perspective, it's more like "this File is a set of Files that happen to be under this path". But "directory", or "recursive ls of a directory", is pretty close.

Regardless, if we have ctx.actions.declare_directory, then is_directory is pretty much mandated. We can't change one term without changing the other. I vote that we expose it and feel good about it.

brandjon commented 6 years ago

Well, naming aside, whatever it is, it seems* ingrained in Bazel that it's to be treated as the same category of thing as an Artifact.

* (Edit: in my humble, limited exposure to action-y internals of Bazel)

tomlu commented 6 years ago

That is totally true. I think directory is fine. The only way the abstraction would seriously break down is if other Files were allowed to overlap with any given directory, which I think we should not permit.

ulfjack commented 6 years ago

When you have an Artifact, you can't tell whether that refers to a file or directory during analysis, and it is incorrect to stat the file without declaring a Skyframe dependency on it, or you will get incorrect incremental builds. Ergo, right now you cannot provide an API for rules to tell at analysis time whether something is a file or directory (you can tell whether it's a directory artifact, but you can't tell whether a plain artifact is actually a directory).

There is a related current incremental correctness bug, which is that we don't run actions that depend on directories. E.g., cc_test(data = ["my_directory"])

This will rerun the test only some of the time (we take the mtime of the directory as a proxy, which changes if the contents of the directory change, but not if the files change). There's also a concern with directory support in combination with package_path. If you have an overlay workspace then you may not see the right files.

On Thu, Sep 13, 2018 at 12:47 AM tomlu notifications@github.com wrote:

That is totally true. I think directory is fine. The only way the abstraction would seriously break down is if other Files were allowed to overlap with any given directory, which I think we should not permit.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/starlark/issues/6#issuecomment-420823649, or mute the thread https://github.com/notifications/unsubscribe-auth/AHA9YatzJh0Rm9JhWNYOZETIhsv0OKxAks5uaY8SgaJpZM4WlymW .

brandjon commented 6 years ago

Thanks for the info Ulf but I'm a little unclear. I think you're saying these three cases exist:

  1. regular Artifact that points to a regular file
  2. regular Artifact that points to a directory (not its contents)
  3. TreeArtifact, which stands for all files underneath a directory

And that at analysis time, we can distinguish 3 from the 1 and 2, but we can't distinguish 1 and 2. Is this correct?

If that's the case, then it sounds like we'd want to rename is_directory. Which is unfortunate, because we already use the "directory" <--> TreeArtifact nomenclature in the Args API.

Alternatively, we could keep the "directory" terminology, but consistently use it to mean only TreeArtifact, and document that this term does not apply to regular Artifacts that happen to represent dirs.

ulfjack commented 6 years ago

On Thu, Sep 13, 2018 at 3:38 PM Jon Brandvein notifications@github.com wrote:

Thanks for the info Ulf but I'm a little unclear. I think you're saying these three cases exist:

  1. regular Artifact that points to a regular file
  2. regular Artifact that points to a directory (not its contents)
  3. TreeArtifact, which stands for all files underneath a directory

And that at analysis time, we can distinguish 3 from the 1 and 2, but we can't distinguish 1 and 2. Is this correct?

That's correct. I'm in favor of a cleanup that prevents 2, which would make it consistent (artifact == file, treeartifact == directory).

If that's the case, then it sounds like we'd want to rename is_directory. Which is unfortunate, because we already use the "directory" <--> TreeArtifact nomenclature https://docs.bazel.build/versions/master/skylark/lib/Args.html#add_all.expand_directories in the Args API.

Alternatively, we could keep the "directory" terminology, but consistently use it to mean only TreeArtifact, and document that this term does not apply to regular Artifacts that happen to represent dirs.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/starlark/issues/6#issuecomment-421009957, or mute the thread https://github.com/notifications/unsubscribe-auth/AHA9YYLXoTQAfbpbmX2z3xlA9926ORwmks5ual_OgaJpZM4WlymW .

ulfjack commented 6 years ago

Cleanup requires that we can statically distinguish file from directory artifacts. My proposal is that we require a trailing '/' character for targets in BUILD files referring to directories, and add some code that checks for consistency with the actual file system state (which may run during loading, analysis, or even post-analysis).

On Thu, Sep 13, 2018 at 5:02 PM Ulf Adams notifications@github.com wrote:

On Thu, Sep 13, 2018 at 3:38 PM Jon Brandvein notifications@github.com wrote:

Thanks for the info Ulf but I'm a little unclear. I think you're saying these three cases exist:

  1. regular Artifact that points to a regular file
  2. regular Artifact that points to a directory (not its contents)
  3. TreeArtifact, which stands for all files underneath a directory

And that at analysis time, we can distinguish 3 from the 1 and 2, but we can't distinguish 1 and 2. Is this correct?

That's correct. I'm in favor of a cleanup that prevents 2, which would make it consistent (artifact == file, treeartifact == directory).

If that's the case, then it sounds like we'd want to rename is_directory. Which is unfortunate, because we already use the "directory" <--> TreeArtifact nomenclature < https://docs.bazel.build/versions/master/skylark/lib/Args.html#add_all.expand_directories

in the Args API.

Alternatively, we could keep the "directory" terminology, but consistently use it to mean only TreeArtifact, and document that this term does not apply to regular Artifacts that happen to represent dirs.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/bazelbuild/starlark/issues/6#issuecomment-421009957 , or mute the thread < https://github.com/notifications/unsubscribe-auth/AHA9YYLXoTQAfbpbmX2z3xlA9926ORwmks5ual_OgaJpZM4WlymW

.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/starlark/issues/6#issuecomment-421038706, or mute the thread https://github.com/notifications/unsubscribe-auth/AHA9YZryKgCoT9jmTWBiAP_gBFwpDegMks5uanNxgaJpZM4WlymW .

brandjon commented 6 years ago

And if it has a trailing '/' character, it would be represented as a TreeArtifact?

We just discussed this in the Starlark meeting. The consensus is that we agree in principle to expose is_directory, and we're ok with using the "directory" terminology for TreeArtifacts so long as we document any conceptual overlaps with normal-Artifact directories. Any cleanup for normal Artifacts would be separate and not block this.

We also discussed having the field Field.is_directory vs having a separate Directory type in Starlark, and opted for the former. Rationale: the API is mostly the same, so it's less complicated to not have a separate type.

brandjon commented 6 years ago

We'll leave this issue open a week for potential objections. No design doc because it's a relatively small change. Then we can expose the field and document the subtle issue re TreeArtifact directories vs Artifact directories.

mboes commented 5 years ago

We'll leave this issue open a week for potential objections.

A bit more than a week later... did anything become of this function? Is it now documented and official?

alandonovan commented 3 years ago

@brandjon ping