bazel-contrib / bazel-lib

Common useful functions for writing BUILD files and Starlark macros/rules
Apache License 2.0
131 stars 87 forks source link

[Bug]: copy_directory copies .DS_Store leading to remote cache misses #887

Open calebmer opened 1 month ago

calebmer commented 1 month ago

What happened?

.DS_Store is a file sometimes added to a directory on MacOS by the operating system (it has to do with file system indexing I think). Sometimes these .DS_Store sneak themselves into npm package repositories (generated by rules_js) then copied with this action here:

https://github.com/aspect-build/rules_js/blob/d0ff155c73e3c7fee5d72485e00775bca1fde10a/npm/private/npm_package_store.bzl#L222-L232

I’m some debugging remote cache misses on MacOS and when I look at the execution log, I’m seeing .DS_Store appearing in the diff. Example from one of the execution log diffs I generated following “Debugging Remote Cache Hits for Remote Execution”:

...

@@ -2229270,14 +2229002,6 @@
     hash_function_name: "SHA-256"
   }
   is_tool: true
-}
-inputs {
-  path: "external/npm__phosphor-react__1.4.1__-440667795/package/.DS_Store"
-  digest {
-    hash: "14dcaaf35f98cb101936646924d50e812d07e260ef2ccea177c1d268e1561200"
-    size_bytes: 6148
-    hash_function_name: "SHA-256"
-  }
 }
 inputs {
   path: "external/npm__phosphor-react__1.4.1__-440667795/package/LICENSE"
@@ -2262980,14 +2262704,6 @@
 cacheable: true
 mnemonic: "CopyDirectory"
 actual_outputs {
-  path: "bazel-out/darwin_arm64-fastbuild/bin/node_modules/.aspect_rules_js/phosphor-react@1.4.1_-440667795/node_modules/phosphor-react/.DS_Store"
-  digest {
-    hash: "14dcaaf35f98cb101936646924d50e812d07e260ef2ccea177c1d268e1561200"
-    size_bytes: 6148
-    hash_function_name: "SHA-256"
-  }
-}
-actual_outputs {
   path: "bazel-out/darwin_arm64-fastbuild/bin/node_modules/.aspect_rules_js/phosphor-react@1.4.1_-440667795/node_modules/phosphor-react/LICENSE"
   digest {
     hash: "bd618db104d07526fe78a9c28dbc2cc4c4286d756aa8b428e51b82de0f8d6aeb"

...

I know this is copy_directory related since when I go to the log file this is from and look for the command I see this snippet:

...

---------------------------------------------------------

command_args: "external/copy_directory_darwin_arm64/copy_directory"
command_args: "external/npm__phosphor-react__1.4.1__-440667795/package"
command_args: "bazel-out/darwin_arm64-fastbuild/bin/node_modules/.aspect_rules_js/phosphor-react@1.4.1_-440667795/node_modules/phosphor-react"
platform {
}
inputs {
  path: "external/copy_directory_darwin_arm64/copy_directory"
  digest {
    hash: "eddc1d4e6a5142106850e1466e4cd384a343ca8be04579dc0add52e5d4f63ac7"
    size_bytes: 1507746
    hash_function_name: "SHA-256"
  }
  is_tool: true
}
inputs {
  path: "external/npm__phosphor-react__1.4.1__-440667795/package/.DS_Store"
  digest {
    hash: "14dcaaf35f98cb101936646924d50e812d07e260ef2ccea177c1d268e1561200"
    size_bytes: 6148
    hash_function_name: "SHA-256"
  }
}
inputs {
  path: "external/npm__phosphor-react__1.4.1__-440667795/package/LICENSE"
  digest {
    hash: "bd618db104d07526fe78a9c28dbc2cc4c4286d756aa8b428e51b82de0f8d6aeb"
    size_bytes: 1092
    hash_function_name: "SHA-256"
  }
}

...

How could I configure copy_directory() to ignore .DS_Store files if they’re present to prevent remote caching misses? Ideally I think copy_directory() should probably ignore .DS_Store files automatically since it’s very unlikely they’re contributing to the build.

I haven’t proven this is the incompatibility between the two MacOS machines which is causing the remote cache miss but figured it may be worth addressing whether or not it’s the incompatibility causing my remote caching misses.

Version

Development (host) and target OS/architectures: MacOS arm64

Output of bazel --version:

Bazelisk version: v1.20.0
Build label: 7.2.0
Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Mon Jun 10 13:04:55 2024 (1718024695)
Build timestamp: 1718024695
Build timestamp as int: 1718024695

Version of the Aspect rules, or other relevant rules from your WORKSPACE or MODULE.bazel file:

Language(s) and/or frameworks involved: Node.js, JavaScript, pnpm

gregmagolan commented 1 month ago

Interesting one. It does look like that .DS_Store file snuck into that the external repository directory external/npm__phosphor-react__1.4.1__-440667795/package/ as it is not found in the package archive: https://unpkg.com/browse/phosphor-react@1.4.1/

I could see some blanket ignores in the copy_directory rule to avoid copying files such as .DS_Store that are automatically injected by the OS... In the latest rules_js this should be less of an issue since most npm packages should be extracted directly into the output tree from the download tar unless they have a patch applied or have a lifecycle hook.

calebmer commented 1 month ago

In this case I am patching phosphor-react. (I think the -440667795 in the repository name is a patch hash?)

matthewjh commented 1 month ago

Interesting find. I have noticed more than cache misses onMacOS than expected, but did not yet drill into why myself.

"Ideally I think copy_directory() should probably ignore .DS_Store files automatically since it’s very unlikely they’re contributing to the build."

I do agree with this.