Michael-F-Bryan / include_dir

The logical evolution of the include_str macro for embedding a directory tree into your binary.
https://michael-f-bryan.github.io/include_dir
MIT License
319 stars, 36 forks

Include only files matching a glob #81

Open ajeetdsouza opened 2 years ago

ajeetdsouza commented 2 years ago

It would be great if we could include just the files matching a particular glob. This is similar to https://github.com/Michael-F-Bryan/include_dir/issues/13, but I think a glob design would be far easier to use.

Michael-F-Bryan commented 2 years ago

Do you have an idea of what this might look like from a consumer's point of view?

ajeetdsouza commented 2 years ago

I'd think that ideally, it would be inspired by the gitignore syntax.

include_dir!(
  "$CARGO_MANIFEST_DIR/assets/*",        // include all files in assets folder
  "!$CARGO_MANIFEST_DIR/assets/.cache",  // except .cache folder
  "$CARGO_MANIFEST_DIR/images/*.png"     // include PNG images from here
);

ModProg commented 2 years ago

I have a similar requirement: I just need to include the files matching a glob as a Vec of Strings.

demurgos commented 1 year ago

Do you have an idea of what this might look like from a consumer's point of view?

After having encountered this problem of picking a subtree from a directory multiple times, there are two things I'd recommend for a good API.

  1. The directory acting as the root must be explicit and unambiguous. You need to know exactly where the boundary is.
  2. The filters should be defined relative to the inclusion root. This allows better composability and symmetry with .find. For example, it allows easy merging / virtual overlay FS setups.

Let's take an example. The code below prints all the Rust files in the current project:

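use include_dir::{include_dir, Dir};

static PROJECT_DIR: Dir = include_dir!("$CARGO_MANIFEST_DIR");

fn main() {
    // `find` is only available with the crate's `glob` feature enabled.
    for entry in PROJECT_DIR.find("**/*.rs").unwrap() {
        println!("Found {}", entry.path().display());
    }
}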

This is a prime candidate to apply filtering at inclusion time. Following the points above, I'd recommend the following API with an optional filter / match / find / whatever argument:

static PROJECT_DIR_ONLY_RUST: Dir = include_dir!("$CARGO_MANIFEST_DIR", filter = "**/*.rs");

// This `find` is only a simple traversal of what's embedded, no filtering occurs at runtime
for entry in PROJECT_DIR_ONLY_RUST.find("**/*").unwrap() {
    println!("Found {}", entry.path().display()); // Same output as the previous program
}

Notice how close the code above is to the original. You can think of it as moving the runtime glob matching to compile time. Further extensions may allow an array of filters, but for the moment I'd recommend starting with a single pattern and making sure it works well.

Adding support for a list of multiple patterns requires a bit of care. In general I agree that the best semantics is to follow how .gitignore matches files: you can mix positive and negative filters in any order, and they act as set operators on the list of files. Some patterns can't be expressed with two lists (one positive, one negative) instead of a single mixed list of positive and negative rules.
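
To make that last point concrete, here is a minimal sketch of ordered, "last matching rule wins" evaluation. Everything in it is an assumption for illustration: `wildcard_match` and `is_included` are hypothetical helpers, not part of include_dir, and the matcher is deliberately simplified (a `*` matches any run of characters, including `/`), so it is not gitignore-accurate.

```rust
/// Simplified wildcard match: `*` matches any run of bytes, including `/`.
/// A stand-in for a real glob implementation (hypothetical helper).
fn wildcard_match(pattern: &[u8], text: &[u8]) -> bool {
    match pattern.split_first() {
        // Pattern exhausted: match only if the text is exhausted too.
        None => text.is_empty(),
        // `*`: try every possible split point of the remaining text.
        Some((&b'*', rest)) => (0..=text.len()).any(|i| wildcard_match(rest, &text[i..])),
        // Literal byte: must equal the next byte of the text.
        Some((c, rest)) => text.first() == Some(c) && wildcard_match(rest, &text[1..]),
    }
}

/// Evaluates a single mixed list of rules in order (hypothetical helper).
/// A leading `!` marks a negative rule; the last rule that matches decides.
fn is_included(rules: &[&str], path: &str) -> bool {
    let mut included = false; // nothing is included by default
    for rule in rules {
        let (negated, pattern) = match rule.strip_prefix('!') {
            Some(rest) => (true, rest),
            None => (false, *rule),
        };
        if wildcard_match(pattern.as_bytes(), path.as_bytes()) {
            included = !negated;
        }
    }
    included
}

fn main() {
    // Include assets, drop the cache contents, then re-include one cached file.
    let rules = ["assets/*", "!assets/.cache/*", "assets/.cache/keep.txt"];
    let paths = [
        "assets/logo.svg",
        "assets/.cache/tmp.bin",
        "assets/.cache/keep.txt",
        "src/main.rs",
    ];
    for path in paths {
        println!("{}: {}", path, is_included(&rules, path));
    }
}
```

With a separate positive list and negative list, and the negative list taking precedence, the third rule could not bring `assets/.cache/keep.txt` back, because `assets/.cache/*` would always exclude it. Keeping the rules in one ordered list is what makes that re-inclusion expressible.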