JuliaLang / Pkg.jl

Pkg - Package manager for the Julia programming language
https://pkgdocs.julialang.org

Feature Request: Option to Exclude Non-Essential Directories on Package Installation #3798

Open singularitti opened 4 months ago

singularitti commented 4 months ago

I am reaching out to propose an enhancement aimed at optimizing the Julia package installation process, potentially benefiting both users and package authors by addressing disk space efficiency.

Background

In my experience with several Julia packages, I've noticed that directories such as test, docs, examples, and notebooks often contain large files that are not directly utilized in most projects, such as images and binary data. These files, while crucial for development, testing, and documentation, significantly increase the disk space requirement for package installations.

Proposal

I suggest introducing a mechanism within Pkg.jl that allows package authors to define which directories are essential for their package's core functionality. This feature would adjust the installation process to include only these specified directories by default, thereby reducing unnecessary disk space usage.

For instance, the src directory could be considered essential by default, with authors having the option to include additional directories (e.g., deps, datadeps) as necessary for their package's operation. Non-essential directories like docs, examples, and test would not be included in the default installation, but could still be made available for users who wish to dev the package or explicitly opt-in to download them.
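The selection rule described above can be sketched as follows. This is only an illustration under stated assumptions, not anything Pkg.jl implements: the function name `select_installed_files`, the `DEFAULT_ESSENTIAL` set, and the choice to always keep top-level files such as Project.toml are all hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical default: only `src` is essential unless the author lists more.
DEFAULT_ESSENTIAL = frozenset({"src"})


def select_installed_files(paths, essential_dirs=DEFAULT_ESSENTIAL):
    """Keep top-level files and anything under an essential top-level directory.

    Keeping top-level files (e.g. Project.toml) is an assumption of this sketch.
    """
    kept = []
    for p in paths:
        parts = PurePosixPath(p).parts
        if len(parts) == 1 or parts[0] in essential_dirs:
            kept.append(p)
    return kept


# A made-up package layout, used only for illustration.
package_files = [
    "Project.toml",
    "src/Example.jl",
    "deps/build.jl",
    "docs/src/assets/figure.png",
    "test/runtests.jl",
    "examples/demo.ipynb",
]

# Default install: only `src` plus top-level files.
default_install = select_installed_files(package_files)

# An author who also needs `deps` at runtime opts it in.
with_deps = select_installed_files(package_files, essential_dirs={"src", "deps"})
```

An allowlist (rather than an ignore list) matches the proposal's wording: anything the author does not declare essential is left out by default.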

Examples

Packages such as GR.jl and ColorSchemes.jl include substantial non-source files within their docs, examples, or test directories. While important for development purposes, these files may not be needed by all users, particularly those focused on using the packages' functionalities rather than modifying or extending them.


Benefits

KristofferC commented 4 months ago

One issue with this is that we allow downloading a package both from the PkgServer and from GitHub. The GitHub download just fetches the repo as is, so it would yield a different content hash than the download from the PkgServer. In theory we could ignore the content hash when downloading from GitHub, but that leaves us at GitHub's mercy: if it sent us bad files, we would blindly accept them.
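To illustrate the hash mismatch, here is a minimal sketch that hashes an in-memory file tree with and without the non-essential directories. The `tree_hash` function is a simplified stand-in for a content hash (it is not Git's tree-hash algorithm nor Pkg's actual implementation), and the file contents are invented.

```python
import hashlib


def tree_hash(files):
    """Deterministically hash a {path: bytes} mapping.

    Simplified stand-in for a content hash; NOT Git's tree-hash format.
    """
    h = hashlib.sha1()
    for path in sorted(files):
        data = files[path]
        h.update(path.encode("utf-8"))
        h.update(b"\0")
        h.update(len(data).to_bytes(8, "big"))  # length prefix avoids ambiguity
        h.update(data)
    return h.hexdigest()


def verify(files, expected_hash):
    """Accept a download only if its content hash matches the expected one."""
    return tree_hash(files) == expected_hash


# Made-up repository contents, as a full "repo as is" download would provide.
full_tree = {
    "Project.toml": b'name = "Example"\n',
    "src/Example.jl": b"module Example end\n",
    "docs/logo.png": b"\x89PNG...",
    "test/runtests.jl": b"using Test\n",
}

# The same package with docs/ and test/ stripped out.
stripped_tree = {
    path: data
    for path, data in full_tree.items()
    if path.split("/")[0] not in {"docs", "test"}
}

# Suppose the expected hash was computed from the full tree.
expected = tree_hash(full_tree)

full_ok = verify(full_tree, expected)          # full download verifies
stripped_ok = verify(stripped_tree, expected)  # stripped download does not
```

Whichever tree the expected hash is computed from, the other download source fails verification unless the check is skipped for it, which is the trade-off described above.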

singularitti commented 1 month ago

Is it possible to add something like a .vscodeignore (as used for VSCode extensions) that applies when publishing a release of a package? That is, the release would intentionally leave out some folders.

KristofferC commented 1 month ago

I think in theory it would be possible. add url#version would then give you a different content hash than add pkg@version, but maybe that is not the end of the world. @staticfloat and @fredrikekre might have some opinions.

fredrikekre commented 1 month ago

My .julia is right now:

So even if you completely eliminated packages, you would only save 13% of the storage. We are already doing a "good job" since we don't need the full repos but just the latest tree. In addition, committing large files to git is generally not recommended since it slows things down anyway. The biggest offenders in the screenshot above look like binary data and/or docs, which could be generated on demand or stored elsewhere.

I don't think it is worth implementing something like a .pkgignore file while the situation looks like above.