JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Shrinking Base and Introducing a Standard Library #5155

Closed tknopp closed 6 years ago

tknopp commented 10 years ago

In #4898 there is a discussion about whether the reduced loading time from the wonderful static compilation work should be an argument to either shrink or expand the Base module.

As it seems to be difficult in general to decide whether things should go into Base or into a package (i.e. where to draw the line), I would like to propose the following:

Whether these standard modules should be automatically loaded is from my perspective only a minor discussion point. More important is that this behavior can be easily changed (e.g. in juliarc)

One concern might be that one cannot rely on which modules have been loaded in the user's environment. But this can be solved by simply always explicitly importing the standard libs when developing a package. "using StdLib" could be a shortcut to import all standard libraries.
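
A minimal sketch of what explicit importing looks like in practice. It assumes a Julia 1.x release, where `LinearAlgebra` and `Random` ship as standard-library modules; those module names come from later releases, not from this proposal, and `MyPackage` and `describe` are hypothetical names used only for illustration.

```julia
module MyPackage  # hypothetical package module

# Load every standard-library module the package depends on explicitly,
# instead of assuming the user's session already has them loaded.
using LinearAlgebra   # provides dot, norm, ...
using Random          # provides MersenneTwister, seed!, ...

# A function that relies only on what was imported above.
describe(v::AbstractVector) = (length = norm(v), selfdot = dot(v, v))

end # module

result = MyPackage.describe([3, 4])
println(result)
```

Because the imports live inside the package module, the code behaves the same whether or not the user's session auto-loads anything.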

It might make sense to offload the standard modules into packages to make the development flow easier. One could then pull in the standard lib when building Julia.

I do not have a concrete proposal for standard modules, but "LinAlg" and "Signal" (like scipy.signal) come to mind.

johnmyleswhite commented 8 years ago

I agree with Tony: the fact that adding a single digit to a literal can change its type seems a little weird -- especially in a language for which enforcing consistent types is so much more important than in other dynamic languages. It's potentially a surprising source of type variability if doing something like the following changes the type of x:

julia> x = 111111111111111111111
111111111111111111111

julia> x = round(Int, sin(x))
1

jakebolewski commented 8 years ago

@johnmyleswhite are you advocating that all non-floating-point literals be parsed as Int? Otherwise you are also giving up Int128 support for signed literals. Unsigned literals are lexed into the smallest bounding literal type.

I guess what I'm saying is that the problem you raise is not limited to BigInt.
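
For reference, a small sketch of the literal typing being discussed, as it behaves in Julia 1.x releases as far as I know. The variable names are illustrative only.

```julia
# Signed decimal literals: the type depends on the magnitude of the value.
small  = 1                                         # fits in Int
medium = 9223372036854775808                       # typemax(Int64) + 1
large  = 170141183460469231731687303715884105728   # typemax(Int128) + 1

# Unsigned hex literals: the type depends on how many digits are typed.
h8  = 0x01
h16 = 0x0001
h32 = 0x00000001

# Explicitly requested wide types, independent of literal length.
explicit_wide = Int128(1)
explicit_big  = big"1"

for v in (small, medium, large, h8, h16, h32, explicit_wide, explicit_big)
    println(v, " :: ", typeof(v))
end
```

The signed cases step from Int to Int128 to BigInt as the value grows, while the hex cases track the number of digits, which is the asymmetry raised above.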

johnmyleswhite commented 8 years ago

What I would advocate for, but won't argue hard for if no one else agrees, is that you only get Int128 or BigInt if you use a suffix.

jakebolewski commented 8 years ago

If we are able to shrink Base down to only requiring MPFR and GMP that would be amazing.

I'm with Jeff, I don't see this as such a big problem. We should have a better way of disabling these dependencies if people don't want to rely on them (no GPL, reject programs during lexing) but I don't think that should be the default.

jakebolewski commented 8 years ago

The larger question here is how to do this without massive breakage. Moving a couple of the dependencies out of Base did not go well last time, and it is hard to do without breaking anything. I think this may be a rip-the-bandaid-off situation, but we might not have much leverage to do this going forward.

JeffBezanson commented 8 years ago

BigInt literals don't require GMP. We already have a bigint implementation in julia in the grisu code, and it doesn't require a crazy amount of code. Ideally, most of BigInt and its functions would be defined in julia, and we'd only (optionally) call GMP for better performance and advanced algorithms.

A couple items on the list are obvious candidates for packages, or for inclusion in existing packages. For the others, I would move them out of the base directory but keep them in this repo. Then we set things up so that you can load them with using, and somehow make it the default at the REPL to include all of them.

tkelman commented 8 years ago

I think #11638 was a good start in naming and organization. The obvious GPL libraries, SuiteSparse, FFTW, and Rmath will need to be moved out and handled via BinDeps. Package bundling also needs to be experimented with to maintain a full-featured "computing environment" distribution on top of a smaller mandatory core - some of this needs restructuring work within Pkg. That work needs to be done no matter what and can start before doing anything in base. The rest of the code reorganization should probably be done in place to start with, to first allow optionally disabling large chunks of Base and see what happens.

At least with hex literals what you get is easily predictable from the length of what you type; with signed literals it's more difficult to parse at a glance. I'd rather eventually have it be explicitly annotated and be a library feature, not a deeply entrenched part of core parsing in the "language spec." It's not something that needs to be dealt with immediately, but long-term by 1.0 it's worth thinking carefully about. Do we really want the length of integer literals to determine whether code can be deployed to an embedded environment where GMP doesn't run? It's not a very portable library to get built in exotic environments, and making it more optional and replaceable with alternative bigint libraries (or none at all!) for different use cases would be a good thing IMO. GMP has been a definite hurdle to doing standalone deployment of Haskell binaries, as a closely related example here.

Another entry to the list, heavily depended on but worth isolating in its own module, is Regex/PCRE.

mikewl commented 8 years ago

For automatically loading certain packages, would a REPL init script that runs some Julia code automatically not work? The nice part is that the user could edit it after installation to include other packages they commonly use.
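
A rough sketch of such an init script, assuming it is placed in the user's startup file (`~/.julia/config/startup.jl` in Julia 1.x, `~/.juliarc.jl` in older releases). The `AUTOLOAD` list is a hypothetical name, and the modules in it are just standard-library examples a user might replace with their own.

```julia
# Hypothetical list of modules to load in every session; edit to taste.
const AUTOLOAD = [:LinearAlgebra, :Random]

for name in AUTOLOAD
    try
        # Equivalent to typing `using LinearAlgebra` etc. at the prompt.
        @eval using $name
    catch err
        # A missing package should not prevent the session from starting.
        @warn "Could not load $name at startup" exception = err
    end
end
```

Wrapping each load in `try` keeps one missing package from breaking startup for everything else.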

JeffBezanson commented 8 years ago

My thinking is that the only reasonable interpretation of a long digit string is a BigInt, and the only reasonable syntax for a BigInt is a long digit string. I also suspect that most uses of BigInts don't require GMP. We can get parsing, conversions, comparisons, and arithmetic with small routines in Base written in julia.
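
To give a feel for how small such routines can be, here is an illustrative sketch of parsing, addition, and comparison for non-negative integers written in plain Julia. It is not the grisu code and not how Base implements BigInt; the function names and the base-10 digit representation are assumptions chosen for readability rather than speed.

```julia
# Parse a decimal digit string into little-endian base-10 digits.
function parse_digits(s::AbstractString)
    !isempty(s) && all(isdigit, s) || throw(ArgumentError("expected decimal digits"))
    return [Int(c - '0') for c in reverse(s)]
end

# Schoolbook addition with carry propagation.
function add_digits(a::Vector{Int}, b::Vector{Int})
    out = Int[]
    carry = 0
    for i in 1:max(length(a), length(b))
        s = carry + (i <= length(a) ? a[i] : 0) + (i <= length(b) ? b[i] : 0)
        push!(out, s % 10)
        carry = s ÷ 10
    end
    carry > 0 && push!(out, carry)
    return out
end

# Three-way comparison (-1, 0, 1); assumes neither input has leading zeros.
function cmp_digits(a::Vector{Int}, b::Vector{Int})
    length(a) != length(b) && return cmp(length(a), length(b))
    for i in length(a):-1:1
        a[i] != b[i] && return cmp(a[i], b[i])
    end
    return 0
end

# Convert back to a decimal string, most significant digit first.
to_string(d::Vector{Int}) = join(reverse(d))

x = parse_digits("99999999999999999999999")
y = parse_digits("1")
println(to_string(add_digits(x, y)))
```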

GunnarFarneback commented 8 years ago

Is Rational needed in Base?

StefanKarpinski commented 8 years ago

Rational is useful for expressing constant coefficients in a way that allows generic programming.
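
A short illustration of the point, using a hypothetical function `third`: a Rational coefficient adopts the floating-point type of its argument, whereas a Float64 coefficient forces narrower inputs up to Float64.

```julia
# Generic: the coefficient 1//3 promotes to whatever float type x has.
third(x) = (1 // 3) * x

# Not generic: 1/3 is a Float64, so Float32 inputs are widened.
third_f64(x) = (1 / 3) * x

for x in (3.0f0, 3.0, big"3.0")
    println(typeof(x), " -> ", typeof(third(x)), " (rational), ",
            typeof(third_f64(x)), " (Float64 coefficient)")
end
```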

simonbyrne commented 8 years ago

#2025 probably needs to be addressed before we can pursue this en masse. For example, a library may provide a special implementation for BigFloat, but be able to operate without it (where would besselj0(::BigFloat) be defined in the above layout?).

timholy commented 8 years ago

Also, we can't currently precompile methods that span modules, so breaking things up (a goal I support overall) will have some downsides in terms of loading/JITting time.

KristofferC commented 8 years ago

Is this still planned for 0.5.0? It seems that it would require a lot of work that might be more useful elsewhere.

wildart commented 8 years ago

I think that the package management system (Pkg) is a good candidate. We already cut the development-related part from it. I see no reason to stop at this point. Considering that Pkg introduced additional dependencies to the main project (libgit2, libcurl, and a possible candidate, libmbedtls), I think it would be beneficial to spin it off as a separate project. It would give more room to work on new features (namespaces, GUI), and the project itself wouldn't be constrained by Julia release plans or other limitations, e.g. licensing issues.

StefanKarpinski commented 8 years ago

If the package manager is a package, how do you install it?

StefanKarpinski commented 8 years ago

Although I definitely agree that it makes sense to be able to strip it out easily.

wildart commented 8 years ago

> If the package manager is a package, how do you install it?

Install it as a system package, but separately. If the package manager is not installed, give instructions on how to install it, or provide information on setting up a location from which packages can be loaded.

tkelman commented 8 years ago

> Install it as a system package, but separately.

And for systems that don't have package managers? Technically separating out Pkg should be feasible, and building Julia without it may be something a small handful of people want to do for embedding or other use cases, but there's a lot of work to do in figuring out exactly how this would be distributed alongside Julia for the majority default configuration that will need it.

tknopp commented 8 years ago

> And for systems that don't have package managers? Technically separating out Pkg should be feasible, and building Julia without it may be something a small handful of people want to do for embedding or other use cases, but there's a lot of work to do in figuring out exactly how this would be distributed alongside Julia for the majority default configuration that will need it.

That is actually the point of this issue: Building the infrastructure such that some "defaultPackages" file is parsed when building Julia. It could then automatically pull the default packages.
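
As a sketch of the parsing half of that idea: the file format below (one package name per line, `#` comments) and the function name `read_default_packages` are assumptions for illustration, since no format has been specified here. The step that actually fetches the packages is left out.

```julia
# Read package names from a "defaultPackages"-style listing.
# Assumed format: one name per line; `#` starts a comment; blank lines ignored.
function read_default_packages(io::IO)
    names = String[]
    for line in eachline(io)
        # Drop any trailing comment, then surrounding whitespace.
        entry = strip(first(split(line, '#'; limit = 2)))
        isempty(entry) || push!(names, String(entry))
    end
    return names
end

# Example input standing in for the file's contents.
listing = IOBuffer("""
# packages bundled with the default distribution
LinAlg
Signal   # like scipy.signal

Profile
""")

default_packages = read_default_packages(listing)
println(default_packages)
```

A build script could then loop over `default_packages` and pull each one, while a minimal build would simply skip that loop.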

Once such an infrastructure is available, one could think about moving stuff into packages. IMHO Pkg does not seem to be the most important package to split off.

I want to note that one has to distinguish between what's included in the julia repo and what is distributed when downloading julia. Until now these have been the same, but once we have some infrastructure for default packages they will differ.

wildart commented 8 years ago

> Building the infrastructure such that some "defaultPackages" file is parsed when building Julia. It could then automatically pull the default packages.

Agreed, it does not affect the ordinary user experience. Some parts of julia are already pulled from git repositories during the build. It shouldn't be a problem to place a fresh copy of the repository into a particular directory after the build process is done.

timholy commented 8 years ago

I've thought about moving Profile out, and am happy to do the work esp. if it motivates others to begin on their pieces, but Profile requires a few profile-specific ccalls (platform-specific signal handling). Before we begin migration, it would be nice to have a strategy for providing pre-built binaries not just of (trimmed-down) Julia but also "common" packages.

https://github.com/JuliaLang/julia/issues/5155#issuecomment-106843678

tkelman commented 8 years ago

Note that the migrated JuliaBox is now demonstrating some of the ways in which we're pre-bundling packages along with Julia. There have been some precompilation issues but we're working on fixing those.

tknopp commented 8 years ago

Question: Would it be required that the bundling happens during distribution of julia? What about doing this at the first execution of julia, i.e. asking the user: "do you want the minimal version or should we install some batteries?" Alternatively, the minimal version could have a hint in the banner saying what to execute to install the full version.

vchuravy commented 8 years ago

Also, an important side note is that it would be nice to export the git history to the package as well, if possible.

@andreasnoack and I have been discussing this for the sparse code.

simonbyrne commented 8 years ago

See https://github.com/JuliaMath/Primes.jl/pull/12 for how it was done with Primes.jl

tkelman commented 8 years ago

Yes, that's a good idea for code migrations; I did that with filter-branch for https://github.com/JuliaPackaging/Git.jl.

vchuravy commented 8 years ago

For sparse I am using https://gist.github.com/vchuravy/81de73d34ed2b24de3123f50bb3c5bbd with the results at https://github.com/vchuravy/ExperimentalSparse4

tlnagy commented 8 years ago

> Before we begin migration, it would be nice to have a strategy for providing pre-built binaries not just of (trimmed-down) Julia but also "common" packages.

I'm also worried about moving too many things out of Base before an effective "batteries-included" version of Julia is developed. Already I'm having to import a large number of packages before I'm able to get working, which is one of the things I liked least about Python.

> Would it be required that the bundling happens during distribution of julia? What about doing this at the first execution of julia

The latter would be much preferable. Having one meta-package that loads up base technical libraries at run time (Combinatorics, Primes, FFTW, QuadGK, DSP, etc.) would be ideal. The user wouldn't see this after installing it (i.e. no using TechnicalJulia every time we fire up the REPL).

JeffBezanson commented 7 years ago

Next items that should be easy to move: SharedArrays and Poll (file watching).

JeffBezanson commented 6 years ago

Closing in favor of #18795 and more specific issues for pieces to remove.