JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Shrinking Base and Introducing a Standard Library #5155

Closed tknopp closed 6 years ago

tknopp commented 10 years ago

In #4898 there is a discussion about whether the reduced loading time from the wonderful static compilation work should be an argument to shrink or to expand the Base module.

As it seems generally difficult to decide whether things should go into Base or into a package (i.e. where to draw the line), I would like to propose the following:

From my perspective, whether these standard modules should be loaded automatically is only a minor point. More important is that this behavior can be changed easily (e.g. in juliarc).

One concern might be that one cannot rely on which modules have been loaded in the user's environment. But this can be solved by always importing the standard libraries explicitly when developing a package. "using StdLib" could be a shortcut to import all standard libraries.

It might make sense to offload the standard modules into packages to make the development flow easier. One could then pull in the standard lib when building Julia.

I don't have a concrete proposal for the standard modules, but "LinAlg" and "Signal" (like scipy.signal) come to mind.

gitfoxi commented 10 years ago

To paraphrase myself from #4898, this seems like a case of premature optimization. No need to ape patterns from languages developed in the covered-wagon era of computing. If anything, look in the other direction. Assume a super-grid cloud cluster of overclocked n-way super-scalar VLIW stacked-die SMP GPUs.

Performance is quite a separate issue from namespace pollution. On that topic, more thought is required -- or possibly less. Is anyone complaining about the namespace yet? Might as well leave it alone until someone does.

JeffBezanson commented 10 years ago

@gitfoxi makes a good point --- being raised on OOP, we all reflexively see more modularity as a good thing, but we have to be clear what exactly we are trying to solve or obtain. It would be nice to have a "minimal julia" perhaps for constrained environments. But I think we can get these sorts of things without changing the default user experience.

see also #1906, #3128.

tknopp commented 10 years ago

I should have been clearer about the purpose of this proposal. The main point is actually to have "privileged" packages (#1906) that are shipped with Julia. Then it might be easier to draw a clear line between what is in Base and what is in the standard library. The "minimal Julia" mode would naturally evolve out of this reorganization, but it is not the main driver for my proposal.

Maybe I was just surprised to see things like fft and sparse matrices in Base but no spatial filtering (Gaussian filter and so on).

If this proposal goes in the wrong direction, feel free to close it.

StefanKarpinski commented 10 years ago

I don't actually think that doing using LinAlg at the top of a file before using linear algebra stuff is so bad. That's explicit and easy enough. My main objection to the way that "object-oriented" languages like Python and Java do things is that they make the things you have to import so granular that every time you reach for new functionality you need to add imports. But we're not talking about anything like that. This would be one or two using lines at most.
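For illustration only, assuming Julia 1.x, where linear algebra did end up as the `LinearAlgebra` standard library (the `LinAlg` name in the comment was still hypothetical at the time), the explicit import costs a single line at the top of the file:

```julia
# Assumes Julia 1.x, where linear algebra lives in the LinearAlgebra standard library.
# One explicit line at the top of the file brings its names into scope.
using LinearAlgebra

v = [3.0, 4.0]
w = [1.0, 2.0]

n = norm(v)      # Euclidean norm: sqrt(3^2 + 4^2)
d = dot(v, w)    # inner product: 3*1 + 4*2
s = (2I) * v     # I is the UniformScaling identity, so this scales v by 2
```

Everything after the `using` line reads exactly as it would if these functions lived in Base.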

pao commented 10 years ago

It would be nice to have a "minimal julia" perhaps for constrained environments. But I think we can get these sorts of things without changing the default user experience.

As someone who gets to do embedded work, I think that's the right approach. When I know I'm doing a final deployment of a static codebase to a target system, give me the opportunity to strip that down, but don't make things (much) harder on the people who are actually writing the algorithms. The popularity of Simulink Coder/MATLAB Coder/Embedded Coder comes from being able to have a full development environment on the one hand and deploy something much more basic on the other. Being able to do those things in a single language without a translation step is one of my hopes for Julia.

tknopp commented 10 years ago

It seems that I am not good at expressing my thoughts. I was proposing that Base+Standard should have more functionality than what is currently in Base, and that these could be loaded by default. This means that the "default user experience" is improved instead of degraded. And on top of this, for people doing embedded work, or simply people who want to program smartphone apps in Julia, there would be an easy way to remove stuff that is not needed.

For me, Julia is not primarily a better Matlab but actually one of the most promising alternatives to programming in C++. Julia solves the "template issues" of C++ in such a beautiful way that it is hard to go back to C++ for day-to-day work.

gitfoxi commented 10 years ago

If there's one thing we can all agree on, it's that C++ can go straight to hell.

StefanKarpinski commented 10 years ago

It's been a bit surprising but in retrospect makes sense that Julia is a pretty good C++ replacement. Both languages allow you more abstraction than C with similar performance, letting you consciously trade-off higher-level programming styles for efficiency, all the way down to writing C-like code and getting C-like performance (and frequently getting C-like performance without having to give up high-level style). Julia's parametric types and "everything is a template" approach has a very similar effect to C++ templates, but with far less brittleness and hassle. And of course, C++ uses operator overloading rampantly and Julia's operators, being simply generic functions with special syntax, give you operator overloading as it should be. So if the main things you liked about C++ were:

  1. performance
  2. generic programming
  3. operator and method overloading

then Julia is a phenomenal C++ replacement. If, on the other hand, the things you need from C++ extend to manual memory management, pointer arithmetic, and multithreading – which are very legitimate needs for some applications – then Julia is not such a great C++ replacement. But I think there are a lot of C++ programmers for whom the first three are really important and the last three are not.
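A minimal sketch of the parametric-type and operator-overloading points, assuming Julia 1.x syntax (`struct`). `Vec2` and `sumvecs` are made-up names for illustration, and the comparison with C++ templates is only an analogy:

```julia
# A parametric type: one definition serves every element type T,
# playing roughly the role a class template plays in C++.
struct Vec2{T<:Real}
    x::T
    y::T
end

# Operators are ordinary generic functions, so overloading `+` and `*`
# simply means adding methods to Base's functions.
Base.:+(a::Vec2, b::Vec2) = Vec2(a.x + b.x, a.y + b.y)
Base.:*(s::Real, v::Vec2) = Vec2(s * v.x, s * v.y)

# Written once; the compiler generates specialized code for each concrete Vec2{T}.
sumvecs(vs::AbstractVector{<:Vec2}) = reduce(+, vs)
```

For example, `Vec2(1, 2) + Vec2(3, 4)` gives `Vec2{Int64}(4, 6)`, while `Vec2(1.0, 2.0)` is a `Vec2{Float64}`, with no separate declaration for either case.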

JeffBezanson commented 9 years ago

Very tentatively added to 0.4. This may not be as high-priority as other stuff though.

timholy commented 9 years ago

If it is done after really good package precompilation, then it should be the second highest priority item for 0.4 (right after Keno's debugger) :smile:

tknopp commented 9 years ago

@JeffBezanson Is there some general platform (GitHub issue or julia-dev mailing list) to discuss what should be targeted for 0.4?

@timholy Just wanted to add a comment on this but your message was faster :-)

My take: 0.4 already has various things that are "almost done", and if one targets six months for developing 0.4, I am not so sure that package precompilation and the refactoring are doable. Especially as package precompilation can only be done by a limited number of people (Jameson, Keno, Jeff, ...).

In short: I am a big fan of package precompilation and of shrinking Base. But please let's not add things to 0.4-projects that "may" happen. Better to merge all the important open issues from the last half year, stabilize and release.

timholy commented 9 years ago

@tknopp, look for issues with the 0.4-projects tag. Here's a link: https://github.com/JuliaLang/julia/milestones/0.4-projects

tknopp commented 9 years ago

Thanks Tim. That's clear. But the question was more about whether there is a central place to discuss what should get the 0.4-projects tag.

ivarne commented 9 years ago

Isn't this issue mainly about moving things out of the Base module and into separate modules that can be imported separately with a using ... statement?

The job here seems to be that these modules must have explicit (and few) dependencies. Some sort of standard way of importing everything might also be good for interactive usage. Whether they are precompiled into the default sysimg, or lazily loaded from code or a dll, is a totally different issue.

tknopp commented 9 years ago

@ivarne Yes exactly. Please have a look at my related comment https://github.com/JuliaLang/julia/issues/7926#issuecomment-51897732

Technically it might make even more sense to put the modules into separate packages ("default packages") that are pulled when running make.

timholy commented 9 years ago

Precompilation would be fine, but parsing/compiling from source code would be a massive step backwards. Some of us already have to wait ~1 minute before Julia starts doing anything useful, and this would make it worse.

ivarne commented 9 years ago

This issue is not about making startup slower. It is about changing the structure of Base into less interdependent modules. All the new modules can (and should) be compiled with Julia as they do today.

tknopp commented 9 years ago

Yes, although there might be different technical solutions. If we had package precompilation, it would be easier than without. One strategy would be:

Step 1: Move things into independent modules within Base.
Step 2: Move the modules into individual packages that are automatically pulled during make (i.e. git submodules).
Step 3: Use precompiled packages.

One motivation of this proposal is that it will hopefully lead to a much larger standard library. Image.jl is one exemplary candidate. I can't live without Gaussian multidimensional filtering in the Julia "standard library"... More controversial might be pulling plotting and a GUI toolkit into the "standard library", but in my opinion this would be a great way to move forward.

quinnj commented 9 years ago

I actually have a branch that allows a Make.user variable of NOBIGFLOAT that will exclude all BigFloat functionality. Some thoughts having gone through that process:

StefanKarpinski commented 9 years ago

Moving functionality into packages and precompilation are completely orthogonal. We happen to create a system image before loading any external packages right now, but there's no reason that has to be the case. We can just as easily load Base, load various default packages, and then generate a system image.

StefanKarpinski commented 9 years ago

Opened an issue just for precompilation: https://github.com/JuliaLang/julia/issues/7977.

lucasb-eyer commented 9 years ago

Besides embedded, I'd like to add another use-case for switching off features with dependencies: using Julia as an extension-language in a larger (C/C++) project. Julia is the first language of which I believe it can become a serious alternative to Lua. We're not there yet, though.

tknopp commented 9 years ago

@lucasb-eyer: Yep, this was one of the primary motivations at the time I proposed this. And I actually think that Julia will evolve in this direction. But it is more of a long-term goal that will need a joint effort. In the 0.4 phase there are too many other important things happening, so I would put this on hold until after the 0.4 (or even 0.5) release.

mikewl commented 9 years ago

I was reading through this and #1692 and noticed this was not directly addressed.

Would the packages be able to be updated? I realise that there would probably have to be some branches to deal with breaking previous versions of Julia but I think that one of the strengths of having separate packages is that bug fixes and improvements could be had without needing to update Julia.

jiahao commented 9 years ago

See #10333 (comment) and the discussion that followed.

Summary: Pulling modules out of Base is really hard to do without major breakage, because of the way different versions of the same package export identifiers (particularly types and module names) into the current namespace. The situation is compounded by Julia (currently) disallowing the redefinition of types and the current scoping rules precluding conditional redefinition of existing identifiers with runtime if statements (see JuliaLang/Graphics.jl#1 (comment)).

mikewl commented 9 years ago

Sorry, I was thinking post-removal.

I was reading https://github.com/JuliaLang/julia/issues/5155#issuecomment-51922845 and wondered if doing this would make updating those packages significantly more difficult than if they were precompiled separately. Say if StatsBase was a part of the Standard Library and after installing Julia you wanted a newer version than the one included with the install (assuming one is available).

hayd commented 9 years ago

Could the beginnings of this be reorganising the base directory? At the moment only some parts have their own directory (grisu, dates, markdown, ). IMO grouping files into subdirectories (maybe numbers, arrays, strings, terminal, ...) would make it easier to understand how Julia code is arranged and works, and make it clearer what can or should be pulled out into its own module (or potentially, as some other issue discussed, be hidden behind an import but still be part of the Base installation).

ViralBShah commented 9 years ago

This has only been done piecemeal, largely based on the personal tastes of the people working on those pieces.

tknopp commented 9 years ago

@jiahao: I think we have to distinguish two things here: removing stuff from Base that belongs in a non-default package, and removing stuff that belongs in a default package. For the latter we still lack the infrastructure.

That things break is IMHO not really an issue. We are on the road to 1.0, and as long as the breakages happen at defined points (0.3 -> 0.4), that's OK. The problem is that we try to have packages that work on two versions (0.3 and 0.4); with that assumption in mind we will never carry out the restructuring needed to shrink Base and introduce a standard library.

In short: This issue is hard, dirty work but if we have solved it everything will become more maintainable.

StefanKarpinski commented 9 years ago

It's somewhat hard to modularize the base code since it ends up having to be defined in a fairly particular order for bootstrapping purposes. I do think we should work more towards that, but it's kind of tedious and somewhat thankless work, since in the end you have the same functionality, just organized differently.

tkelman commented 9 years ago

I do think we should work more towards that, but it's kind of tedious and somewhat thankless work, since in the end you have the same functionality, just organized differently.

The module precompilation work however could result in a major payoff for this kind of reorganization - if modules can be bootstrapped individually and made into separate shared libraries, we could specify the interdependencies in the makefiles and parallelize the bootstrap process. This would improve turnaround time for changes to base for everyone.

ivarne commented 9 years ago

We could also possibly have a make small target that doesn't precompile the optional modules, saving even more time.

tkelman commented 9 years ago

Quoting @tknopp from https://github.com/JuliaLang/julia/pull/6193#issuecomment-106827239

Thanks @tkelman, this is exactly the important point that I also tried to make clear in our SparseVector discussion. I know the case is a little different, and I don't want to continue that discussion here; I just want to highlight your sentence and its importance for any future discussion on Base inclusion (and also #5155).

The two discussions are related, but I didn't want to cross them too closely, since there are some important differences. The existing FFTW bindings are working fine, and the new implementation does not currently exist as a package. SparseVectors, by contrast, do exist as a package, but there are bugs and semantic inconsistencies in the current Base implementation and indexing of SparseMatrixCSC, which bringing SparseVectors into Base would fix. Long term, neither sparse matrices (or vectors) nor FFTs should need to be included in the Julia runtime system image for a compiled standalone hello world script written in Julia. Though some generality and accounting for sparsity will need to make its way in some form into various array operations that would always stay in Base (indexing, reductions, broadcast, etc.), I don't think FFTs have quite the same requirements, but I could easily be wrong.

ScottPJones commented 9 years ago

If anybody wants my help, I'll put my time where my mouth is to try to make "Julia Lite" a reality... (at least, as soon as we ship first version of our product! :grinning:)

tkelman commented 9 years ago

One action item that needs some research work is determining the best way of bundling "default packages" with the Windows and OSX installers and in the generic Linux tarballs. We could put them into share/julia/site/v0.x or something similar (which appears to already be included in LOAD_PATH by default) and easily build separate "lite" and "full" binaries, but we'd need to experiment a bit with how well putting packages there interacts long-term with Pkg.update() usage, whether you would end up with separate copies inside vs. outside of Pkg.dir, etc.
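A minimal sketch of the LOAD_PATH side of this, assuming Julia 1.x semantics, where a plain directory listed in `LOAD_PATH` is searched for packages. The helper `add_bundled_dir!` and the exact directory are hypothetical; they only mirror the `share/julia/site/v0.x` idea from the comment, and nothing here addresses the `Pkg.update()` interaction question:

```julia
# Hypothetical helper: make a directory of bundled "default packages" visible to
# `using`/`import` by appending it to LOAD_PATH. Returns true if it was added.
function add_bundled_dir!(dir::AbstractString)
    dir = normpath(abspath(dir))
    dir in LOAD_PATH && return false   # already present: leave LOAD_PATH unchanged
    push!(LOAD_PATH, dir)
    return true
end

# Illustrative path modelled on the share/julia/site/v0.x layout from the comment;
# it does not need to exist for the helper to run.
site_dir = joinpath(Sys.BINDIR, "..", "share", "julia", "site",
                    "v$(VERSION.major).$(VERSION.minor)")
```

In Julia 1.x the same list can also be set without code through the `JULIA_LOAD_PATH` environment variable, which is the kind of setting an installer could write.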

Mx-Glitter commented 9 years ago

I'm largely in favor of such a modularization, if you want a concrete example (IMHO) done right, you can look at the (trigger warning: PHP ahead) Symfony Framework with its Standard Edition: core Symfony is a collection of "components" and core "bundles", i.e. base modules in your example, and Symfony Standard Edition is core Symfony + standard "bundles", i.e. standard modules.

(The difference between components and bundles is that bundles are meant to be used directly within the framework, the equivalent of Julia packages, whereas components are standalone libraries.)

The decoupling even allows you to use any component separately from a Symfony project, e.g. if you only care about a Yaml parser and not a whole web app framework.

vtjnash commented 8 years ago

with #8745 merged, I figured I should add a little demo here of some relevant functionality that is now available:

$ git diff mmap.jl
diff --git a/base/mmap.jl b/base/mmap.jl
index b1de73d..41048fa 100644
--- a/base/mmap.jl
+++ b/base/mmap.jl
@@ -1,6 +1,7 @@
 # This file is a part of Julia. License is MIT: http://julialang.org/license

-module Mmap
+eval(Base, :(
+module Mmap2

 const PAGESIZE = Int(@unix ? ccall(:jl_getpagesize, Clong, ()) : ccall(:jl_getallocationgranularity, Clong, ()))

@@ -208,3 +209,4 @@ end
 sync!(B::BitArray, flags::Integer=MS_SYNC) = sync!(B.chunks, flags)

 end # module
+)) # quote eval
$ cd base

$ ../usr/bin/julia --output-incremental=yes --output-ji mmap2.ji -J ../usr/lib/julia/sys.dylib mmap.jl 
<nuisance warning clipped>

$ ../julia -q
julia> Base._require_from_serialized(#=node=#1, "./mmap2.ji", #=toplevel=#false)
1-element Array{Any,1}:
 Base.Mmap2

julia> using Base.Mmap2

julia> Mmap2.<tab>
Anonymous            MAP_ANONYMOUS         MS_ASYNC              PAGESIZE              eval                  mmap
F_GETFL              MAP_PRIVATE           MS_INVALIDATE         PROT_READ             gethandle             settings
INVALID_HANDLE_VALUE MAP_SHARED            MS_SYNC               PROT_WRITE            grow!                 sync!
julia> Mmap2.mmap
mmap (generic function with 24 methods)

timholy commented 8 years ago

Real-time surgery---pretty fancy! Can you clarify whether this is significantly different from include("base/mmap.jl")?

vtjnash commented 8 years ago

that example is to include as compile is to require, i.e. they are effectively the same; it's just a matter of when the 'surgery' happens

mikewl commented 8 years ago

For the Windows and OS X installers, couldn't the package path be set via the installer? The default would remain as is, or it could be changed, and the installer would then set the environment variable accordingly.

Then there will be no disparity between the standard library and any other packages. The standard library just consists of packages deemed appropriate to include in the Julia install. Pkg.update() would then update the standard packages as well as any user included packages.

The generic Linux tarballs I have not used so I can't comment on them.

tknopp commented 8 years ago

I think it would be clever to move the standard library to a real package that can be updated via Pkg.update(). The question is if there is one StdLib package or several smaller packages, which composed are the standard library.

mikewl commented 8 years ago

A built-in reexport would be nice for this kind of purpose. Several smaller packages could make up StdLib. Then you could either just have using StdLib for full functionality or pick and choose the appropriate packages as required. using StdLib would then re-export all exported items from the associated packages, so StdLib could easily be modified to include more packages if so desired.

I would think smaller packages would be better suited to this kind of purpose. Most users would not need sparse functionality or FFTs, but those could make a good case for being part of the StdLib.

Some bikeshedding would probably have to occur as well over the most appropriate name for this kind of thing.
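A rough sketch of the "several smaller packages re-exported by `StdLib`" idea in plain Julia 1.x. All module and function names (`Signal`, `SparseLite`, `moving_average`, `count_nonzeros`) are hypothetical placeholders, and the loop is a hand-rolled stand-in for a built-in re-export; the Reexport.jl package offers a `@reexport` macro for the same job:

```julia
# Hypothetical small component modules standing in for standard-library packages.
module Signal
export moving_average
# Mean over each window of w consecutive samples.
moving_average(x::AbstractVector, w::Integer) =
    [sum(@view x[i:i+w-1]) / w for i in 1:length(x)-w+1]
end

module SparseLite
export count_nonzeros
count_nonzeros(x::AbstractArray) = count(!iszero, x)
end

# Umbrella module: `using .StdLib` gives access to everything the components export.
module StdLib
using ..Signal, ..SparseLite
# Hand-rolled re-export: export each component's public names, skipping the module's own name.
for m in (Signal, SparseLite), name in names(m)
    name === nameof(m) && continue
    @eval export $name
end
end

using .StdLib
```

A user who only needs filtering could still write `using .Signal` directly, which is the "pick and choose" half of the proposal.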

JeffBezanson commented 8 years ago

I'm just going to start making a little list of pieces that would be good to remove from Base. No commitment yet as to how these should be handled --- some may be packages, some may be something else.

simonbyrne commented 8 years ago

Why would the types and basic ops need to be defined in Base for GMP and MPFR?

JeffBezanson commented 8 years ago

It's maybe not 100% required, but a lot of code in Base uses BigInts. It's also part of the syntax, needed for parsing numeric literals.

simonbyrne commented 8 years ago

Ah, of course.

One other candidate might be Float16, though this may depend on future GPU functionality where it is actually useful (depending on whether the GPU stuff is in a package or Base).

tkelman commented 8 years ago

It's also part of the syntax, needed for parsing numeric literals.

Does it really have to stay that way though?

JeffBezanson commented 8 years ago

Thought about Float16, but the code for it is so small it might as well stay.

Does it really have to stay that way though?

I think I'd prefer it. It doesn't sit well to be unable to simply write a big integer in a program. Plus, the grisu code already contains a bigint implementation --- even if we can't reuse it, it makes it hard to argue that such functionality is not allowed in Base.

tkelman commented 8 years ago

there are lots of types we don't have literals for, and bigint is a pretty hefty dependency to have in the front end for parsing. we don't allow silent promotion to bigint when doing arithmetic on integers, but we do when parsing integer literals. feels inconsistent.
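To make the inconsistency concrete, here is how Julia 1.x behaves on a 64-bit build (the 2015-era versions discussed here may have differed in detail): oversized integer literals are widened by the parser, first to `Int128` and then to `BigInt`, while the same overflow in arithmetic silently wraps around:

```julia
# Julia 1.x on a 64-bit build, where Int === Int64.
a = 9223372036854775807                       # == typemax(Int64): parsed as Int64
b = 9223372036854775808                       # one larger: the parser widens it to Int128
c = 170141183460469231731687303715884105728   # exceeds typemax(Int128): parsed as BigInt

# Arithmetic on Int64 does not promote; it wraps around.
d = typemax(Int64) + 1

# Promotion to BigInt has to be requested explicitly.
e = big(typemax(Int64)) + 1
```

Here `d` equals `typemin(Int64)`, whereas `e` is a `BigInt` with the same value as the literal `b`.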

jakebolewski commented 8 years ago

Pretty much all high-level dynamic languages have BigInt support, though.