Pkg3: package namespaces

StefanKarpinski commented 8 years ago

The current proposal doesn't mention namespaces anywhere. This issue is to discuss how namespaces could fit into the proposal and Julia's own namespacing in general.

StefanKarpinski commented 8 years ago

Some open issues:

- Namespaces could be one-to-one with registries or they could be an orthogonal feature.
- Are namespaces reflected in Julia? I.e. to use `JuliaOpt/JuMP` do you write `using JuliaOpt.JuMP` or can you just write `using JuMP`? If the latter, is there a namespace path?

ararslan commented 8 years ago

My thoughts on this, for however many cents ≤2 they're worth:

Namespaces could be one-to-one with registries, which in turn could work like Homebrew taps. Say there are two registries, JuliaVegetables and JuliaGreens, that each have a package called Broccoli. If a user has opted into the JuliaVegetables registry and not JuliaGreens, they can write `using Broccoli` and it would implicitly mean `using JuliaVegetables.Broccoli`. But if the user has opted into both registries, `Broccoli` would be ambiguous, which would trigger an error. This would be a nice convenience for users, but would be bad for reproducibility, since what `using Broccoli` refers to would depend on which registries a given user has opted into.

A safe option may be to require the fully qualified Registry.Package for everything that isn't in the "cathedral" registry (currently METADATA, if I understand that part of the proposal correctly). That would be a non-breaking change, and would also alleviate the problem of registries with identical package names.
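The opt-in rule from the Homebrew-taps paragraph can be sketched in a few lines. This is a minimal Julia sketch that models the proposal. It does not show how Pkg actually resolves names. The `RegistryIndex` alias, the `resolve` function, and the placeholder UUID strings are all hypothetical.

```julia
# Hypothetical model: each opted-in registry maps package names to UUID strings.
const RegistryIndex = Dict{String,String}

# Sketch of the proposed lookup rule (not Pkg's real behavior):
# - "Registry.Package" is looked up in that registry only;
# - a bare "Package" must be provided by exactly one opted-in registry.
function resolve(registries::Dict{String,RegistryIndex}, name::AbstractString)
    if occursin('.', name)
        reg, pkg = String.(split(name, '.'; limit = 2))
        haskey(registries, reg) || error("registry $reg is not enabled")
        haskey(registries[reg], pkg) || error("$pkg is not in registry $reg")
        return (reg, registries[reg][pkg])
    end
    hits = [(reg, index[name]) for (reg, index) in registries if haskey(index, name)]
    isempty(hits) && error("no enabled registry provides $name")
    length(hits) > 1 &&
        error("$name is ambiguous; it is provided by " * join(sort(first.(hits)), " and "))
    return only(hits)
end

# The user has opted into both registries (placeholder UUIDs).
registries = Dict(
    "JuliaVegetables" => RegistryIndex("Broccoli" => "uuid-veg-broccoli"),
    "JuliaGreens"     => RegistryIndex("Broccoli" => "uuid-greens-broccoli",
                                       "Kale"     => "uuid-greens-kale"),
)

resolve(registries, "Kale")                      # unique, so the bare name works
resolve(registries, "JuliaVegetables.Broccoli")  # qualified, always unambiguous
# resolve(registries, "Broccoli")                # error: ambiguous
```

Requiring the qualified form for everything outside the "cathedral" registry corresponds to taking only the first branch for those registries.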

jkroso commented 8 years ago

Isn't the package metadata a good place to disambiguate between registries?

So instead of:

```toml
[library.libXYZ]
uuid = "994d35e9-862f-42c9-aa51-d40fef54ab41"
versions = "2.3-2.5"
```

You would write:

```toml
[libXYZ](github.com/library/libXYZ)
uuid = "994d35e9-862f-42c9-aa51-d40fef54ab41"
versions = "2.3-2.5"
```

bicycle1885 commented 8 years ago

I believe namespaces are important, so the next-generation packaging system should include a namespacing mechanism. It will make it easy for developers/organizers to create "biotopes" in which to cultivate special-interest packages. A real-world example is Bioconductor, which includes thousands of R packages but has a dedicated package manager because its release schedule is different from R's. I think this should not happen in Julia.

> Namespaces could be one-to-one with registries or they could be an orthogonal feature.

A one-to-one relation with registries sounds reasonable. I want a BioJulia registry, and that would be enough.

> Are namespaces reflected in Julia? I.e. to use `JuliaOpt/JuMP` do you write `using JuliaOpt.JuMP` or can you just write `using JuMP`? If the latter, is there a namespace path?

Always requiring namespace prefixes looks a little bit verbose to me, but it would be consistent.

StefanKarpinski commented 8 years ago

A reason not to make namespaces and registries one-to-one is that we want to allow people to use their GitHub usernames (for example) as namespaces and still publish their packages. But maybe that's too fine-grained a level for namespacing and that use case should instead be handled by improved support for installing unregistered packages.

tbreloff commented 8 years ago

Can we have "implicit registries" based on a GitHub account ("tbreloff" or "JuliaML") or an alternate directory/URL? I suppose it depends on what you require a registry to define, but at a minimum there should be an automated way to initialize a registry for an existing account/directory/URL using the repos that exist there.

I think it's fine to be one-to-one as long as registries are dirt-simple to create and manage. Some related ideas: https://github.com/tbreloff/MetaPkg.jl/issues/5
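One reading of the "implicit registry" idea is an index generated by scanning a set of package checkouts. Here those checkouts stand in for the repos under a GitHub account. This is a minimal sketch, and it assumes each package has a `Project.toml` with `name` and `uuid` fields at its root. `implicit_registry` is a hypothetical helper, and fetching the repos from GitHub is left out.

```julia
using TOML  # standard library since Julia 1.6

# Hypothetical helper: build a name => uuid index from every immediate
# subdirectory of `dir` that contains a Project.toml with `name` and `uuid`.
function implicit_registry(dir::AbstractString)
    index = Dict{String,String}()
    for entry in readdir(dir; join = true)
        project = joinpath(entry, "Project.toml")
        isfile(project) || continue          # not a package checkout; skip it
        meta = TOML.parsefile(project)
        if haskey(meta, "name") && haskey(meta, "uuid")
            index[meta["name"]] = meta["uuid"]
        end
    end
    return index
end
```

For example, `implicit_registry("path/to/checkouts")` would return one entry per package found in that (hypothetical) directory.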

JeffreySarnoff commented 8 years ago

The package manager must offer clean and tidy management behind corporate walls. There may exist contractual constraints or turf-protective limits on interdepartmental access. Distinct versions of a package may not permit the same package manager activities. Package access, use, testing, and upgrading may be subject to legal obligations, and the existence of those obligations may be confidential. Permissions and restrictions that apply to upgrading a package for direct use may not apply when that same package is upgraded indirectly in support of another project.

The ability to tell systems "keep [these packages] just as they are" and "keep [these packages] up to date" and "[these packages] should be kept in sync with the latest stable [patch] to [version]" helps. Namespaces improve that help. Checkpoints augment the helpfulness of the help.

As I understand it, Julia's registration, namespacing, and checkpointing are mutually informing and jointly supporting. The availability of one simplifies some parts of the others. They should coexist.
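The three instructions quoted above ("keep ... just as they are", "keep ... up to date", "keep ... in sync with the latest stable patch") can be read as per-package update policies. This is a minimal sketch, and it assumes the set of available versions is already known. `UpdatePolicy`, `Pinned`, `Latest`, `TrackPatch`, and `choose` are hypothetical names that are not part of Pkg.

```julia
# Hypothetical update policies for a single package.
abstract type UpdatePolicy end
struct Pinned <: UpdatePolicy       # "keep it just as it is"
    version::VersionNumber
end
struct Latest <: UpdatePolicy end   # "keep it up to date"
struct TrackPatch <: UpdatePolicy   # "latest stable patch of major.minor"
    series::VersionNumber
end

# Only versions without a prerelease tag (e.g. not 2.0.0-rc1) count as stable.
stable(versions) = filter(v -> isempty(v.prerelease), versions)

choose(p::Pinned, versions) =
    p.version in versions ? p.version : error("pinned version $(p.version) is unavailable")

choose(::Latest, versions) = maximum(stable(versions))

function choose(p::TrackPatch, versions)
    same_series = filter(v -> v.major == p.series.major && v.minor == p.series.minor,
                         stable(versions))
    isempty(same_series) &&
        error("no stable release in the $(p.series.major).$(p.series.minor) series")
    return maximum(same_series)
end

available = [v"1.2.0", v"1.2.3", v"1.3.0", v"2.0.0-rc1"]
choose(Pinned(v"1.2.0"), available)      # v"1.2.0"
choose(Latest(), available)              # v"1.3.0"
choose(TrackPatch(v"1.2.0"), available)  # v"1.2.3"
```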

samoconnor commented 7 years ago

I've recently been working on generating AWS Julia API packages from the Amazon JSON service description files.

My work-in-progress output results in 170 kloc across 102 service interface packages: AmazonAPIGateway.jl ... through ... AmazonXRay.jl. See: generated documentation, and generated code

Currently there are no types, no input/output validation or translation, just one-line wrapper functions and docstrings for each service operation (i.e. the size of the packages will probably grow).

I'm mentioning this here as a potential use-case for package namespaces. (Including all these modules in a single AWSSDK.jl package is impractical, the package would be huge, the pre-compile time would be long, and the vast majority of users would use at most ~10% of the modules.)

Will Pkg3 namespaces allow all the 100+ Amazon*.jl packages to be neatly grouped together? It would be good to have an easy way to install all of them for dev systems where the bloat is not a problem, but to allow lean, dependency-driven installation for space-constrained deployment environments (cluster nodes, embedded systems, AWS Lambda, etc.).
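The lean, dependency-driven install described above amounts to installing only the transitive closure of what a deployment requests, rather than the whole group. This is a minimal sketch. `install_set` and the dependency graph below are hypothetical, and that includes the shared `AWSCore` dependency. The graph does not reflect the real structure of the generated packages.

```julia
# Collect every package reachable from `roots` through `deps`.
function install_set(deps::Dict{String,Vector{String}}, roots)
    seen = Set{String}()
    stack = collect(String, roots)
    while !isempty(stack)
        pkg = pop!(stack)
        pkg in seen && continue
        push!(seen, pkg)
        append!(stack, get(deps, pkg, String[]))
    end
    return sort!(collect(seen))
end

# Hypothetical dependency graph for a few of the generated packages.
deps = Dict(
    "AWSCore"          => String[],
    "AmazonS3"         => ["AWSCore"],
    "AmazonSQS"        => ["AWSCore"],
    "AmazonAPIGateway" => ["AWSCore"],
)

install_set(deps, ["AmazonS3"])   # lean deployment: ["AWSCore", "AmazonS3"]
install_set(deps, keys(deps))     # dev system: everything
```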

quinnj commented 7 years ago

Great thoughts @samoconnor; I've had similar thoughts recently w/ HTTP.jl. Currently, it has code for URI parsing, handling cookies, client-side requests, a server, etc. all built in under the same package. I could split these "subpackages" out into stand-alone packages, but it's actually kind of nice having them all under one roof from a maintenance & development standpoint. It'd be great if they (the submodules/packages) could be registered and used individually, while using HTTP.jl would bring them all in.

StefanKarpinski commented 7 years ago

Do you guys have any thoughts on how this would look and work? I.e. how would one use a namespaced package and what would it mean? Are you imagining using HTTP/Server or using HTTP.Server? Would this essentially just allow levels of hierarchy in package names? Or would the behavior be something more than that? If there's a package named HTTP/Server do you always use it as HTTP.Server or can it sometimes be accessed as just Server as in using HTTP; using Server?

JeffreySarnoff commented 7 years ago

If there is a package named HTTP/Server imo it should only be used as HTTP.Server because if it were accessed as Server that opens up the potential for conflict with e.g. Task/Server.

StefanKarpinski commented 7 years ago

Ok, it seems like this may just be a layer of hierarchy in package names. This usage suggests that the Julia-side syntax should probably be `HTTP.Server`, since `HTTP/Server` already has a meaning, although I kind of like writing `using HTTP/Server` instead of `HTTP.Server` so that the `/` indicates that it's a hierarchical package name, not a submodule (although it should probably be exposed as a submodule on the Julia side, so 🤷‍♂️). Would it be too weird to write `using HTTP/Server` but then access the package module as `HTTP.Server` or just `Server`? Would the `HTTP` module be made available after doing `using HTTP/Server` as well? There could also be more levels of hierarchy, e.g. `using Net.HTTP.Server`. Would that be possible to abbreviate as `using HTTP.Server`?

quinnj commented 7 years ago

Here are a few of my thoughts:

- I don't have a strong preference on `using HTTP/Server` vs. `using HTTP.Server`.
- I do think it's important to allow `using HTTP.Server` to only load the `Server` code. A better example would be `HTTP.URIs`: there are several packages that only need `HTTP.URIs` functionality and don't want to have to depend on all of HTTP.
- In reality, I'd be fine registering `HTTP.URIs` as its own package, including all the required package files in the `HTTP/src/URIs/` directory.

So, just to clarify, I would imagine my package directory looking like:

```
HTTP/
    URIs/
        REQUIRE
        src/URIs.jl
        appveyor.yml
        etc...
    Server/
        REQUIRE
        src/Server.jl
        appveyor.yml
        etc...
```

So it'd be a way to have my package live inside the directory of another package, so if someone depends on HTTP, they get the whole kitchen sink, but if a package only needs URIs code, they just depend on HTTP.URIs (or it can just be called URIs) and just get that code (w/ maybe the HTTP directory stubbed out at least).

JeffreySarnoff commented 7 years ago

imo

```julia
using HTTP/Server             # namespace
using Package.Module          # submodule
import IndexedTables.Columns  # subselect within module

Server.serve(uri)  # do not need HTTP.Server, although it could be used
Module.func()      # do not need Package.Module, although it could be used
```

and I agree with the "just get what you need" thought above

JeffBezanson commented 7 years ago

I agree with @quinnj ; one feature I definitely want is to have multiple packages in one repo, and have them be separate modules (not submodules). That is already a kind of namespace mechanism; is it enough?

If namespace = repo, adding a package from a separate repo to a namespace can be done by adding a git submodule to the namespace repo.

If that's not workable, then namespaces could just be a naming convention, e.g. some package names have HTTP/ as a prefix, regardless of where they're hosted. Fancy stuff like having an HTTP module that includes all HTTP packages could be provided by a macro (or equivalent built-in feature). That way nobody has to maintain a central place listing all the HTTP packages. The macro would expand to e.g.

```julia
module HTTP
using HTTP/Server as Server
using HTTP/URIs as URIs
...
end
```

samoconnor commented 7 years ago

> ... access the package module as `HTTP.Server` or just `Server`?

I'd stick with using the fully qualified name for using.

I prefer HTTP/Server because HTTP.Server is confusing if Server is not a sub-module of HTTP.

I don't see any reason for grouped packages to be sub-modules. In fact, a package group may not be a module at all. For example, you might have `AWS/Core` and `AWS/S3`, but `AWS` is just a prefix; there is no `AWS` module.

It might be nice to have a shortcut like using AWS/*.

My initial problem was not wanting to pollute the package directory list at https://pkg.julialang.org with 100+ Amazon*.jl packages. From that point of view, simply allowing / in the package name as a naming convention would be enough.

I would definitely like to have multiple packages in a single git repo. But this would have to be implemented such that Pkg.add("AWS/S3") does not cause the entire AWS repo to be downloaded.
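If `/` is simply allowed in package names as a naming convention, a shortcut like `AWS/*` needs nothing more than a prefix match over the registered names. This is a minimal sketch, and both `expand_names` and the package list are hypothetical.

```julia
# Expand "Group/*" to every registered name with that prefix;
# any other pattern must name exactly one registered package.
function expand_names(pattern::AbstractString, registered)
    if endswith(pattern, "/*")
        prefix = chop(pattern; tail = 1)   # "AWS/*" -> "AWS/"
        return sort!([n for n in registered if startswith(n, prefix)])
    end
    pattern in registered || error("$pattern is not registered")
    return [pattern]
end

registered = ["AWS/Core", "AWS/S3", "AWS/SQS", "HTTP/Server", "JSON"]
expand_names("AWS/*", registered)   # ["AWS/Core", "AWS/S3", "AWS/SQS"]
expand_names("JSON", registered)    # ["JSON"]
```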

samoconnor commented 7 years ago

WRT HTTP/*...

I think the hierarchy should be structured primarily for the convenience of the end user rather than the package developer. We don't want to rename `HTTP/URI` to `NET/URI` when the maintainer of the `NET` package takes over development.

StefanKarpinski commented 7 years ago

> I don't see any reason for grouped packages to be sub-modules.

So how do you want to access the module brought in by using HTTP/Server? If you want to be able to write HTTP.Server then HTTP is going to be a module and Server a submodule of it.

samoconnor commented 7 years ago

I guess the options might be:

```julia
# Option 1
using HTTP/Server
Server.listen(...)

# Option 2
using HTTP/Server
HTTP/Server.listen(...)

# Option 3
using HTTP/Server as HTTPServer
HTTPServer.listen(...)
```

Maybe it's up to the package author to decide if the unqualified name is likely to make sense in the global namespace? e.g. using Amazon/DynamoDB and DynamoDB.put(...) is probably unambiguous.

In some cases the conflicting unqualified name might be a good thing:

```julia
using OpenSSL/Digest
Digest.sha1(...)

using MbedTLS/Digest
WARNING: redefining module Digest
```

amitmurthy commented 7 years ago

> I agree with @quinnj; one feature I definitely want is to have multiple packages in one repo, and have them be separate modules (not submodules).

+1. Also related to whatever mechanism we come up with for supporting a Julia "standard library" that ships with Julia but is on a separate release cycle. The Julia stdlib can be separate modules under the same repo?

The Python world had this PEP: https://www.python.org/dev/peps/pep-0413/#abstract. I do think it is the right approach, though it was ultimately never implemented, because Python got a decent package manager that was expected to speed up development/management of alternative implementations. I don't think that addressed the issues raised in the above PEP, whose approach I believe would be a good one for Julia.

StefanKarpinski commented 7 years ago

@samoconnor: of those the only viable option is Server.listen and I think people are going to want to be able to access that in a qualified form, which means HTTP.Server.listen – which in turn means that HTTP is a module and that Server is a submodule of it. Both HTTP.Server and just Server can work, however, as long as Server is unambiguous, so the caller can choose whether they want to write HTTP.Server.listen or just Server.listen or just listen. The example with conflicting Digest imports would not work like that – instead, it would drop both names and OpenSSL.Digest and MbedTLS.Digest would be the only ways to refer to those values.
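Current Julia already behaves this way for names brought in with `using`. When two modules export the same name, the bare name cannot be used and only qualified access works. This is a minimal sketch with toy stand-in modules, which are hypothetical and not the real OpenSSL or MbedTLS packages. The exact warning and error text differs between Julia versions.

```julia
# Toy stand-ins: two modules that both export a submodule named Digest.
module OpenSSL
    export Digest
    module Digest
        sha1(data) = "OpenSSL sha1 of $(repr(data))"
    end
end

module MbedTLS
    export Digest
    module Digest
        sha1(data) = "MbedTLS sha1 of $(repr(data))"
    end
end

using .OpenSSL, .MbedTLS

OpenSSL.Digest.sha1("abc")   # qualified access works
MbedTLS.Digest.sha1("abc")   # so does this
# Digest.sha1("abc")         # errors: both modules export `Digest`,
#                            # so the bare name must be qualified
```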

StefanKarpinski commented 7 years ago

> +1. Also related to whatever mechanism we come up with for supporting a Julia "standard library" that ships with Julia but is on a separate release cycle. The Julia stdlib can be separate modules under the same repo?

That's a good idea. I realized at some point that when package versions are associated with git tree objects (or more generally, source trees hashed with various secure hashes in the same manner as git uses SHA1 to hash source trees), then you can quite easily tag multiple different packages from the same git repo. Using that for Julia development makes a fair bit of sense and would allow us to continue checking out, updating and testing Julia and its standard packages all together.
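Here is a minimal sketch of why this works. It uses Git's tree-hashing scheme (SHA-1 over `blob`/`tree` object headers) through the SHA standard library. Each package subdirectory gets its own tree hash, and that hash changes only when something inside that subdirectory changes. `git_object_hash` and `git_tree_hash` are hypothetical helpers, not Pkg functions. The sketch assumes plain directories of regular files, and it hashes every file as a non-executable blob. It does not handle symlinks or empty directories, which Git would skip.

```julia
using SHA  # standard library

# Hash a Git object: SHA-1 of "<kind> <size>\0" followed by the raw content.
git_object_hash(kind::String, data::Vector{UInt8}) =
    sha1(vcat(Vector{UInt8}("$kind $(length(data))\0"), data))

# Git-style tree hash of a directory (regular files and subdirectories only;
# every file is recorded as a non-executable blob, mode 100644).
function git_tree_hash(dir::AbstractString)
    entries = Tuple{String,String,Vector{UInt8}}[]   # (sort key, "mode name", digest)
    for name in readdir(dir)
        path = joinpath(dir, name)
        if isdir(path)
            name == ".git" && continue
            # Git orders subtrees as if their names ended in "/".
            push!(entries, (name * "/", "40000 $name", git_tree_hash(path)))
        else
            push!(entries, (name, "100644 $name", git_object_hash("blob", read(path))))
        end
    end
    sort!(entries; by = first)
    body = UInt8[]
    for (_, header, digest) in entries
        append!(body, codeunits(header))
        push!(body, 0x00)
        append!(body, digest)   # raw 20-byte digest, not hex
    end
    return git_object_hash("tree", body)
end

# Usage: two packages living in one repository.
mktempdir() do repo
    for pkg in ("PkgA", "PkgB")
        mkpath(joinpath(repo, pkg, "src"))
        write(joinpath(repo, pkg, "src", "$pkg.jl"), "module $pkg end\n")
    end
    before = git_tree_hash(joinpath(repo, "PkgA"))
    write(joinpath(repo, "PkgB", "src", "PkgB.jl"), "module PkgB\nf() = 1\nend\n")
    after = git_tree_hash(joinpath(repo, "PkgA"))
    println(before == after)   # true: editing PkgB leaves PkgA's tree hash unchanged
end
```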

jdlangs commented 7 years ago

For a long time I've wanted to see Julia drop the one-repo-one-module assumption which I think will be key for scalability of large projects. I like @quinnj's design with separate REQUIREs for each module in the package, though I think it would be fine if the package registry only knows about HTTP.

@StefanKarpinski, does the form `using HTTP/Server; HTTP.Server.listen()` necessarily imply that `HTTP` is a module? Could it not be a new `Package` object that just acts as a simple container to access one or more actual modules? My understanding of the internals is limited, but I wouldn't think that would cause too much disruption with things like precompilation.

StefanKarpinski commented 7 years ago

> does the form `using HTTP/Server; HTTP.Server.listen()` necessarily imply that `HTTP` is a module? Could it not be a new `Package` object which just acts as a simple container to access one or more actual modules?

A module is just a container of a bunch of bindings. How would such a proposed `Package` object be different from that? (The name `Package` is not great since it's not a package; it's a namespace of packages.) Would `Package` be limited to only having bindings for packages? What's the point of that limitation? Why not just create a module object that only happens to contain bindings to other modules provided by packages? I just don't see the point of introducing a new fundamental type into the language that is not only non-orthogonal but whose entire functionality is a subset of another kind of thing we already have.

The real point I was getting at is that if you write `using HTTP/Server`, it seems somewhat confusing to then access that as `HTTP.Server`. In that case, it seems better to write `using HTTP.Server` and then access it using the same syntax. The only downside I can see is that it's not syntactically evident that the `HTTP` part is a package namespace rather than something else. But no one has made a convincing case either way about whether that matters.
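A plain module can already play the role of such a namespace. This is a minimal sketch that assumes `Server` and `URIs` stand in for two separately loaded packages. `namespace_module` is a hypothetical helper that builds a module whose only contents are bindings to those modules.

```julia
# Stand-ins for two independently loaded packages (hypothetical).
module Server
    listen(port) = string("listening on port ", port)
end
module URIs
    scheme(uri) = first(split(uri, "://"))
end

# Build a module `name` that contains nothing but bindings to `members`.
function namespace_module(name::Symbol, members::Vector{Module})
    ns = Module(name)
    for m in members
        Core.eval(ns, :(const $(nameof(m)) = $m))
    end
    return ns
end

HTTP = namespace_module(:HTTP, [Server, URIs])
HTTP.Server.listen(8080)                    # "listening on port 8080"
HTTP.URIs.scheme("https://julialang.org")   # "https"
```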

jdlangs commented 7 years ago

I think there's only one truly important feature needed, as stated by @quinnj upthread:

> it's important to allow `using HTTP.Server` to only load the `Server` code

If `Server` is a submodule of `HTTP`, doesn't that mean that `using HTTP...` will load and precompile everything inside it? If not, and you can easily make it load just the code in the `Server` submodule, then I agree that adding a `Package` or `Namespace` object doesn't add anything.

StefanKarpinski commented 7 years ago

> doesn't that mean that `using HTTP...` will load and precompile everything inside it?

Not necessarily – eval can introduce new bindings into modules at any time.
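For example, a module can start out empty and have a submodule bound into it later, only when that code is needed. This is a minimal sketch with hypothetical stand-in modules. It shows only the `eval` mechanism and does not address how such lazily added code would be precompiled.

```julia
module HTTP end                  # empty namespace module; nothing loaded yet

isdefined(HTTP, :Server)         # false

# Later, on demand, evaluate the Server code into HTTP.
@eval HTTP module Server
    listen(port) = string("listening on port ", port)
end

isdefined(HTTP, :Server)         # true
HTTP.Server.listen(8080)         # "listening on port 8080"
```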

jdlangs commented 7 years ago

In that case, you may want a syntactic distinction between "create an empty HTTP module and insert this other module Server into it" and "load the HTTP module and also make the symbols of its submodule Server visible."

JeffreySarnoff commented 7 years ago

```julia
# a package that has subpackages .. subpkgs are as if imported without the short forms
using HTTP       # give me HTTP's exports, access to someval only as HTTP.subpkg.someval

# a subpackage that is useful absent the superpackage
using HTTP/URIs  # give me URIs' exports, do not load other HTTP stuff
```

ma-laforge commented 6 years ago

Bump. Were multi-module packages ever implemented? I cannot find a reference to this in the latest documentation.

@StefanKarpinski: Do we actually need package namespaces in Julia proper? Would it not be sufficient for the package manager to understand that a single package can include multiple modules?

Enhancing `Project.toml`?

Would it not be reasonable to append subdirectory information to a UUID in the [deps] section? I believe this is a simple way to decouple the notion of modules and git repositories:

```toml
[deps]
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"/PrintfSubdir
REPL = "de0858da-6303-5e67-8744-51eddeeeb8d7"/REPLSubdir
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
```

I suppose another advantage of this solution is the possibility to define "virtual packages" that collect individual modules from different packages in order to simplify how we build development environments.

For example: an organization could define a single "machine learning" package (MLBase) that pulls together a bunch of modules needed by most ML developers. Every "machine learning" application would then only have to add this MLBase virtual package to get up and running.

...But I admit I am new to pkg3 and am not quite certain how it behaves yet. I am not even certain if this solution is introducing redundant concepts.

StefanKarpinski commented 6 years ago

I don't think that was ever really the concept behind package namespaces. The idea was more to allow more than one package with the same name to be distinguished based on the namespace they come from so that e.g. you could do import Optimization/AMD and get the AMD package that implements approximate minimum degree ordering and do import CPUs/AMD and get a package that provides utilities for working with AMD CPUs.

The design of Pkg3 already allows those two different packages with the same name to coexist. Given that, the need for namespaces is considerably lessened. Namespaces would still be useful for a few things, however:

- Allowing `pkg> add Optimization/AMD` (for example) to add the AMD package that's about optimization instead of having to pick the AMD package with the right UUID.
- Allowing both `Optimization/AMD` and `CPUs/AMD` to be used in the same project. Currently only one package with a given name can be used in the same project.

Of course, if a project uses both `Optimization/AMD` and `CPUs/AMD`, then they either need to be used only in different namespaces where there's no collision for the name `AMD`, or they'd need to get different names via some `import Optimization/AMD as OptAMD` construct, or they'd need to be declared as `OptAMD = "<uuid of Optimization/AMD>"` in the project file.
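Julia has since gained renaming imports (`import X as Y`, available from Julia 1.6), which cover the in-code half of this. This is a minimal sketch with two toy stand-in modules that are both named `AMD`. The module contents are hypothetical.

```julia
# Two unrelated modules that happen to share the name AMD (toy stand-ins).
module Optimization
    module AMD
        describe() = "approximate minimum degree ordering"
    end
end
module CPUs
    module AMD
        describe() = "utilities for AMD CPUs"
    end
end

# Rename at import time so both can be used in the same scope (Julia 1.6+).
import .Optimization.AMD as OptAMD
import .CPUs.AMD as CPUAMD

OptAMD.describe()   # "approximate minimum degree ordering"
CPUAMD.describe()   # "utilities for AMD CPUs"
```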

ma-laforge commented 6 years ago

I see. I guess I was side-tracked by @samoconnor's issue description, where he would like to distribute multi-module packages in a single Git repo to simplify his "Amazon" solution.

I have similar issues where half a dozen modules need to be developed simultaneously in order for the solution to be functional. Having independent modules in a single Git repository simplifies the development & the distribution of a coherent (working) solution as an atomic event.

If implemented carefully, it could also provide a reasonable solution for conditional modules (through layering instead of conditional imports):

Project: InspectDR (plotting tool)
- Module: InspectDR (Core tool - requires only Cairo)
- Module: InspectDRGtk (Optional layer - requires Gtk)

A multi-module solution could therefore allow InspectDR to work on JuliaBox - where Gtk is not currently supported.
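The layering idea can be sketched with plain modules. The core module owns the function and knows nothing about Gtk, and the optional layer adds a method for its own backend type. All names here are hypothetical toy stand-ins, not InspectDR's real API, and no actual Cairo or Gtk code is involved.

```julia
# Core layer: in a real package this would depend only on Cairo.
module PlotCore
    abstract type Backend end
    struct CairoBackend <: Backend end
    render(::CairoBackend, plot) = "rendered $plot to a Cairo surface"
    export Backend, CairoBackend, render
end

# Optional layer: would depend on Gtk; PlotCore knows nothing about it.
module PlotGtk
    import ..PlotCore
    struct GtkBackend <: PlotCore.Backend end
    PlotCore.render(::GtkBackend, plot) = "rendered $plot in a Gtk window"
    export GtkBackend
end

using .PlotCore            # enough on a system without Gtk
render(CairoBackend(), "spectrum")

using .PlotGtk             # only where Gtk is available
render(GtkBackend(), "spectrum")
```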

Questions

- Are multi-module repositories supported with Pkg3?
- Are there plans to support them? (I did not see any issues describing this.)
- Should I open an issue? (I would work on this myself, but I really don't have a good grasp of how Pkg3 works yet; hopefully that will change soon.)

StefanKarpinski commented 6 years ago

Pkg3 supports multi-package repositories, i.e. a single repo containing multiple packages as subtrees of the main repo tree. This is already done in JuliaLang's own stdlib directory, where each subdirectory is a separate Pkg3 package. If you want multiple modules from a single package, make them submodules of the package's main module.
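Here is a minimal sketch of that last suggestion. One package's main module contains two submodules, and one of them uses the other through a relative import. The names and function bodies are hypothetical.

```julia
module HTTP                     # the package's main module
    module URIs
        scheme(uri) = String(first(split(uri, "://")))
    end

    module Server
        using ..URIs: scheme    # sibling submodule, relative import
        accepts(uri) = scheme(uri) in ("http", "https")
    end
end

HTTP.URIs.scheme("https://julialang.org")   # "https"
HTTP.Server.accepts("ftp://example.com")    # false
```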

JuliaLang / Juleps