ghost closed this issue 4 years ago
> Please allow `Pkg.add("directory")` with directory being either a) a plain directory without version control

That already works.
Tried it: `ERROR: Git repository not found at 'path'`
Right, sorry, you need `Pkg.develop(PackageSpec(path="directory"))`.
Also:

```
(v1.2) pkg> ?add
...
If a local path is used as an argument to add, the path needs to be a git
repository. The project will then track that git repository just like it
would track a remote repository online.
```
> Right, sorry, you need `Pkg.develop(PackageSpec(path="directory"))`.
Ah, yes, that seems to work.
Yeah, `add` with a URL or path needs to be a git repo, but you can get the code in whatever way you want and use `develop` and point that to a path.
I don't think that support for other version control systems will be added. But as has been said, you can already use `develop` on a path (which can then be version-controlled however desired).
However, that will restrict easily installable package development to those masochistic enough to work with Git, seriously damaging the ecosystem and Julia. I for one will release nothing I develop as packages installable with `Pkg.add`. There has to exist an alternative to the hell that is working with Git, the worst and most poorly designed piece of software ever released. `Pkg.develop` is not one, as it is not an end-user solution.
Git aficionados are the scum of the earth—forcing it upon everyone they run across, everywhere they can.
If you want to work on adding support for mercurial as an opt-in plugin, that would be great.
At a high level, since Julia 1.4, neither installing nor developing Julia packages is in principle tied to git anymore, although the tooling is, of course, much better developed if you are using git.
Since Julia 1.0, a package version is associated with a particular source tree hash, which is content-addressed using git's tree hashing algorithm, but that can be used to hash any source tree, regardless of whether git is used for development or not.
Since Julia 1.4, with the introduction of the Pkg Protocol, it's possible to install packages without using git at all: the protocol simply serves registries, packages and artifacts as content-addressed tarballs.
From Julia 1.5 onward, using the Pkg Protocol is the default so installing packages using git will be strictly a fallback for unregistered packages that are only available via git repos.
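At a high level, the Pkg Protocol is just a few HTTP GET endpoints serving content-addressed tarballs. The sketch below only constructs the URL shapes; the server is pkg.julialang.org, and the UUID and tree hash are, to the best of my knowledge, Example.jl's, used purely as an illustration (any registered package works the same way):

```shell
# Sketch of the Pkg protocol URL shapes (as implemented by PkgServer.jl and
# pkg.julialang.org); uuid and tree below are illustrative values.
server=https://pkg.julialang.org
uuid=7876af07-990d-54b4-ab0e-23690620f79a
tree=46e44e869b4d90b96bd8ed1fdcf32244fddfb6cc

echo "$server/registries"           # known registries, one "/registry/$uuid/$hash" line each
echo "$server/package/$uuid/$tree"  # this package version, as a content-addressed tarball
# artifacts are served the same way, under /artifact/$hash
```

Since everything is addressed by its hash, a static file server laid out in this shape can in principle serve packages too, which is what the script later in this thread attempts.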
So the bones are there, but someone who cares about this issue needs to drive it and make sure that things actually work for people not using git, otherwise it will never work well. Are you that person, @vomout? Can we count on you to drive this and make sure it's a good experience?
@StefanKarpinski Does this mean that `Pkg.add` will accept tarballs? The documentation still insists that even local paths have to be Git repositories.

```
$ hg parent --template "{node}\n"
c4c2ab568324e37458f2e41f5eb5b1465719c077
```
I don't have time to write servers and JSON communications etc. (which a quick glance at https://github.com/JuliaPackaging/PkgServer.jl seems to imply), but I could spend some time writing tarball-generation tools for Mercurial repositories (should be pretty straightforward), or tools for generating directory structures to serve from a static web server (which a quick glance at https://github.com/JuliaLang/Pkg.jl/issues/1377 seems to imply).
If you `add` a local path it does have to be a git repository, since otherwise how do you know what the git tree hash is? We could compute it from the source tree, but it's hard to know what should and should not be included in the tree. You can, however, `dev` a path whether it's a git repo or not.
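To make that concrete, here is a sketch of the minimal layout `develop` needs from a plain directory; the package name, UUID, and paths are made up for illustration, and the Julia command is shown as a comment rather than run:

```shell
# Minimal plain-directory package: no .git, no .hg, just Project.toml + src/.
mkdir -p MyLocalPkg/src
cat > MyLocalPkg/Project.toml <<'EOF'
name = "MyLocalPkg"
uuid = "11111111-2222-3333-4444-555555555555"   # any freshly generated UUID
version = "0.1.0"
EOF
printf 'module MyLocalPkg\nend\n' > MyLocalPkg/src/MyLocalPkg.jl
# Then, from Julia:
#   using Pkg; Pkg.develop(PackageSpec(path="MyLocalPkg"))
```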
We will not support using Mercurial changeset hashes because `git-tree-sha1` is the specific source tree hashing algorithm we use (and a changeset is not a tree anyway; it's more like a commit). We may change source tree hashing algorithms in the future, but it will still be a hash of a tree. What will be necessary is a mechanism for acquiring a tarball from a Mercurial repo URL. That acquisition code will be factored out in the near future, at which point adding support for other acquisition methods should be straightforward. That same code will be used by the Pkg client itself and by Pkg servers that serve those tarballs. Stay tuned.
So this is for verification of contents instead of identification? So something like `find . -type f|xargs cat|shasum`? No problem calculating it from a local tree; the only problem is extra files that shouldn't be there. Why not use something standard like GPG signatures of a tarball then?
If it's not used for verification of contents, then it can be anything; a changeset id. Mercurial changeset ids depend on previous changesets to my knowledge. In Darcs (the most beautiful DVCS) they don't.
Also, what is the UUID in `Project.toml` then for? Its change surely affects the verification signature.
Basically any hosting provider provides tarball links; even basic hgweb does (e.g., https://www.mercurial-scm.org/repo/hg/archive/tip.tar.gz is the link to the current version of Mercurial itself). Also `hg archive` will provide a tarball.
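As a sketch of how such links are formed (the changeset id is the one from the `hg parent` output earlier, used only to illustrate the URL pattern; hgweb archive links use the short, 12-character form of the id as the basename):

```shell
# Build an hgweb archive URL from a full 40-character changeset id.
node=c4c2ab568324e37458f2e41f5eb5b1465719c077
short=$(printf %s "$node" | cut -c1-12)
echo "https://www.mercurial-scm.org/repo/hg/archive/$short.tar.gz"
# The local equivalent (requires hg, shown as a comment):
#   hg archive -r "$node" "pkg-$short.tar.gz"
```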
“A changeset ID is a 160-bit identifier that uniquely describes a changeset and its position in the change history of a repository, regardless of which machine or repository it's on. This is represented to the user as a 40 digit hexadecimal number. Technically, a changeset ID is a nodeid.” https://www.mercurial-scm.org/wiki/ChangeSetID
> So this is for verification of contents instead of identification?

Both: they are content-addressed; the hash of the contents is the identity.
> So something like `find . -type f|xargs cat|shasum`?
That's a very rough idea of it, but you need to canonicalize the ordering, handle weird names, and capture (all and only) significant metadata about each file. You can use `Pkg.GitTools.tree_hash` to compute the hash of an arbitrary tree. This is an independent implementation that does not require git. Think of it as a tree hashing algorithm with the right properties; the fact that git happens to implement it and use it is tangential.
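As a toy illustration of content addressing (a sketch only; real tree hashing also covers file names, modes, and nested trees, which this one-liner does not): git addresses each file blob by the SHA-1 of a small header plus the contents, so a file's identity can be reproduced with plain shell tools, no git required:

```shell
# git's blob hash is sha1("blob <size>\0<contents>").
# For the 6-byte file containing "hello\n":
printf 'blob 6\0hello\n' | sha1sum | cut -d' ' -f1
# -> ce013625030ba8dba906f756967f9e9ca394464a
# git reports the same value for: echo hello | git hash-object --stdin
```

Because the hash depends only on the contents, any tool that produces the same bytes produces the same identity, regardless of which VCS (if any) was used.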
> Why not use something standard like GPG signatures of a tarball then?
That may verify a particular tarball, but the same tree can be turned into a tarball in many different ways. You would need to specify which aspects of the tree are significant, how to order the contents in the tarball, and precisely how to generate the tarball so that equivalent trees produce the same tarball. GPG signatures require a PKI to verify (which we don't want to require), and each signature is different, so if you sign the same tree again, you get a different result. We don't want any of that. What we want is something just like the way git hashes source trees. So that's what we do.
> If it's not used for verification of contents, then it can be anything; a changeset id.

It is used for verification; it cannot be arbitrary.
> Mercurial changeset ids depend on previous changesets to my knowledge.
Yes, we used git commits in Pkg1/2. It was a mistake: it means you need to preserve history and clone an entire repository in order to verifiably install a version of a package.
> Also, what is the UUID in `Project.toml` then for? Its change surely affects the verification signature.
Have you read the documentation at all? This comment doesn't really make sense.
> Basically any hosting provider provides tarball links; even basic hgweb (e.g., https://www.mercurial-scm.org/repo/hg/archive/tip.tar.gz is the link to the current version of Mercurial itself). Also `hg archive` will provide a tarball.
Yes, that works for versions for which such tarballs exist. However, manifest files can include trees for unregistered versions of packages so long as they can be acquired via the version control system. Knowledge of how to figure out that URL also needs to exist somewhere.
> You can use `Pkg.GitTools.tree_hash` to compute the hash of an arbitrary tree. This is an independent implementation that does not require git.
So no problem installing local tarballs or (clean) trees. The clean-tree issue could easily be fixed by supporting a `FileManifest.txt` listing the files that should actually be in there, but local (or any) tarballs are really enough.
> Have you read the documentation at all? This comment doesn't really make sense.
None of the packages (DelimitedFiles and Printf, and as their dependencies Unicode and Mmap) that one of my projects depends on has a `git-tree-sha1` (or `version`) in the `Manifest.toml`, only a `uuid`. In another project various packages are also lacking it. So apparently the tree hash isn't even needed.
> Yes, that works for versions for which such tarballs exist. However, manifest files can include trees for unregistered versions of packages, so long as they can be acquired via the version control system. Knowledge of how to figure out that URL also needs to exist somewhere.
You can get a tarball for any changeset; here is one randomly picked from Mercurial's own repository: https://www.mercurial-scm.org/repo/hg/archive/eb9026a84e83.tar.gz . (The same works with Heptapod, a GitLab fork.) The base name is just the short-form (first 12 characters) changeset id.
So all you need is a mapping from the tree hashes (used as a versioning scheme) to URLs, in no way tied to any particular version control system. Basically, to make a release in the (centralised or other) registry, there should be a system for submitting the URL of a tarball, generated in any way one wants (automatically by a VCS hosting platform, through `hg archive`, or manually). The submission system then calculates the tree hash from the tarball contents and puts it in the registry with the URL. Mirrors could of course also be supported. For verifying corruption in transit, possibly the tree hash should also be submitted (calculated using provided tools locally from the tarball).
> None of the packages (DelimitedFiles and Printf, and as their dependencies Unicode and Mmap) that one of my projects depends on has a `git-tree-sha1` (or `version`) in the `Manifest.toml` …
That's because they are stdlibs and you cannot choose what version you use—you get whatever Julia ships with.
> So all you need is a mapping from the tree hashes (used as a versioning scheme) to URLs, in no way tied to any particular version control system.
Roughly, yes.
> For verifying corruption in transit, possibly the tree hash should also be submitted (calculated using provided tools locally from the tarball).
Everything served over the Pkg protocol is content addressed, so if you can ask for it, you already know how to verify it.
> > For verifying corruption in transit, possibly the tree hash should also be submitted (calculated using provided tools locally from the tarball).
>
> Everything served over the Pkg protocol is content addressed, so if you can ask for it, you already know how to verify it.
This would be to make sure the tree hash in the registry matches what the author intended to submit: that the tarball wasn't corrupted when downloaded by the registry. One can imagine the web server serving the tarball being compromised, and serving dangerous packages. The author submitting the tree hash from the local machine helps to avoid such things.
We already verify content at every stage of the process. The client, the Pkg server, etc. all verify that anything they install has the correct content hash before doing anything with it (we also verify tarball hashes when those are available before trying to unpack tarballs).
> Since Julia 1.4, with the introduction of the Pkg Protocol, it's possible to install packages without using git since packages can be installed using the protocol, which simply serves registries, packages and artifacts as content-addressed tarballs.
Is this actually supposed to work, and how? I've tried to `] add` both local and online tarballs, but it just tries to clone them as a Git repo.
Or (reading the bit about 1.5), is it only supposed to work for registered packages, and unregistered packages are still… eugh… git-only? It should be possible to just install a tarball from a URL instead of forcing heavy (semi-)centralised mechanisms.
I tried to create the following dummy PkgServer that just converts a directory of tarballs into the apparently correct structure, which is then to be `rsync`ed to a static web server. But even the whole `JULIA_PKG_SERVER=pkg.julialang.org` doesn't seem to do anything in Julia 1.4; it still tries Git. So I didn't get to try whether it works at all. (Certainly the diff responses will be wrong, and some things could be improved by automated `.htaccess` generation to refer to the original tarballs, etc.)
Anyway, it's not the right solution. Requiring users to add random registries is dangerous. For things outside the centralised system (FOSS people really have a fetish for centralised distribution systems… they did it long before the App Store and Google Play etc.… it makes the state-communism comparisons seem valid), it's much better to just be able to install a tarball from a trusted URL. (Pretty practical for trying out the codes for a scientific article straight from Zenodo: just `]add https://zenodo.org/record/...tarball`!)
```julia
#!/usr/bin/env julia
using Pkg
import Pkg.TOML

const decompress = `gzcat`

function process_tarball(targetdir, tarball, dirname)
    local hash, uuid
    mktempdir() do tmpdir
        run(pipeline(`$decompress $tarball`, `tar -C $tmpdir -x`))
        # Hash the package root inside the tarball, not the wrapping tmpdir.
        hash = bytes2hex(Pkg.GitTools.tree_hash(joinpath(tmpdir, dirname)))
        # Make everything writable so mktempdir can clean up read-only files.
        chmod(tmpdir, 0o777, recursive=true)
        pkginfo = TOML.parsefile(joinpath(tmpdir, dirname, "Project.toml"))
        uuid = pkginfo["uuid"]
    end
    entry = (uuid, hash)
    basepath = joinpath(targetdir, "registry", uuid)
    targetpath = joinpath(basepath, hash)
    mkpath(basepath)
    cp(tarball, targetpath, force=true)
    return entry
end

tarballdir = ARGS[1]
targetdir = ARGS[2]
web_prefix = ""
entries = []
foreach(readdir(tarballdir)) do tarball
    m = match(r"^(.*)\.(tar\.gz|tgz)$", tarball)
    if isnothing(m)
        @warn "Skipping $tarball: does not appear to be a tarball"
    else
        println("Processing $tarball")
        entry = process_tarball(targetdir, joinpath(tarballdir, tarball), m.captures[1])
        push!(entries, entry)
    end
end
open(joinpath(targetdir, "registries"), write=true) do io
    for (uuid, hash) in sort!(collect(entries))
        println(io, "$web_prefix/registry/$uuid/$hash")
    end
end
```
I don't really care for the attitude here. Nothing personal, but I've just hit my limit on putting up with people on the internet who want to complain and/or tell me how we're doing things wrong. I'm happy to work towards supporting other ways to develop and host packages, but I don't think this conversation is worthwhile to continue from my perspective. I hope you can figure out how to get the Pkg protocol thing working—you may want to ask for help on discourse.
Well, you are doing things… fundamentally wrong, although not completely wrong. For example, the UUID-plus-tree-hash-based Pkg Server thing is a good idea as a persistent cache, but it's not for everything. I'm not going to put the one-off codes for my scientific articles in a centralised system, although I might consider putting well-developed libraries that they depend on there (and then again, maybe not, due to my very bad past experiences with the FOSS herd… the licenses I use because of those experiences might even get my code thrown out of the centralised system).
Much better than saying please `]add MyArticle` is saying please `]add https://zenodo/…/myarticle.tar.gz`… because you want a specific version (which a tarball is a pointer to), and `MyArticle` might not even end up pointing to my stuff; it might point to someone else's stuff. Sure, you could tell people to install a UUID, but those are cryptic, and… why put random one-off stuff in a distribution system? The centralised system could surely cache that thing to ensure persistence (which Zenodo also does), but the primary pointers should be more individualistic, not this FOSS herd stuff of a central system.
Being able to add packages via URL to a tarball is certainly a reasonable feature request. You don't really need to insist that we're doing things "fundamentally wrong" and that we are "scum of the earth" in order to request that. This whole interaction hasn't really had the effect of moving that feature higher up on my list of priorities, but I'm sure we'll get to it at some point.
And I don't really care whether it's `]add https://tarball` or `]develop https://tarball`, but making it consciously difficult to install something that is not in a byzantine distribution system or git is just… fundamentally wrong.
I refuse to shoot my own feet off by touching Git, but would like to create (local) packages to better manage parts of my code that are generic and parts that are specific to certain projects (scientific articles). Please allow `Pkg.add("directory")` with directory being either a) a plain directory without version control, or b) a Mercurial repository. If public Mercurial repositories were supported, I could contribute packages; without that, I just publish my work on Zenodo for archival. I will not be a masochist and touch Git.