Reorganizing Dao modules!

daokoder commented 10 years ago

As I mentioned in another thread, I have been considering reorganizing the modules in the following way:

Dao(fossil)/dao(git) will only include core modules which provide important functionality but no user-accessible types and methods (that is, they are not used directly in code);
DaoModules/dao-modules will only include standard modules without external dependency. Namely, they can only use standard C and system libraries;
DaoTools/dao-tools will only include standard tools without external dependency.

This means only the following modules will stay with Dao: auxlib (it will be changed to include only auxiliary C interface functions), debugger, help (it does offer methods for accessing the help, but this is the only exception I am going to make), macro (maybe) and profiler. The other modules currently with Dao will be moved to DaoModules. The following modules will be moved out and will probably become individual projects/repos: cblas, clinker, DaoCXX, DaoJIT.

The modules with external dependency will be managed by a standard packaging tool. The reason for this change is that, currently it is really inconvenient to deliver some useful modules and tools to users because of dependency issues.

This packaging tool will be able to handle the dependency issues of each module, and can download, configure and build the dependent libraries and the modules. There will be an archive from which the dependent libraries and the modules can be downloaded.

The basic components of this packaging tool are already available in the standard modules (os.fs, web.http, zip and pkgtools). The use of these components is shown in https://github.com/daokoder/dao-tools/blob/master/daopkg/daopkg.dao, where daopkg is intended as the packaging tool.

Anyone like to volunteer for developing this packaging tool daopkg?

Night-walker commented 10 years ago

DaoModules/dao-modules will only include standard modules without external dependency. Namely, they can only use standard C and system libraries;

You didn't mention zip and web.http, so I suppose including third-party library source into the module doesn't count as external dependency?

Anyone like to volunteer for developing this packaging tool daopkg?

You do know that 'anyone' basically means me? :) I am actually glad to have an opportunity to write something in Dao, so I'm in :) Packages, however, require more than just source files. A description is needed which would minimally specify the dependencies and some auxiliary information.

daokoder commented 10 years ago

You didn't mention zip and web.http, so I suppose including third-party library source into the module doesn't count as external dependency?

Yes, their source is small, and they have to be included for the packaging tool, which must not have external dependency.

You do know that 'anyone' basically means me? :) I am actually glad to have an opportunity to write something in Dao, so I'm in :)

Right, though I also had @dumblob in mind, but he has not been very active recently, so it is basically you :). It is really great that you took it up so promptly.

Packages, however, require more than just source files. A description is needed which would minimally specify the dependencies and some auxiliary information.

Yes, we may need a module description format for this. This is where I stopped after adding web.http, zip, pkgtools etc. Or we can do it the way Homebrew does on Mac, where a base class is provided for all packages, and each package is managed by a script that extends the base class.

dumblob commented 10 years ago

Hi guys, let me apologize for another week without any activity. I moved (not yet permanently, but one never knows :)) to a foreign country two weeks ago and I have so much work to do (you might have noticed that I was active mainly at the weekends, but not throughout the workweeks).

Anyway, this issue caught my attention because I've done quite a lot of packaging in the past. I'd recommend not reinventing a wheel and keeping things as simple as possible, so that other full-featured tools specifically tailored for packaging (OS/distribution-specific) can just parse/grab the description and automatically transform it to their formats without user intervention (imagine fpm). I'd start by looking at the lists "Package source types" and "Target package types" at https://github.com/jordansissel/fpm/wiki .

Tomorrow, I'll try to do something for Dao again (I'm missing it so much :(), but now I have to manage some other, slightly more urgent, stuff :(.

daokoder commented 10 years ago

Hi guys, let me apologize for another week without any activity. I moved (not yet permanently, but one never knows :)) to a foreign country two weeks ago and I have so much work to do (you might have noticed that I was active mainly at the weekends, but not throughout the workweeks).

Really no need to apologize; it is quite understandable that you, and in fact every one of us, have other things to attend to. And there is no obligation or anything like that. You guys have already contributed so much, I really appreciate that.

I'd recommend not to reinvent a wheel and do things as simple as possible to allow other feature-full tools specifically tailored for packaging (OS/distribution -specific) to just parse/grab the description and automatically transform it to their formats without user intervention (imagine fpm). I'd start with looking at the lists "Package source types" and "Target package types" at https://github.com/jordansissel/fpm/wiki .

I agree we should keep things as simple as possible. But regarding reinventing a wheel, it really depends on the situation. I remember it was once (or maybe a few times) suggested to me that I not develop a new VM and use the Parrot VM (the VM for Perl 6) instead. If I had done so, this project would probably already be dead.

We have already reinvented a make tool, which works quite well. Without it, managing the current modules, supporting single file deployment and supporting compiling to Javascript would have been very difficult and boring tasks. So sometimes you need to do things your own way (that might be reinventing a wheel) in order to make things easier in the long run.

As for the packaging tool I have mentioned, it should preferably be simple and have no external dependencies other than Dao and the standard modules (of course, other small dependencies that can be solved by including the source may also be acceptable; fpm has far too many dependencies). The parts for handling files, directories, network, compression, decompression and archiving are already in place, so it would not be reinventing a wheel from scratch.

From what I understood, @Night-walker wants to do it in Dao. That's really great :). (And that's how I intended to do it myself eventually if neither he nor anyone else volunteered.)

Night-walker commented 10 years ago

There are a couple of things regarding the hypothetical package manager that require clarification.

Package index. All the packages should be registered somehow; otherwise, how is the tool supposed to work with modules in different, independently updated repositories?
Package content. What content may a package contain, and in what form? Package format, possible included files, their interpretation?
Installation. What steps is the tool expected to perform during installation of a package? What about building, external dependencies, scripts?

dumblob commented 10 years ago

As for the packaging tool I have mentioned, it should preferably be simple and have no external dependencies other than Dao and the standard modules (of course, other small dependencies that can be solved by including the source may also be acceptable; fpm has far too many dependencies). The parts for handling files, directories, network, compression, decompression and archiving are already in place, so it would not be reinventing a wheel from scratch.

Of course. This wasn't what I meant. I didn't mean not to write a new tool. I meant not to come up with new formats, new specifications, new behavior, new types of indexes etc. In other words, I meant to use existing infrastructure (if any) and existing standards/specifications for describing packages/dependencies, i.e. exactly what @Night-walker is asking for right now in https://github.com/daokoder/dao/issues/251#issuecomment-55416193 .

Look at the formats fpm supports, choose the smallest possible functionality these formats support and is sufficient for our modules/packages and implement it in Dao. We can mimic/get inspired by CPAN, PyPI, Cabal, npm, LuaRocks and many others. But as I said - if we make it very similar to some of them or even absolutely the same (from the API point of view), we'll have a big advantage as the existing tools won't have to be accommodated (especially their logic inside) to yet-another-lang-specific-package-repository.

dumblob commented 10 years ago

Btw imho the best packaging tool I've ever seen is GNU Guix - it's worth looking at its core principles to get an impression of how complicated packaging can eventually get and how to solve these problems in a fashionable, readable and efficient way.

daokoder commented 10 years ago

Package index. All the packages should be registered somehow, as otherwise how the tool is supposed to work with modules in different, independently updated repositories?

For package index, how about dao-category-name-version, where category could be mod for modules, tool for tools and dep or ext for external dependency libraries or tools. For example, the clinker module could be indexed by dao-mod-clinker-0.5.0, and its external library libffi by dao-ext-libffi-3.0.13.
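To make the proposed naming scheme concrete, here is a minimal sketch in Python (not Dao) of how such identifiers could be split back into their parts. The regular expression, the `PackageId` type and the `parse_package_id` function are illustrative assumptions; only the `dao-category-name-version` layout and the category keywords come from the proposal above.

```python
import re
from typing import NamedTuple

# Assumed pattern for "dao-<category>-<name>-<version>". The category
# keywords (mod, tool, dep, ext) are taken from the proposal; the exact
# character rules for name and version are hypothetical.
PACKAGE_ID = re.compile(
    r"^dao-(?P<category>mod|tool|dep|ext)-(?P<name>.+)-(?P<version>\d+(?:\.\d+)*)$"
)


class PackageId(NamedTuple):
    category: str
    name: str
    version: str


def parse_package_id(text: str) -> PackageId:
    """Split an identifier such as 'dao-mod-clinker-0.5.0' into its parts."""
    match = PACKAGE_ID.match(text)
    if match is None:
        raise ValueError(f"not a valid package identifier: {text!r}")
    return PackageId(match["category"], match["name"], match["version"])


if __name__ == "__main__":
    print(parse_package_id("dao-mod-clinker-0.5.0"))
    print(parse_package_id("dao-ext-libffi-3.0.13"))
```

Because the version is anchored at the end of the string, a name that itself contains hyphens would still parse under this assumed pattern.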

All the packages, external libraries and tools will be archived in a central place for download. For continuously developed modules, only snapshots or certain versions will be archived there.

All the archived packages, libraries and tools will be stored as a simple customized archive format with compression. The pkgtools/archive.dao module can do archiving and extraction, zip module can do compression and decompression as shown in tools/daopkg/daopkg.dao.

Package content. What content and in what form may a package contain? Package format, possible included files, their interpretation?

A package can be a source package or a binary package (mostly for Windows and Mac, I assume). A source package should contain the source files, makefiles or configuration files etc. For an external library, if it requires a certain build tool to build, this tool could also become a dependency and be archived in the central place. Separately or together, a package description file should also be available. This description file should specify the dependencies of the package, and the steps required to build it.

Installation. What steps the tool is expected to do during installation of a package? What about building, external dependencies, scripts?

The tool should be able to know which packages have been installed and which are available for installation. During installation, it should be able to check whether the package to be installed is already installed or already downloaded. It can then download the package if necessary (maybe checking first whether a corresponding binary package is available), and build it.

The dependencies and commands for building (and other information) should be specified in the package description file, which should be created for each package. @dumblob probably knows better what should be included in such a package description file.

I meant not to come up with new formats, new specifications, new behavior, new types of indexes etc.

For a moment, I thought that's what you meant, but I was not sure.

Look at the formats fpm supports, choose the smallest possible functionality these formats support and is sufficient for our modules/packages and implement it in Dao. We can mimic/get inspired by CPAN, PyPI, Cabal, npm, LuaRocks and many others.

Indeed, we can learn from their format to make a simple and adequate one if necessary.

But as I said - if we make it very similar to some of them or even absolutely the same (from the API point of view), we'll have a big advantage as the existing tools won't have to be accommodated (especially their logic inside) to yet-another-lang-specific-package-repository.

I don't know those formats; if there is one that is really simple and well defined, we could indeed simply adopt it.

Or we can simply use Dao data structures (code) to store package information, a bit like JSON, then it only needs to be evaluated instead of parsing in order to extract the information:)

Night-walker commented 10 years ago

Taking into account the current state of development of Dao and its modules, maintaining package snapshots is tedious. API changes and bug fixes will inevitably require frequent package updates, so it seems better at the moment to link directly to the active sources of packages in repositories. That means that there should be a package registry (a list, to put it simply) which would specify each package's name, description, dependency list and, finally, the URL of its location.
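As a rough illustration of such a registry, the following Python sketch (not Dao) keeps one entry per package and derives an installation order from the dependency lists. The field names, the example URLs and the `install_order` helper are all hypothetical; only the idea of a list holding name, description, dependencies and URL comes from the comment above.

```python
from typing import Dict, List

# Hypothetical registry entries; the URLs are placeholders, not real locations.
REGISTRY: Dict[str, dict] = {
    "clinker": {
        "description": "Calls C functions from Dao",
        "depends": ["libffi"],
        "url": "https://example.org/repos/clinker",
    },
    "libffi": {
        "description": "Foreign function interface library",
        "depends": [],
        "url": "https://example.org/repos/libffi",
    },
}


def install_order(name: str, registry: Dict[str, dict]) -> List[str]:
    """Return the packages to install, dependencies first (depth-first walk)."""
    order: List[str] = []
    visiting = set()

    def visit(pkg: str) -> None:
        if pkg in order:
            return  # already scheduled
        if pkg in visiting:
            raise ValueError(f"dependency cycle involving {pkg!r}")
        if pkg not in registry:
            raise KeyError(f"package {pkg!r} is not registered")
        visiting.add(pkg)
        for dep in registry[pkg]["depends"]:
            visit(dep)
        visiting.discard(pkg)
        order.append(pkg)

    visit(name)
    return order


if __name__ == "__main__":
    print(install_order("clinker", REGISTRY))
```

A registry kept as plain data like this could itself live in a repository, so updating an entry is an ordinary commit.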

It should be the easiest way to manage packages; the package tool would just be a wrapper on top of git/fossil, and a lot of essential functionality will then be available out-of-the-box.

Or we can simply use Dao data structures (code) to store package information, a bit like JSON, then it only needs to be evaluated instead of parsing in order to extract the information:)

Executing arbitrary code is probably not a good idea. Even without security issues, it's a rather questionable way to handle package description files.

dumblob commented 10 years ago

Well, I've just tried to quickly come up with something feasible yet KISS, and I've ended up with a system similar to (but still simpler than) npm, i.e.:

network of hubs (representing different package namespaces - e.g. some company will need their own hub and they'll therefore need to distinguish their packages from others)
enforced package naming (the @daokoder's scheme was OK, but missing the namespace - it should have been rather dao-namespace-category-name-version)
description file with 6 mandatory fragments:
1. name satisfying regex
2. version - with dumb lexicographical comparison run for each component separately; components divided by .
3. license - a logical expression consisting of logical operators and Short Names from https://fedoraproject.org/wiki/Licensing:Main?rd=Licensing#Software_License_List
4. sources - array of URIs (at least one element)
5. description - an arbitrary text with limited length
6. dependencies - array of URIs (might contain no elements) to other hub packages (yes, it's not a mistake - e.g. name+version is not sufficient, we need full URIs); note that we don't need any comparison operators if we use the lexicographical comparison - we can omit e.g. .3.96 from the full version 5.3.96 and immediately it'll match the "highest" version found on the hub
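The six mandatory fragments could be modelled roughly as follows. This is a Python sketch (not Dao) under stated assumptions: the name regex and the description length limit are invented for illustration, and the license field is only checked for being non-empty rather than parsed as a logical expression.

```python
import re
from dataclasses import dataclass, field
from typing import List
from urllib.parse import urlparse

NAME_RE = re.compile(r"^[a-z][a-z0-9_]*$")  # assumed naming rule
MAX_DESCRIPTION = 500                        # assumed length limit


@dataclass
class PackageDescription:
    name: str
    version: str
    license: str
    sources: List[str]
    description: str
    dependencies: List[str] = field(default_factory=list)


def _is_uri(text: str) -> bool:
    """Very loose check: a scheme plus a host or a path."""
    parts = urlparse(text)
    return bool(parts.scheme) and bool(parts.netloc or parts.path)


def validate(pkg: PackageDescription) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    if not NAME_RE.match(pkg.name):
        problems.append("name does not satisfy the naming regex")
    if not pkg.version or any(part == "" for part in pkg.version.split(".")):
        problems.append("version must consist of non-empty components divided by '.'")
    if not pkg.license.strip():
        problems.append("license must not be empty")
    if not pkg.sources:
        problems.append("sources must contain at least one URI")
    for uri in pkg.sources + pkg.dependencies:
        if not _is_uri(uri):
            problems.append(f"not a full URI: {uri!r}")
    if len(pkg.description) > MAX_DESCRIPTION:
        problems.append("description is too long")
    return problems


if __name__ == "__main__":
    pkg = PackageDescription(
        name="clinker",
        version="0.5.0",
        license="MIT",
        sources=["https://example.org/hub/clinker.tar"],
        description="Calls C functions from Dao",
        dependencies=["https://example.org/hub/dao-ext-libffi-3.0.13"],
    )
    print(validate(pkg))
```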

Nothing more for the beginning (generating templates, auto-checking etc. is only a matter of time, nothing complex). Just keep in mind that we need to support different versions of the same package/library on one system (i.e. I'll have two different versions of the Dao VM installed simultaneously, along with the corresponding packages for each of them). The need for this is rapidly increasing and almost no packaging software supports it. Also, each .dll or .so module from the Dao repository must be versioned (regardless of whether it's only internal or not).

Night-walker commented 10 years ago

A balanced design, I must admit. Just one question: how do you propose to organize versioning? It's pretty simple to identify package changeset by its hash, but I don't see a straightforward way to attribute custom version strings to it.

daokoder commented 10 years ago

Taking into account the current state of development of Dao and its modules, maintaining package snapshots is tedious. API changes and bug fixes will inevitably require frequent package updates, so it seems better at the moment to link right to the active sources of packages in repositories. That means that there should be package registry (a list, to put it simply) which would specify each package's name, description, dependency list and, finally, URL of its location.

I originally considered only releases, no snapshots. I only brought snapshots up because you mentioned continuously developed modules. Maybe we should not consider this, for simplicity; supporting it is not very useful anyway.

It should be the easiest way to manage packages; the package tool would just be a wrapper on top of git/fossil, and a lot of essential functionality will then be available out-of-the-box.

I have also considered fossil (git may be too big and has too many dependencies), which has nearly no extra dependencies other than sqlite3. However, the packaging tool should handle not only Dao modules, but also the external libraries, for which fossil is hardly appropriate.

So I suggest we keep things simple, and make the tool handle only releases. The tool should also be able to update the package information automatically (or semi-automatically at least) for Dao modules.

Executing arbitrary code is probably not a good idea. Even without security issues, it's a rather questionable way to handle package description files.

Executing arbitrary code is bad. I was considering evaluating such code after adding a pair of curly brackets or anything else that makes a load statement invalid. Without loading, any code is harmless (infinite loops would be the worst thing, but they can be interrupted). We can also inspect the code (bytecode) for calls to ensure nothing can be called, which will prevent reading and writing files.
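For comparison, here is how the same idea (a description written in the language's own literal syntax, evaluated without allowing any calls) looks in Python, using the standard `ast.literal_eval`. This is only an analogy and says nothing about how Dao would implement it; the `load_description` helper and the field names are hypothetical.

```python
import ast


def load_description(text: str) -> dict:
    """Evaluate a description written as a literal, rejecting anything else.

    ast.literal_eval accepts only literals (strings, numbers, lists, dicts,
    and so on) and raises ValueError for calls or names, so no code in the
    description can actually run.
    """
    value = ast.literal_eval(text)
    if not isinstance(value, dict):
        raise ValueError("description must be a single mapping literal")
    return value


if __name__ == "__main__":
    safe = '{"name": "clinker", "version": "0.5.0", "dependencies": ["libffi"]}'
    print(load_description(safe))

    hostile = '__import__("os").system("echo oops")'
    try:
        load_description(hostile)
    except ValueError as error:
        print("rejected:", error)
```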

dumblob commented 10 years ago

how do you propose to organize versioning? It's pretty simple to identify package changeset by its hash, but I don't see a straightforward way to attribute custom version strings to it.

If I understood you correctly, a plain echo "type.$(git rev-list --count HEAD).$(git rev-parse --short HEAD)" should suffice for the lexicographical comparison. E.g. on Linux, .so libraries support the characters [A-Za-z0-9_.] (and maybe a few others which I forgot) in the version string even if it's not the usual way to do things.

The type component should designate where this package belongs (always one of bleeding-edge, testing, stable - we might introduce others if needed, but for the beginning just these; btw I'm not sure about proper short keywords for these types - any ideas?). And yes, we need this information directly in the version and not only on the packaging level (otherwise conflicts arise when installing multiple same versions of different types on one system). I'm not sure, though, about DLLs on Windows, or on MacOSX and other systems which Dao supports (like Haiku). This needs investigation. @Night-walker, do you know what everything WinAPI allows? @daokoder, what about MacOSX and FreeBSD?

In case of bleeding-edge (usually VCS), having the second component before the second . being a monotonically increasing number should be robust enough. Each VCS should provide some monotonically increasing index of the given commit, so generating such description can be fully automated.
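The versioning idea can be sketched in Python (not Dao) as follows. The function names are hypothetical; the `type.count.hash` layout, the per-component comparison and the prefix matching ("5" matching the highest 5.x.y) follow the comments above. Note one consequence of comparing components as plain strings: "10" sorts before "9", so numeric components would need zero-padding (or numeric comparison) to order as expected.

```python
from typing import List, Optional


def make_version(kind: str, commit_count: int, short_hash: str) -> str:
    """Build a 'type.count.hash' string like the echo command above produces."""
    return f"{kind}.{commit_count}.{short_hash}"


def compare_versions(a: str, b: str) -> int:
    """Compare component by component using plain string ordering.

    Returns -1, 0 or 1. Python compares lists element by element, which is
    exactly the per-component lexicographical comparison described above.
    """
    parts_a, parts_b = a.split("."), b.split(".")
    return (parts_a > parts_b) - (parts_a < parts_b)


def matches(prefix: str, version: str) -> bool:
    """True if 'version' starts with all the components of 'prefix'."""
    wanted = prefix.split(".")
    return version.split(".")[: len(wanted)] == wanted


def highest_match(prefix: str, available: List[str]) -> Optional[str]:
    """Pick the highest available version matching a (possibly partial) prefix."""
    candidates = [v for v in available if matches(prefix, v)]
    return max(candidates, key=lambda v: v.split("."), default=None)


if __name__ == "__main__":
    print(make_version("testing", 1042, "a1b2c3d"))
    print(highest_match("5", ["5.3.96", "5.4.1", "6.0.0"]))
    print(compare_versions("1.10", "1.9"))  # string ordering, not numeric
```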

Either way, we really can't avoid a proper versioning scheme for the non-bleeding-edge types. So we need to introduce at least major + minor, whereas majors will always be compatible and minors not.

Maybe we should not consider this, for simplicity, and supporting it is not very useful anyway.

I must argue with this. There are plenty of use cases where one wants the simplicity of packages (especially installing them, with all its advantages), but bleeding-edge software (i.e. directly from the repository). In the end it's not so difficult to support it (see above).

Executing arbitrary code is bad, I was considering to evaluate such code by adding a pair of curly brackets or anything that makes loading statement invalid. Without loading, any code is harmless (infinite loops would be the worst thing, but it can be interrupt). We can also inspect the code (bytecode) for calls to ensure nothing can be called, this will avoid reading and writing files.

At the beginning, we would support only Dao modules with makefile.dao files. In the future, we can add something more, but it may be a good idea to stick to the design of packaging we came up with and provide the missing makefile.dao in external projects just in a second source URI. This will keep things KISS and easily maintainable => very well automatable. In other words, we shouldn't support turing-complete code in the description files (again, maybe in the future, but I doubt it's a good idea as enforced uniformity and declarativeness is always much easier way to go :)).

Night-walker commented 10 years ago

I originally considered only releases, no snapshots. I said it since you mentioned continuously developed modules. Maybe we should not consider this, for simplicity, and supporting it is not very useful anyway.

It would be much simpler and more convenient to work with sources.

First, I can't think of any releases right now. Dao is not Debian stable; fixing packages is a sure way to have a lot of problems keeping them up to date, as in the case of Dao the later the better. Releases make sense only when a large audience is involved and stability is the primary concern.

Second, working directly with sources eliminates the need to constantly upload new versions of packages somewhere. After registering a package, little is needed to maintain it. Only in case meta-information has changed one would have to update the corresponding entry, and it would be fairly trivial. Package registry can itself be just a repository.

Third, it's simple to fix a package at some version -- just don't update its entry in the registry. Then, regardless of the state of its source, the package manager will make use of only the specified version. It's much simpler that way, when you just have the source rather than a bunch of differently-versioned packages for the same module or tool.

However, the packaging tool should handle not only Dao modules, but also the external libraries, which is hardly appropriate for fossil.

So what? I didn't say just use plain fossil; I meant integrating its abilities into the package manager, together with the other stuff. It would really save a lot of time and effort. After all, what is OSS good for if we have to develop everything from scratch, again and again?

So I suggest we keep thing simple, and make the tool only to handle releases. The tool should also be able to update the package information automatically (or semi-automatically at least) for Dao modules.

It would be around 5-10 times harder to implement, and considerably more cumbersome to maintain and use, that's my opinion.

Executing arbitrary code is bad, I was considering to evaluate such code by adding a pair of curly brackets or anything that makes loading statement invalid. Without loading, any code is harmless (infinite loops would be the worst thing, but it can be interrupt). We can also inspect the code (bytecode) for calls to ensure nothing can be called, this will avoid reading and writing files.

There are config/serialization formats for this, no need to invent anything like that. If we don't need executable code in package description, there is no reason to write it using programming language at all.

Night-walker commented 10 years ago

If I understood you correctly, a plain echo "type.$(git rev-list --count HEAD).$(git rev-parse --short HEAD)" should suffice for the lexicographical comparison. E.g. on Linux, .so libraries support the characters [A-Za-z0-9_.] (and maybe a few others which I forgot) in the version string even if it's not the usual way to do things.

Yes, but identifying package version by its SHA-1 hash is arguably not a particularly human-friendly way of distinguishing versions. However, it's possible to associate each version number in the package registry with the relevant changeset ID in the repository -- if, again, the package manager works with sources.

The type component should designate where does this package belongs to (always one of bleeding-edge, testing, stable - we might introduce others if needed, but for the beginning just these; btw I'm not sure about proper short keywords for these types - any ideas?). And yes, we need this information directly in the version and not only on the packaging level (otherwise conflicts arise when installing multiple same versions of different types on one system).

And yet again, it would be fairly trivial to maintain several "types" of the same package in a single repository using branching. The "type" can thus be easily unified with the version number, becoming some kind of generic tag pointing to a particular source changeset.

I must argue on this. There are plenty of use-cases, where one want the simplicity of packages (especially installing them with all it's advantages), but a bleeding-edge SW (i.e. directly from repository). In the end it's not so difficult to support it (see above).

Yes. It is, as I pointed out, actually much simpler.

At the beginning, we would support only Dao modules with makefile.dao files. In the future, we can add something more, but it may be a good idea to stick to the design of packaging we came up with and provide the missing makefile.dao in external projects just in a second source URI. This will keep things KISS and easily maintainable => very well automatable. In other words, we shouldn't support turing-complete code in the description files (again, maybe in the future, but I doubt it's a good idea as enforced uniformity and declarativeness is always much easier way to go :)).

Again, a package description and a makefile are not one and the same thing. A package description is pure data, meta-information on the package, and a makefile is a service file within the package containing instructions on how to build it. They aren't related.

dumblob commented 10 years ago

And yet again, it would be fairly trivial to maintain several "types" of the same package in a single repository using branching. The "type" can thus be easily unified with the version number, becoming some kind of generic tag pointing to a particular source changeset.

We can't do the unification. We need to keep the "type" separate from the version itself ("type" references a collection of old, current and also future releases/commits/whatever) for easy/automated maintenance of packages which depend on any version of a given package, but e.g. under the condition that it's a stable version.

Again, a package description and a makefile are not one and the same thing.

Of course, no doubts about this.

A package description is pure data, meta-information on the package, and a makefile is a service file within the package containing instructions on how to build it.

Sure, this is obvious.

They aren't related.

Surprisingly, they are :) Not necessarily explicitly - we need a point in an automated generation and building of packages, where we switch from processing of the package metadata to building of the package (and/or vice versa). Basically we need two things for this - what to run to build it and which version we want to build.

Night-walker commented 10 years ago

We can't do the unification. We need to keep the "type" separate from the version itself ("type" references a collection of old, current and also future releases/commits/whatever) for easy/automated maintenance of packages which depend on any version of a given package, but e.g. under the condition that it's a stable version.

If a package is supposed to be obtained from a repository, only "type" plus version together can identify a changeset. Alone they don't make any sense unless there is something else associated with "type".

we need a point in an automated generation and building of packages, where we switch from processing of the package metadata to building of the package (and/or vice versa). Basically we need two things for this - what to run to build it and which version we want to build.

That should be trivial -- checkout the specified revision and then run makefile.dao.

dumblob commented 10 years ago

Alone they don't make any sense unless there is something else associated with "type".

So far we have only 3 types (disjoint sets of versions). And it makes a lot of sense to specify only the "type". There are packages which don't care about the version; they just need something from the other package for some reason (e.g. just some additional stuff for the user, not mandatory for the package itself). If we don't want to introduce complicated dependencies, hard dependencies, build-time dependencies, soft dependencies and so on, and want to stick with the simplicity of lexicographical comparison, then we actually have no other option.

That should be trivial -- checkout the specified revision and then run makefile.dao.

Yes, and we need our package-build-tool to do that (implicitly as I called it).

Btw the only thing I'm worried about is the build-time dependencies. So far we don't need them (as they're the same as the resulting package dependencies), but I'm not sure about the future. Anyway, we can add them at any time (just one more array in the description file) without much burden of package tool changes.

Night-walker commented 10 years ago

This needs investigation. @Night-walker, do you know what everything WinAPI allows?

Forgot to answer that. No one really knows what everything WinAPI allows. My, I should try myself in poetry :)

So far we have only 3 types (disjunct sets of versions). And it does make very much sense to specify only the "type". There are packages which don't care about version, they just need something from the other package for some reason (e.g. just some additional stuff for user, but not mandatory for the package itself) and if we don't want to introduce complicated dependencies, hard dependencies, build-time dependencies, soft dependencies and so on and want to stick with the simplicity of lexicographical comparison, then we have actually no other option.

I suppose anything related to the repository itself can be expressed as a certain revision ID. The latter can be associated with anything suitable for humans; it doesn't matter much from the technical point of view. External dependencies do require special treatment, but there is hardly any realistic way to handle all kinds of them in fully automated mode.

Btw the only thing I'm worried about are the build-time dependencies. So far we don't need them (as they're the same as the resulting package dependencies), but I'm not sure about the future. Anyway, we can add them at any time (just one more array in the description file) without much burden of package tools changes.

I think we shouldn't worry about too many possibilities and too much variety at the moment.

dumblob commented 10 years ago

Forgot to answer that. No one really knows what everything WinAPI allows. My, I should try myself in poetry :)

And what about knowing what everything WinAPI allows in the domain of DLL versioning? :)

I suppose anything related to the repository itself can be expressed as certain revision ID. The latter can be associated with whatever anything suitable for humans, it doesn't matter much from the technical point of view.

Technically we don't need any versions or revisions, just timestamps designating a state in the World's history. Packaging is unfortunately not about finding a minimal set from the technical/physics point of view, but rather about what humans think of a certain revision, build, snapshot etc. Hence the hierarchy. From my experience the "type" is useful and solves a not insignificant number of problems on big systems/clusters/mainframes with many instances of different SW (packages and libraries) deployed.

Btw the statement "can be associated" would mean some mapping written somewhere - most probably again in the description file as some special keyword or being implicit (the worst case) somewhere else. I dislike both of these.

daokoder commented 10 years ago

I think we shouldn't worry about too many possibilities and too much variety at the moment.

Completely agree. I think some of our discussions have digressed quite a bit from the real topic.

Let's focus on one thing first: how do we organize the packages? How do we index / reference them?

If we are to use fossil (I prefer fossil over git for its smallness and efficiency, I feel it is much faster than git), we can do something like this:

One fossil repository per package (Dao modules, external libraries etc.);
A tag is created for each installable revision of the package; the tag name is composed as type-version, where the type can be stable ('release'), unstable ('devel'), alpha or beta etc., and the version number could be one to three numbers separated by dots;
A dependency file is added for each package; each line in it consists of a package (fossil repo) name and a fossil tag name;
A building instruction file is added for each package;
The packaging tool can check the repositories for such tags, and create one package description file per each tag, automatically from the tag name, and the dependency file plus the building instruction file;
Such package description files will be archived in a central place or repository for automatic checking.
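
The type-version tag scheme from the list above can be sketched roughly as follows. This is only an illustration in Python, not part of daopkg: the hyphen separator, the lowercase type names and the helper names (parse_tag, installable_tags) are assumptions made for the example.

```python
import re
from typing import List, NamedTuple, Tuple

# Assumed tag layout: "<type>-<version>", e.g. "release-1.2" or "devel-0.9.1".
# The version is one to three numbers separated by dots, as proposed above.
TAG_PATTERN = re.compile(r"^(?P<kind>[a-z]+)-(?P<version>\d+(?:\.\d+){0,2})$")


class PackageTag(NamedTuple):
    kind: str                  # release, devel, alpha, beta, ...
    version: Tuple[int, ...]   # (1, 2) for "1.2"


def parse_tag(tag: str) -> PackageTag:
    """Split a repository tag into its type and numeric version."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not an installable tag: {tag!r}")
    numbers = tuple(int(part) for part in match.group("version").split("."))
    return PackageTag(match.group("kind"), numbers)


def installable_tags(tags: List[str]) -> List[PackageTag]:
    """Keep only the tags that follow the type-version layout."""
    result = []
    for tag in tags:
        try:
            result.append(parse_tag(tag))
        except ValueError:
            continue  # ordinary tags (e.g. "trunk") are simply ignored
    return result
```

Storing the version as a tuple of integers makes comparisons numeric, so (1, 10) sorts after (1, 9), which plain string comparison would get wrong.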

Please add anything I missed, and keep simplicity and focus in mind:)

dumblob commented 10 years ago

A dependency file is added for each package; each line in it consists of a package (fossil repo) name and a fossil tag name

Why not use full URIs directly? Without them, it's not a unique identifier. Anyway, I doubt this file would be useful - we should use the package description file instead.

A building instruction file is added for each package

E.g. the current makefile.dao? If not, I hope you don't want to introduce any new one. Also the makefile.dao could contain targets for generating/updating the package description file if convenient.

The packaging tool can check the repositories for such tags, and create one package description file per each tag, automatically from the tag name, and the dependency file plus the building instruction file

IMHO, the packaging tool should check the repositories for such tags, and if needed (e.g. due to a newer version) update the existing package description file and upload it to the central repository. The user can choose which tag types should be processed and uploaded (by selectively listing them or choosing an "all" option).

Otherwise I'm comfortable with the proposed solution (btw I got surprised that SQL DB is faster than git :)).

daokoder commented 10 years ago

Why not use full URIs directly? Without them, it's not a unique identifier. Anyway, I doubt this file would be useful - we should use the package description file instead.

Package name plus tag name is a unique identifier, if we define unique package names (this is a must) and unique tag names (this is preferable). If necessary, changeset id can also be included. I believe they are equivalent to URIs.

E.g. the current makefile.dao? If not, I hope you don't want to introduce any new one.

Not makefile.dao. Each package should be built with its standard means of building; for Dao modules and tools, that is DaoMake with makefile.dao. For external libraries and tools, it could be anything such as configure or cmake, and these build tools themselves could be packaged and become allowed dependencies of those external libs and tools. The building instruction would specify which build tool to use and with what parameters. The packaging tool will invoke these build tools with the suggested parameters.

Also the makefile.dao could contain targets for generating/updating the package description file if convenient.

It is better to let the packaging tool generate or update the package description files.

IMHO, the packaging tool should check the repositories for such tags, and if needed (e.g. due to a newer version) update the existing package description file and upload it to the central repository.

That's basically what I meant or implied. If we use fossil to archive the package description files, adding to a repo will simply mean adding or updating. Of course, this can also be handled by the packaging tool.

Otherwise I'm comfortable with the proposed solution (btw I got surprised that SQL DB is faster than git :)).

Probably it is not because of SQL DB that fossil is faster than git. I think it is because fossil stores the revision histories much more efficiently, namely using much less space. I saw this difference with the Dao repos:

dao=>> ls -lthr Dao.fossil 
-rw-r--r--  1 user  staff   9.6M Sep 12 22:41 Dao.fossil
dao.git=>> du -h -d 0 .git
124M    .git

I also saw a large difference in the amount of data transferred when cloning a fossil repo and a git repo.

dumblob commented 10 years ago

I believe they are equivalent to URIs.

Yes, they are valid URIs, but what I meant was that if I want to specify a dependency, I have a specific provider/namespace, name and usually also version in mind. For official packages, I'm not sure the decision about which provider to download a particular dependency from should really be postponed to the end-user side. I'd prefer to point to our official Dao repositories directly (e.g. by including a namespace or even using a full URL like http://daovm.net/hub/dao-officialdaonamespace-cat00-mod00-stable.2.0).

It is better to let the packaging tool generate or update the package description files.

Yes. Still, there'll be a need to work with existing description files (hand-made or generated by something like fpm or make) and the packaging tools have to support them.

dumblob commented 10 years ago

Btw the size of the git repository is really big. Couldn't that be caused by the synchronization fossil->git?

Night-walker commented 10 years ago

And what about knowing what everything WinAPI allows in the domain of DLL versioning? :)

The largest unique feature of Windows DLL management is described here :)

The packaging tool can check the repositories for such tags, and create one package description file per each tag, automatically from the tag name, and the dependency file plus the building instruction file;

I think it's simpler to have single package file which contains information on all tags and versions of this package, namely because there is no benefit from fragmenting the meta-data which will likely be very simple and lightweight anyway. Basically, nothing but revision ID is minimally required to be associated with each tag.

Overall, it seems like we've more or less formed the principles behind the package management. The only issue I see is reliance on fossil, which essentially excludes any chance for someone to host a repository at GitHub. But I suppose this problem can be attended to if/when the necessity arises.

dumblob commented 10 years ago

which essentially excludes any chance for someone to host a repository at GitHub.

I'm not convinced about this as long as we retain URIs with a scheme, i.e. with git: in front of each item in the sources array. The package tool can support any number of types of repositories - it's not difficult to implement (imagine a simple unified interface and then a class for each type of repository).
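
A rough sketch of such a unified interface, written in Python for illustration only. The source format (a scheme prefix such as git: or fossil: followed by the repository URL), the class names and open_source are assumptions made for this example; each backend merely builds the shell command that the package tool would run.

```python
from typing import Dict, List, Type


class Repository:
    """Common interface: a backend only knows how to build its clone command."""

    def __init__(self, url: str):
        self.url = url

    def clone_command(self, destination: str) -> List[str]:
        raise NotImplementedError


class GitRepository(Repository):
    def clone_command(self, destination: str) -> List[str]:
        return ["git", "clone", self.url, destination]


class FossilRepository(Repository):
    def clone_command(self, destination: str) -> List[str]:
        # fossil clones into a single repository file rather than a directory
        return ["fossil", "clone", self.url, destination + ".fossil"]


# One entry per supported repository type; adding a type means adding a class.
BACKENDS: Dict[str, Type[Repository]] = {
    "git": GitRepository,
    "fossil": FossilRepository,
}


def open_source(source: str) -> Repository:
    """Pick a backend from the scheme prefix, e.g. 'git:https://host/repo.git'."""
    scheme, separator, url = source.partition(":")
    if not separator or scheme not in BACKENDS:
        raise ValueError(f"unsupported repository source: {source!r}")
    return BACKENDS[scheme](url)
```

The resulting command list could then be passed to something like subprocess.run; the sketch stops short of executing anything.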

dumblob commented 10 years ago

Btw nice article about another type of hell (hey, I'm getting more and more afraid to die :D) - it seems significantly more complicated than supporting both 12 years old Linux systems along with 0.5 years old ones :)

Night-walker commented 10 years ago

I'm not convinced about this as long as we retain URIs with a scheme, i.e. with git: in front of each item in the sources array. The package tool can support any number of types of repositories - it's not difficult to implement (imagine a simple unified interface and then a class for each type of repository).

If it's just a matter of running the proper shell command, then yes, it's indeed simple.

dumblob commented 10 years ago

If it's just a matter of running the proper shell command, then yes, it's indeed simple.

It's imho like that or very close to it.

daokoder commented 10 years ago

Yes, they are valid URIs, but what I meant was that if I want to specify a dependency, I have a specific provider/namespace, name and usually also version in mind. For official packages, I'm not sure the decision about which provider to download a particular dependency from should really be postponed to the end-user side. I'd prefer to point to our official Dao repositories directly (e.g. by including a namespace or even using a full URL like http://daovm.net/hub/dao-officialdaonamespace-cat00-mod00-stable.2.0).

It is clearly a bad design if you put URLs in package identifiers or dependency lists. The packaging tool should of course know where the official packages are archived, or the official place to archive the packages (including non-official packages). The packaging tool can also be configured to access non-official archives; in such cases, it may make more sense to add URLs to the package descriptions as additional information about where the package could be downloaded.

Btw the size of the git repository is really big. Couldn't that be caused by the synchronization fossil->git?

I don't think so. The Dao git repo has always been much bigger than Dao fossil repo, and the git repo was produced from the previous hg repo.

I think it's simpler to have single package file which contains information on all tags and versions of this package, namely because there is no benefit from fragmenting the meta-data which will likely be very simple and lightweight anyway. Basically, nothing but revision ID is minimally required to be associated with each tag.

Sure, it is also quite natural to use just a single file.

The only issue I see is reliance on fossil, which essentially excludes any chance for someone to host a repository at GitHub. But I suppose this problem can be attended to if/when the necessity arises.

It should be quite easy to set up fossil repos for those repos, and update them automatically. So there is no need to be concerned about this.

dumblob commented 10 years ago

The packaging tool can also be configured to access non-official archives; in such cases, it may make more sense to add URLs to the package descriptions as additional information about where the package could be downloaded.

Exactly - and the idea is to put the two pieces of information together into one default URI :) The packaging tool can do whatever needed with the representation - e.g. use only the last part (dao-officialdaonamespace-cat00-mod00-stable.2.0). This seems to me more KISS than maintaining an array of two-item tuples.

daokoder commented 10 years ago

Exactly - and the idea is to put the two pieces of information together into one default URI :) The packaging tool can do whatever needed with the representation - e.g. use only the last part (dao-officialdaonamespace-cat00-mod00-stable.2.0). This seems to me more KISS than maintaining an array of two-item tuples.

You probably misunderstood what I wrote, I actually meant the opposite:)

Night-walker commented 10 years ago

Here is the package description format I have currently settled on.

First, each different tag/version of a module should be linked with a separate package description file. It turned out to be simpler in the end; besides, the package description, its authors, etc. may differ from version to version.

The format itself is trivial: each file consists of key-value pairs in the form property: value, where property can be:

author
description
license
repository
revision
dependencies
???

For example

author: Someone
description: some package
license: license list
repository: url
revision: 1234
dependencies: pkg1, pkg2, pkg3

All property values are plain strings, except dependencies, which is a comma-separated list. The reason I opted to include dependencies right in the package description is that I'd like to know in advance what stuff will be installed if I choose this or that package. This can also help catch some errors in the package description early on.
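
For illustration, a minimal Python sketch of reading such a description file. It assumes one property per line and that the first colon separates the property name from its value (so URLs in repository keep their own colons); parse_description is a hypothetical name, not an existing daopkg function.

```python
from typing import Dict, List, Union

Value = Union[str, List[str]]


def parse_description(text: str) -> Dict[str, Value]:
    """Parse 'property: value' lines; dependencies becomes a list of names."""
    properties: Dict[str, Value] = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are ignored (an assumption)
        key, separator, value = line.partition(":")
        if not separator:
            raise ValueError(f"line {line_number}: expected 'property: value'")
        properties[key.strip()] = value.strip()

    # Only dependencies is a comma-separated list; everything else stays a string.
    raw = properties.get("dependencies", "")
    assert isinstance(raw, str)
    properties["dependencies"] = [name.strip() for name in raw.split(",") if name.strip()]
    return properties
```

Because the dependency list is available without fetching any sources, a tool can report everything that would be installed before it downloads anything.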

What I haven't yet decided is how to organize package building.

dumblob commented 10 years ago

What I haven't yet decided is how to organize package building.

You mean something like this?

$ daopkg build --target arch_linux-x86_64 dao-company00-devel-module00-1234
downloading description... 100%
checking for build-time dependencies... 2 found
downloading build-time dependencies... 100%
preparing build environment... 100%
installing into the build environment:
  1. dao-company00-devel-module22-5678
  checking for binaries available... 0 found
  downloading description... 100%
  checking for build-time dependencies... all satisfied
  building... 100%
  installing temporarily... 100%
  2. dao-company00-devel-module33-346
  checking for binaries available... 1 found
  downloading binaries... 100%
  installing temporarily... 100%
building... 100%
uninstalling:
  2. dao-company00-devel-module33-346
  1. dao-company00-devel-module22-5678
resulting package: dao-company00-devel-module00-1234.extension_of_the_compression_format
$

Build-time dependencies will be resolved by the build-system of that particular package. Then daopkg can install these build-time dependencies into the prepared build environment. Finally it'll run the building of the desired package. It's just plain recursion :)
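
The recursion can be sketched in a few lines of Python. This only computes the order in which packages would have to be built; the build_deps mapping (package name to its direct build-time dependencies) and the function name build_order are assumptions for the example, and nothing is downloaded or installed.

```python
from typing import Dict, List


def build_order(package: str, build_deps: Dict[str, List[str]]) -> List[str]:
    """Return packages in the order they must be built: dependencies first."""
    order: List[str] = []
    in_progress = set()

    def visit(name: str) -> None:
        if name in order:
            return  # already scheduled through another dependency path
        if name in in_progress:
            raise ValueError(f"circular build-time dependency involving {name!r}")
        in_progress.add(name)
        for dependency in build_deps.get(name, []):
            visit(dependency)  # the plain recursion mentioned above
        in_progress.discard(name)
        order.append(name)

    visit(package)
    return order
```

Everything before the last element of the result is what would be installed into the build environment and removed again afterwards, in reverse order.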

dumblob commented 10 years ago

Of course the main optimization for this process is to use the local system as --target and have the build-time dependencies already installed in the system, so that they don't have to be installed and uninstalled each time you're building the desired package.

Night-walker commented 10 years ago

Build-time dependencies will be resolved by the build-system of that particular package.

And what is a build system? daomake or the usual make/cmake will not be able to carry that out. Only the package manager is able to fetch dependencies, so it basically boils down to the dependencies property in the package file.

I think there may generally be three types of packages:

plain Dao scripts (no building)
Dao-dependent modules (daomake)
external stuff (?)

The latter should also have some kind of build scripts, I guess. daomake may actually be suitable for that too since it can run shell commands, determine OS, etc. Then it means just running daomake on makefile.dao if the latter exists, and then... Then make && make install? If daomake can properly handle it all, it should be that simple.

dumblob commented 10 years ago

And what is a build system?

It doesn't matter, it just prints out the list of build-time dependencies so daopkg can do the rest as I outlined.

The rest you've written seems needlessly complicated. To begin with, assume repositories only with makefile.dao and extended functionality of daomake to print the list of direct build-time dependencies. No types of packages, nothing like that is needed.

We might, though, consider moving the build-time dependencies to the description file (a well-proven way in which most packaging software treats this issue) to avoid extending daomake, the consequent changes in each makefile.dao, and any changes in other build systems (possibly supported in the future). Btw I'd prefer this solution.

daokoder commented 10 years ago

The format itself is trivial: each file consists of key-value pairs in the form property: value, where property can be:

This should be the simplest way. In the description format, I think we should also include a human readable identifier for the revision, a repo tag for instance.

daomake may actually be suitable for that too since it can run shell commands, determine OS, etc. Then it means just running daomake on makefile.dao if the latter exists, and then... Then make && make install? If daomake can properly handle it all, it should be that simple.

I also think it should be easy to do it in daomake.dao. We can use it to guess the type and parameters of the building tool supported by external library, check the availability of such tools, and run them accordingly.

But as I mentioned before, we should package some standard build tools (cmake in particular), and add it as dependency to those libraries that can be compiled with cmake. So that, when such tools are not provided by the system, the packaging tool can install them.

extended functionality of daomake to print the list of build-time direct dependencies.

I think @Night-walker did not imply extending DaoMake, instead, he meant preparing makefile.dao that can be executed by DaoMake. There isn't much to extend on DaoMake, it is already quite powerful, I believe :)

Night-walker commented 10 years ago

extended functionality of daomake to print the list of build-time direct dependencies

It's much simpler to just list the build-time dependencies in the package description itself, at least as long as building from sources is the main way of getting Dao and related stuff.

No types of packages, nothing like that is needed.

I didn't mean some formal types, I just outlined the scenarios the package manager should be able to deal with.

Night-walker commented 10 years ago

This should be the simplest way. In the description format, I think we should also include a human readable identifier for the revision, a repo tag for instance.

Package name is supposed to include tag/version information anyway.

I think @Night-walker did not imply extending DaoMake, instead, he meant preparing makefile.dao that can be executed by DaoMake. There isn't much to extend on DaoMake, it is already quite powerful, I believe :)

Yes, since daomake handles dao files which can actually contain any Dao code, it is mostly up to that code to implement ad-hoc build/install scenarios.

daokoder commented 10 years ago

Package name is supposed to include tag/version information anyway.

I see, that should be fine.

dumblob commented 10 years ago

It's much simpler to just list the build-time dependencies in the package description itself, at least as long as building from sources is the main way of getting Dao and related stuff.

Let's go for it then :)

Night-walker commented 10 years ago

Now, should the package manager support what @dumblob referred to as hubs? Naturally, they can be directories in the package registry, in which each package may only depend on other packages in the same or inner (sub) directory.

Another question is about platform-dependent packages and dependencies: should they be supported?

And also let's clarify a few minor points:

final name for the package manager (the shorter, the better)
extension for package description files (anything, but preferably unambiguous)
place to install (copy) packages without makefile.dao (how to determine where Dao modules reside? or maybe makefile should be mandatory?)

dumblob commented 10 years ago

Another question is about platform-dependent packages and dependencies: should they be supported?

Tough question. Maybe this question could be generalized as "Do we want to allow any differences in between different platforms in the future?". I tend to say no, but I'm not sure. Anyway, this'd lead to an addition of another property - platform or architecture.

final name for the package manager (the shorter, the better)

The longest is daopkg. I don't have enough imagination to come up with new names, but I can surely say, that we can't use e.g. dpkg :)

extension for package description files (anything, but preferably unambiguous)

Just the extension used for that type of structured files. .yaml if it's yaml, .json if it's json, .ini if it's ini etc. We don't have to worry that it'll be confused with something else as it'll have the nice dao-namespace-category-name-version name :)

how to determine where Dao modules reside? or maybe makefile should be mandatory?

Mandatory. At least for the beginning.

daokoder commented 10 years ago

Now, should the package manager support what @dumblob referred to as hubs?

I think we can skip it now, it does not seem to be cumbersome to add later on.

Another question is about platform-dependent packages and dependencies: should they be supported?

Preferably not. Such packages should not be encouraged.

final name for the package manager (the shorter, the better)

daopkg seems a reasonable one. But if you want a cool name, we can try to think about something else:)

extension for package description files (anything, but preferably unambiguous)

This may be related to the tool name. For daopkg, how about dkg?

place to install (copy) packages without makefile.dao (how to determine where Dao modules reside? or maybe makefile should be mandatory?)

For Dao modules, they would be built with DaoMake, which will know where Dao modules are installed, and a make install should correctly install them.

For external libraries, it may rely on DaoMake to detect the installation of Dao. Preferably, such libraries should be installed in locations close to Dao modules. Since Dao modules are normally installed at $SomeRoot/lib/dao/modules/, a reasonable location would be $SomeRoot/lib/dao/dependencies/ or simply $SomeRoot/lib/dao/deps/.

Also, since we expect the packaging tool to be installed along with the core Dao and standard modules, it should be simple to provide the packaging tool with the relevant information when it is installed.

Night-walker commented 10 years ago

Another question is about platform-dependent packages and dependencies: should they be supported? Preferably not. Such packages should not be encouraged.

There are things which exist only on a certain platform, and are too specific to that platform to be somehow mimicked on others. On Windows it is the registry and COM/OLE, to name a few. Unix systems also have various specific concepts and services. Sometimes you have to work with them, so the relevant modules may (even should) eventually appear. So perhaps it's better to reserve such a possibility right from the start.

extension for package description files (anything, but preferably unambiguous) This may be related to the tool name. For daopkg, how about dkg?

Maybe dpk, i.e. the mnemonic is "dao package"?

daokoder commented 10 years ago

There are things which exist only on a certain platform, and are too specific to that platform to be somehow mimicked on others. On Windows it is the registry and COM/OLE, to name a few. Unix systems also have various specific concepts and services. Sometimes you have to work with them, so the relevant modules may (even should) eventually appear. So perhaps it's better to reserve such a possibility right from the start.

Right.

Maybe dpk, i.e. the mnemonic is "dao package"?

It's ok too. BTW, dkg is also a kind of mnemonic for "dao package", where the d could represent both dao and the rotation of p:).

dumblob commented 10 years ago

where the d could represent both dao and the rotation of p:).

Good point :)

Night-walker commented 10 years ago

Now that I added sys.uname(), it's possible to support platform-specific packages and dependencies.

First, a package may specify a compatibility property, a comma-separated list of supported systems (case-insensitive):

windows
unix
concrete name to be matched against system (on Unix) or system + ' ' + version (on Windows) returned by sys.uname().

Next, package names in the package dependencies list may be restricted to particular systems by using the pkgname (system1, system2, ...) syntax, where system names are interpreted identically to the above description.
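
A Python sketch of how these two rules might be evaluated. It is an illustration under assumptions, not the actual implementation: the system name and version are passed in explicitly instead of being read from sys.uname(), a concrete name is compared by exact case-insensitive equality, and the function names are made up for the example.

```python
import re
from typing import List, Tuple

# One dependency entry: a package name, optionally followed by "(sys1, sys2, ...)".
ENTRY = re.compile(r"\s*([^,()\s]+)\s*(?:\(([^)]*)\))?\s*(?:,|$)")


def matches_system(spec: str, system: str, version: str, is_windows: bool) -> bool:
    """Check one system name from a compatibility or dependency restriction list."""
    spec = spec.strip().lower()
    if spec == "windows":
        return is_windows
    if spec == "unix":
        return not is_windows
    # Concrete name: system on Unix, system + ' ' + version on Windows.
    concrete = f"{system} {version}" if is_windows else system
    return spec == concrete.lower()


def parse_dependencies(value: str) -> List[Tuple[str, List[str]]]:
    """Split 'pkg1, pkg2 (windows)' into (name, [systems]) pairs."""
    value = value.strip()
    entries = []
    position = 0
    while position < len(value):
        match = ENTRY.match(value, position)
        if match is None:
            raise ValueError(f"malformed dependency list near {value[position:]!r}")
        systems = [s.strip() for s in (match.group(2) or "").split(",") if s.strip()]
        entries.append((match.group(1), systems))
        position = match.end()
    return entries


def required_dependencies(value: str, system: str, version: str,
                          is_windows: bool) -> List[str]:
    """Names of the dependencies that apply on the given system."""
    return [
        name
        for name, systems in parse_dependencies(value)
        if not systems
        or any(matches_system(s, system, version, is_windows) for s in systems)
    ]
```

An unrestricted dependency applies everywhere, while a restricted one applies only if at least one of its listed systems matches.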

daokoder / dao

Reorganizing Dao modules! #251