Open kloczek opened 3 years ago
Thanks for your thoughts. Please note that the main discussion venue is the selinux@vger.kernel.org mailing list.
Current state causes many difficulties on maintaining SELinux packages tooling.
One repo to rule them all
Theoretically all SELinux tools are in separate source subtrees and the dist tar balls are released as a set of many tar balls, however:
* many times a single git commit spans many subdirectories, which makes it very difficult to extract patches from git after the latest release
I use git format-patch... -- <directory>:
$ git show 4a142ac46a11 -- libsepol | grep -E 'diff.*(libselinux|libsepol)'
diff --git a/libsepol/src/Makefile b/libsepol/src/Makefile
$ git show 4a142ac46a11 -- libselinux | grep -E 'diff.*(libselinux|libsepol)'
diff --git a/libselinux/src/load_policy.c b/libselinux/src/load_policy.c
$ git show 4a142ac46a11 | grep -E 'diff.*(libselinux|libsepol)'
diff --git a/libselinux/src/load_policy.c b/libselinux/src/load_policy.c
diff --git a/libsepol/src/Makefile b/libsepol/src/Makefile
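To see at a glance which components a commit touches, something like the following could be used. This is only a sketch: the helper name components_touched is hypothetical, and it assumes each component lives in its own top-level directory, as in the output above.

```shell
# components_touched: read changed file paths on stdin (one per line) and
# print the distinct top-level directories they belong to.
# (Hypothetical helper, not part of any SELinux tooling.)
components_touched() {
    cut -d/ -f1 | sort -u
}

# Possible usage inside a clone of the repository (assumes git is installed):
#   git show --name-only --format= 4a142ac46a11 | components_touched
```

A commit that lists more than one directory here is one whose per-directory patch will differ from the full commit.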
* because everything is in one repo, changes in each subdirectory must wait for that one (master) release, and _sometimes that release does not even introduce any changes in some directories_
Could you please share the concrete change which was not propagated to new release?
* despite the semi-modularisation, the test suite seems to be maintained on the assumption that it will be executed in the main selinux directory. An example is the libsepol test suite:
+ /usr/bin/make -O -j1 V=1 VERBOSE=1 test
/usr/bin/make -C tests test
make[1]: Entering directory '/home/tkloczko/rpmbuild/BUILD/libsepol-3.1/tests'
gcc -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fdata-sections -ffunction-sections -Os -I../include/ -I../../checkpolicy/ -c -o debug.o debug.c
gcc -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fdata-sections -ffunction-sections -Os -I../include/ -I../../checkpolicy/ -c -o helpers.o helpers.c
helpers.c:25:10: fatal error: parse_util.h: No such file or directory
   25 | #include "parse_util.h"
      |          ^~~~~~~~~~~~~~
compilation terminated.
make[1]: *** [<builtin>: helpers.o] Error 1
That parse_util.h is not part of the public checkpolicy API and, for example, in Fedora there is no checkpolicy-devel package.
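The failure above comes from the -I../../checkpolicy/ flag, which points outside the unpacked libsepol tree. A rough way to spot such flags in a build log is sketched below. The helper name escaping_includes is hypothetical, and the sketch assumes flags are separated by single spaces and that the compiler runs DEPTH directories below the component root (1 for the tests directory shown above).

```shell
# escaping_includes DEPTH: read compiler command lines on stdin and print every
# -I path that climbs more than DEPTH levels up, i.e. leaves the component tree.
# (Hypothetical helper; assumes space-separated flags.)
escaping_includes() {
    tr ' ' '\n' | awk -v depth="$1" '
        /^-I/ {
            path = substr($0, 3)          # strip the leading "-I"
            rest = path
            ups = 0
            # count leading "../" segments
            while (substr(rest, 1, 3) == "../") { ups++; rest = substr(rest, 4) }
            if (ups > depth) print path
        }
    '
}
```

Fed the gcc lines from the log with DEPTH set to 1, it would report ../../checkpolicy/ and leave ../include/ alone.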
* there are circular dependencies: checkpolicy requires libsepol during the build, and libsepol requires header files provided by checkpolicy (the libsepol test suite uses checkpolicy headers)
In fact libsepol doesn't require checkpolicy to build, the test suite is not part of the build process
all:
$(MAKE) -C src
$(MAKE) -C utils
* there is no proper separation between public and private API. This is why it is currently not possible to use only shared libraries
As I see it, the public API is declared in libsepol/include, libselinux/include and libsemanage/include, everything else is internal API.
How to solve above issues all together?
* start maintaining the SELinux tooling in separate git modules. Proper separation can be done incrementally by extracting each necessary part one by one and removing it from SELinuxProject/selinux
* introduce a proper build framework, using for example meson, with checking build dependencies using
* properly define each library's public interface and use only that API in other modules
* start releasing each module (currently directories) with its own versioning. This would allow critical issues to be fixed smoothly
* divide the current test suite and spread it across the components, using only the public API/ABI of other components and testing internal API only inside each component
If you are not familiar with meson I can offer to help with that part. Using meson would allow solving many other issues, like transparent LTO or PGO optimisation, easy integration of test units, coverage, generating dist tar balls and many others.
You propose quite a big change which needs to be discussed at selinux@vger.kernel.org mailing list. Also feel free to send patches there, see https://github.com/SELinuxProject/selinux/blob/master/CONTRIBUTING.md for instructions.
In fact libsepol doesn't require checkpolicy to build, the test suite is not part of the build process
And that is the problem. The whole of SELinux is in a kind of internal self orthogonal state. From one angle everything is in separate directories, but everything is kept in a single git repo, so tracing changes against a single directory/module is not as easy as it could be. From another angle (that of the test units) everything is in a single tree.
Again: if it is not possible to use the test units per directory, unpacked from the dist tar balls, those units cannot be used in an automated build process to confirm that basic functionality is OK. That step of building, preparing and testing binary packaged resources is really important.
You propose quite a big change which needs to be discussed at selinux@vger.kernel.org mailing list. Also feel free to send patches there, see https://github.com/SELinuxProject/selinux/blob/master/CONTRIBUTING.md for instructions.
This is not one big change. All that needs to be done is for example:
In the above scenario each cloned repo will keep the full history of past changes. At the end SELinuxProject/selinux could be deleted or archived (left in RO state just FTR). In other words, all these changes could be done as sets of smaller batches of changes. The issue is that, because it is about cloning/forking repos, steps 1-3 cannot be done by someone like me. I can only submit PRs (and I'm not asking to be given permission for that kind of operation).
Again, this discussion should happen at selinux@vger.kernel.org - you don't have to be subscribed to send an email there, it's common to CC people from outside. This place is not visible for everybody, not properly archived and so on.
What you propose brings a lot of work and testing which has to be done by someone, e.g. the 5 steps you mentioned above would break make test in a standalone libsepol repo in the same way you describe it being broken when you use only libsepol-X.Y.tar.gz.
From my POV the solution for your problem is not to split the sources but to merge the tarballs together. In fact such a tarball is available at https://github.com/SELinuxProject/selinux/releases/tag/20200710 (https://github.com/SELinuxProject/selinux/archive/20200710.tar.gz). In the next release this file would be called 3.2.tar.gz but I guess we can generate it as selinux-3.2.tar.gz as part of the release.
I can only submit PRs
Thanks. Please follow https://github.com/SELinuxProject/selinux/blob/master/CONTRIBUTING.md If you can't send a patch to mailing list for some reason, I can do it for you.
Really I don't see any sense in discussing this with anyone other than the maintainer. I'm not interested in wasting time on talks with anyone else.
If you are not convinced by what I've written, just close the ticket.
Really I don't see any sense in discussing this with anyone other than the maintainer. I'm not interested in wasting time on talks with anyone else.
If you are not convinced by what I've written, just close the ticket.
That mailing list is how you reach maintainers. Not all of the maintainers are reachable via Github. This is something that I think warrants discussion, but you need to be willing to bring it to the right venue: the mailing list.
So please just discuss that between maintainers. With full respect no one else needs to be involved in such discussion (even me).
Yet another small pebble to whole pile of problems:
I use git format-patch... -- <directory>:
It doesn't work or it works only when git command is used to extract patches.
The issue is that it generates a patch with exactly the same hash as the original commit, but containing only the changes in the given directory.
Such patch cannot be verified against original git repo (because it has different content) or cannot be downloaded over gitlab/github/cgit HTTP/HTTPS rest interface by for example `wget https://github.com/SELinuxProject/selinux/commit/
In my automated rpm build processes I extract cgit/gitlab/github patches over the REST interface on a really massive scale, because I want to retest all new changes against the whole set of packages by doing a scratch build after each possible commit and testing whether such a scratch package still interacts correctly with all other packages.
Using HTTP/HTTPS rest interface is very useful because it allows download single commit in sometimes fraction of second without cloning git repo and extracting exact commit.
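For what it's worth, a packaging-side workaround could be to fetch the whole-commit patch and then keep only the sections under one component directory. The sketch below is an assumption-laden illustration, not something the SELinux project provides: the helper name filter_patch_by_dir is hypothetical, and it assumes git-style "diff --git a/<path> b/<path>" section headers and paths without spaces.

```shell
# filter_patch_by_dir PREFIX: read a git-style patch on stdin and print only
# the mail header plus the per-file sections whose path starts with PREFIX/.
# (Hypothetical helper; assumes paths contain no spaces.)
filter_patch_by_dir() {
    awk -v prefix="$1" '
        /^diff --git / {
            in_body = 1
            # $3 is "a/<path>"; keep this section only if <path> is under PREFIX/
            keep = (index($3, "a/" prefix "/") == 1)
        }
        # lines before the first "diff --git" form the header and are always kept
        !in_body || keep { print }
    '
}
```

The downloaded patch would be piped through filter_patch_by_dir libsepol before being applied. The filtered result no longer has the same content as the original commit, so the verification concern described above still applies.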
Among thousands of other packages (and a few 100k patches tested that way), the only packages where this does not work are the SELinux packages, because only here some parts which are logically separate entities are maintained in a single git repo.
In other words, holding all the SELinux tooling code in a single git repo adds overhead on the packaging layer, because all SELinux package maintenance needs to be handled in a special way. Using a single git repo creates an obstacle to continuously testing all possible upcoming changes. As you can guess, to confirm that everything is OK such continuous testing relies very extensively on the test suites integrated into each package's source tree. However, in the case of SELinux they cannot be used because of the bad separation of tests inside each of the SELinux logical bits.
Of course there is yet another possibility: solving all the described issues by simply no longer distributing the SELinux tooling source code as separate dist tar balls (which I don't think would be healthy). That is probably my last argument in favour of properly separating the SELinuxProject/selinux/* directories into separate git repos.
Is there any chance to modularize the whole SELinux source code a bit?
IMO maintaining all the SELinux stuff as packages is more and more difficult. All because everything is in a single SCM tree. Looking only on Fedora spec files is possible to see how complicated whole packaging process must be now and that complexity only grows. As each part of the SELinux tooling is entangled with the main tree, whose releases are infrequent, more and more patches from git need to be added. Some components still use the internal API of other components, which forces the use of static libraries. IMO everything, instead of improving, looks more and more complicated, without a clear path showing where everything is going.
If I may suggest, IMO it would start from separating libsepol, libsemanage and libselinux. Then, when those three components are isolated, other parts could start requiring an exact version of each of those libraries as a minimum version.
BTW, about discussing anything: IMO it would be better to unlock discussion in the git repo and move away from the mailing list.
Looking only on Fedora spec files is possible to see how complicated whole packaging process must be now and that complexity only grows.
For information, using such phrasing ("is possible to see how complicated") without giving proper pointers/URL is disrespectful: if I want to take a look at "Fedora spec files", I can find https://src.fedoraproject.org/rpms/libsepol but I am not sure whether you meant libsepol, libselinux or another package. Then I need to choose a branch and try guessing what you mean by "see how complicated whole packaging process must be now" and fail to see the complexity. In the end, I feel like I have wasted my time trying to understand this sentence.
Could you please avoid using such generic phrasing and be more respectful and precise in your messages?
I suggest IMO it would start from separating libsepol, libsemanage and libselinux.
If I may provide some feedback from experience with another project I participated in recently: the TPM 2.0 software stack uses separate repositories on https://github.com/tpm2-software/. It seems nice and clean, but when you start depending on development versions of other dependencies, it becomes quite hard to install properly. And sometimes changing something in one project breaks another one, and this breaks the Continuous Integration for some weeks (for example I experienced this in https://github.com/tpm2-software/tpm2-pkcs11/pull/702 ). So IMHO, right now I feel like separating the SELinux projects into several repositories will require much more maintenance effort than currently, and I think there are not enough active maintainers to bear such an effort.
If I may provide some feedback from experience with another project I participated in recently: the TPM 2.0 software stack uses separate repositories on https://github.com/tpm2-software/. It seems nice and clean, but when you start depending on development versions of other dependencies, it becomes quite hard to install properly. And sometimes changing something in one project breaks another one, and this breaks the Continuous Integration for some weeks (for example I experienced this in tpm2-software/tpm2-pkcs11#702 ). So IMHO, right now I feel like separating the SELinux projects into several repositories will require much more maintenance effort than currently, and I think there are not enough active maintainers to bear such an effort.
Sorry but I don't see that complexity. We are talking about exactly the same commit, but released as a version, where API/ABI changes in one component should trigger a series of releases of other packages. The issue is that as long as there is no clear separation between components in SELinux, everything is blurred and there are no boundaries. The result is that currently, as you can see for example in Fedora, it is necessary to apply several patches taken from git. Example:
[tkloczko@barrel SPECS.fedora]$ for i in $(grep https://github.com/SELinuxProject * -l); do echo -n "$i "; grep -c ^Patch $i; done
checkpolicy.spec 16
libselinux.spec 37
libsemanage.spec 4
libsepol.spec 101
mcstrans.spec 4
policycoreutils.spec 25
secilc.spec 13
setools.spec 2
Those numbers show the number of patches for each SELinux component.
As you see ALL those patches are from git. All those patches are taken from single repo. If you are thinking that keeping everything in one repo makes all that simpler, that may even be true, but at the end of the day it delegates more work to the packaging layer. Example: SuSE on top of the latest libsepol 3.2 adds 3 patches (all of them are CVEs!!) and you can find more patches of that kind in other SELinux components. Because libsepol is used as a static library by other SELinux components, it is necessary to check the impact of such CVEs to consider recompiling the other components which use libsepol.a.
As I wrote I can offer some help on such separation. Separation would allow releasing only one component after a critical issue somewhere, without scratching one's head about what to do with the other components.
As I wrote I can offer some help on such separation.
If SELinux maintainers want my help, all I would ask is to fork the SELinuxProject/selinux repo to SELinuxProject/libsepol, and I'll try to submit a PR with a series of commits to reshape that first part.
After that it will still be necessary to tweak SELinuxProject/selinux a bit, however that part can be done with really small overhead.
At any time it would be possible to delete that work without disturbing what is in SELinuxProject/selinux.
Why cloning?
IMO it would be good to keep all past changes in each SELinuxProject/<component>, but that is not necessary.
If SELinux maintainers want to do that without keeping a record of past changes, I would just ask to copy SELinuxProject/selinux/libsepol to SELinuxProject/libsepol.
With a properly separated libsepol it will IMO be clear to see where all that may be heading (I can understand that it still may not be clear).
Those numbers shows number of patches per each SELinux component.
These numbers show only patches which were backported from upstream master to Fedora. It's not related to the fact that everything is in one repo, it's related to the fact that I wanted to have the latest unreleased bits in Rawhide so that it's tested and prepared for the 3.3 release.
As you see ALL those patches are from git. All those patches are taken from single repo.
Right, but I don't see this as an issue:
git format-patch -N 3.2 -- libsepol
git format-patch -N 3.2 -- libselinux
...
OTOH with the split of the repo, Fedora would have a problem with the policycoreutils package, as it was originally based on one directory, policycoreutils, which was split into several new directories, so instead of one command
git format-patch -N 3.2 -- policycoreutils python gui sandbox dbus semodule-utils restorecond
it would have to clone 7 different repositories.
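The per-directory commands above could be wrapped in a small loop. This is a sketch only: the function name format_component_patches, the DRY_RUN switch and the patches/<component> output layout are all hypothetical, and it assumes it runs inside a clone of the monorepo where the release tag exists.

```shell
# format_component_patches TAG COMPONENT...: for each component directory,
# write the commits since TAG that touch it into patches/<component>/.
# With DRY_RUN=1 the git commands are only printed, not executed.
# (Hypothetical helper and output layout.)
format_component_patches() {
    tag=$1
    shift
    for comp in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "git format-patch -N -o patches/$comp $tag -- $comp"
        else
            git format-patch -N -o "patches/$comp" "$tag" -- "$comp"
        fi
    done
}
```

For example, format_component_patches 3.2 libsepol libselinux would produce one patch series per library from a single clone.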
If you are thinking that keeping everything in one repo makes all that simpler, that may even be true, but at the end of the day it delegates more work to the packaging layer. Example: SuSE on top of the latest libsepol 3.2 adds 3 patches (all of them are CVEs!!) and you can find more patches of that kind in other SELinux components.
Number of patches used in package is not related to the format of this repo. Fedora libsepol package has complete upstream master backported, while SuSE patched only CVEs. If libsepol was split, Fedora would have same number of patches in the package as it's now.
Because libsepol is used as a static library by other SELinux components, it is necessary to check the impact of such CVEs to consider recompiling the other components which use libsepol.a.
You would still need to rebuild everything statically linked.
As I wrote I can offer some help on such separation. If SELinux maintainers want my help, all I would ask is to fork the SELinuxProject/selinux repo to SELinuxProject/libsepol, and I'll try to submit a PR with a series of commits to reshape that first part. After that it will still be necessary to tweak SELinuxProject/selinux a bit, however that part can be done with really small overhead. At any time it would be possible to delete that work without disturbing what is in SELinuxProject/selinux.
Why cloning? IMO it would be good to keep all past changes in each SELinuxProject/<component>, but that is not necessary. If SELinux maintainers want to do that without keeping a record of past changes, I would just ask to copy SELinuxProject/selinux/libsepol to SELinuxProject/libsepol. With a properly separated libsepol it will IMO be clear to see where all that may be heading (I can understand that it still may not be clear).
I don't think you need to use SELinuxProject for this. Feel free to start in your own namespace, and when you think it's ready, send patches for review to selinux@vger.kernel.org. Also I'd like to ask again to move this discussion to the selinux@vger.kernel.org list as well.
Almost two years later, and it looks like no progress has been made on better isolation/separation of the exact SELinux components. Currently:
Currently it looks like using the SELinux tooling outside the monorepo does not make too much sense. Other problems are related to the build framework, which consists of a set of custom/hand-made Makefile files.
Is there any long-term plan to reshape/reorganize that? 🤔