Stichting-MINIX-Research-Foundation / minix

Official MINIX sources - Automatically replicated from gerrit.minix3.org

All changes to NetBSD code should be stored as patches somewhere #81

Open Tookmund opened 9 years ago

Tookmund commented 9 years ago

The way we import the NetBSD userland now makes it very difficult to update, because none of the MINIX-related changes are stored in patch files.

My initial idea is that we should have a minix/patches directory mirroring the entire tree, with one patch file per program. For example:

minix/patches
    bin
        dd.patch (containing all changes made to dd)
        cp.patch (containing all changes made to cp)
        (and so on)
    etc
    lib
    (and so on)

This would be a huge amount of effort, so I would like to get some feedback on this design before it is implemented. Is this a good idea? Is there a better way? Any ideas on how these patches could be kept in sync and applied?
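One way the proposed layout could be produced is sketched below. It assumes things not stated in the thread: the NetBSD sources are reachable as a Git ref (for example `netbsd/master`), every program lives in its own subdirectory such as `bin/dd`, and the helper name `gen_program_patches` is hypothetical. Each patch is a plain `git diff` between the NetBSD ref and our working tree, limited to one program.

```shell
#!/bin/sh
# Sketch: one patch per program directory, mirroring the layout above.
# Assumptions (hypothetical): <netbsd-ref> names the NetBSD sources as a Git
# ref, each program has its own subdirectory (bin/dd, bin/cp, ...), and we run
# from the root of the MINIX tree.

# gen_program_patches <netbsd-ref> <top-dir> <out-dir>
gen_program_patches() {
    ref=$1 top=$2 out=$3
    mkdir -p "$out/$top" || return 1
    for dir in "$top"/*/; do
        [ -d "$dir" ] || continue
        prog=$(basename "$dir")
        patch="$out/$top/$prog.patch"
        # Diff the NetBSD version against our working tree, for this program only.
        git diff "$ref" -- "$top/$prog" > "$patch"
        # Drop empty patches: the program's tracked files are identical to NetBSD.
        [ -s "$patch" ] || rm -f "$patch"
    done
}
```

For example, `gen_program_patches netbsd/master bin minix/patches` would write `minix/patches/bin/dd.patch` only if `bin/dd` differs from NetBSD, so unmodified programs leave no file behind.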

boricj commented 9 years ago

Looks like a good idea to me.

As for a better way, I'd keep a patch file per directory (limited to the files in that directory), mirroring the complete (or the subset requiring patching) NetBSD source tree under releasetools/patches/. This directory shouldn't be stored inside the Git repository since it can be regenerated as needed with the patches_* tools.

To handle this, I'd write a couple of shell scripts to:

  1. Build releasetools/patches/ from the source tree against a NetBSD source tree reference, blacklisting as needed,
  2. Apply all patches onto a NetBSD source tree (log all the conflicts), copy over minix/, releasetools/ (minus the patches) and .git/. Keep Git in the loop to be able to admire the resulting git status.

Or the way around. Not sure what would be the best.
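A minimal sketch of the first script, assuming it runs from the root of the MINIX tree and that the NetBSD reference tree is available as a Git ref. The function name and the `changes.patch` file name are hypothetical; the path arguments act as the list of directories to consider, so they can serve as a whitelist.

```shell
#!/bin/sh
# Sketch of the first script: one patch per directory, holding only the
# changes to files directly in that directory, mirrored under <out-dir>.
# Assumptions (hypothetical): <ref> names the NetBSD reference tree, the
# remaining arguments are the paths to consider, each patch is called
# "changes.patch", and we run from the root of the MINIX tree.

# gen_dir_patches <ref> <out-dir> <path>...
gen_dir_patches() {
    ref=$1 out=$2
    shift 2
    # Refuse to mix fresh patches with stale ones from an earlier run.
    if [ -e "$out" ]; then
        echo "gen_dir_patches: $out already exists" >&2
        return 1
    fi
    # Every tracked file that differs from the reference, one path per line.
    git diff --name-only "$ref" -- "$@" |
    while IFS= read -r file; do
        dir=$(dirname "$file")
        mkdir -p "$out/$dir" || exit 1
        # Append this file's diff to its directory's single patch.
        git diff "$ref" -- "$file" >> "$out/$dir/changes.patch"
    done
}
```

Something like `gen_dir_patches netbsd/master releasetools/patches bin usr.bin` would leave one patch per changed directory under `releasetools/patches/`. Because the output is fully derived from the tree, it can be deleted and regenerated at any time, which fits the idea of keeping it out of the repository.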

To update MINIX 3's tree, the procedure would go like this:

  1. Run first shell script against the current NetBSD source tree reference,
  2. Run second shell script onto the future-current NetBSD source tree,
  3. Overdose on your favorite poison (coffee, beer...) while fixing all the conflicts,
  4. Test the heck out of it.
  5. Profit.

Making good use of the copied .git/ directory is recommended to keep sanity during step 3.
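The second script could look roughly like this. It assumes it runs from the root of a fresh NetBSD checkout and consumes patches produced by `git diff` (hence the `a/` and `b/` prefixes that `git apply` expects by default); the function name and log format are hypothetical, and copying over minix/, releasetools/ and .git/ is left out. It relies on `git apply --reject`, which applies the hunks that fit and leaves the others in `*.rej` files, so every conflict is both logged and left on disk for step 3.

```shell
#!/bin/sh
# Sketch of the second script: apply every generated patch to a fresh NetBSD
# checkout, keep going on failure, and log the conflicts for manual fixing.
# Assumptions (hypothetical): run from the root of the NetBSD tree; patches
# are named "changes.patch" and were produced by `git diff`.

# apply_dir_patches <patch-dir> <log-file>
apply_dir_patches() {
    patches=$1 log=$2
    : > "$log"
    find "$patches" -type f -name changes.patch | sort |
    while IFS= read -r p; do
        # --reject applies the hunks that fit and leaves the rest in *.rej files.
        if ! git apply --reject "$p" 2>> "$log"; then
            echo "CONFLICT: $p" >> "$log"
        fi
    done
    # Succeed only if no patch needed manual attention.
    ! grep -q '^CONFLICT: ' "$log"
}
```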

Tookmund commented 9 years ago

That's a much better idea! I'll begin working through the patching right now. Based on my earlier work in #79, I'm going to use a whitelist instead of a blacklist, because there are fewer special cases that way and I've already built the whitelists.

Tookmund commented 9 years ago

I'm working on this in https://github.com/Tookmund/minix/tree/updateminix

Tookmund commented 9 years ago

Patches generated with releasetools/genpatches.sh. Now to figure out which are necessary and which aren't.

sambuc commented 9 years ago

Hi, before you spend too much time on this, a couple of questions:

What does this do that is not already available by doing the following:

### Setup
git clone git://git.minix3.org/minix minix
cd minix
git remote add netbsd git://git.minix3.org/netbsd
git fetch netbsd

### Compare two directories
git diff netbsd/master -- <some_path>

### Checkout a not yet imported directory
git checkout netbsd/master -- <some_dir_or_file>

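What kind of manual labor does your new method generate (in terms of maintaining the lists, etc.)? I am asking because we used to have a tool which, based on a file, would check out the NetBSD CVS sources and generate a patch for the registered files and directories.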

We moved away from this, as the list was poorly maintained; in the end the usefulness of the tool dropped drastically, while it still generated work.

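The first time I resynchronized with NetBSD it took me literally months (full time); the second time (84d9c62) it took me a couple of weeks; and I did 90% of the job over the last extended Easter weekend, so a couple of days. I think I need about one more week full time to finish that work. Now, this is me doing it, and I arguably know the whole tree, its quirks and its status very well.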

I am saying this because the current process allows the whole tree to be resynchronized in a matter of days, without generating any overhead for the project when we need to import or patch specific parts of the NetBSD sources. This is extremely important, as we do not want to slow down day-to-day work while keeping the method working.

From past experience, any kind of list requires work to keep it up to date, and if something is not part of the day-to-day workflow, it usually breaks down simply because we introduce a change and forget to make the required updates as well.

While I would welcome a way to further improve the current situation, I am skeptical that the list-based approach will succeed, as we already tried it. If you want a closer look at it, check out the following branch: https://github.com/Stichting-MINIX-Research-Foundation/minix/tree/R3.2.0 and take a look at the following files:

tools/nbsd_diff.sh
tools/nbsd_ports

What I think would improve on the way I currently do it is something along these lines (commands off the top of my head, they need to be checked):

  1. check out the MINIX sources
  2. git grep -ni minix | cut -d: -f1 | sort -u >modified_files
  3. a series of git checkouts from netbsd / overwrites from the new NetBSD sources for the relevant directories
  4. for f in $(cat modified_files); do git checkout $f; done
  5. compare the tree with the new NetBSD tree using meld, and resolve the conflicts as required.
  6. check that the result works for all configurations.

This reduces step 5 to removing files which are no longer required (because they were removed or renamed in NetBSD) and actually looking at the files which we have patched.
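Steps 2 to 5 could be sketched as two shell functions, under these assumptions: they run from the root of the MINIX tree, `<ref>` is the new NetBSD branch (for example `netbsd/master`), and the function names are hypothetical. One deliberate deviation from the commands above: step 4 restores the patched files from `HEAD`, because when step 3 is done with `git checkout <ref> -- <dir>` the index is updated too, and a plain `git checkout $f` would then bring back the NetBSD version rather than ours.

```shell
#!/bin/sh
# Sketch of steps 2 to 5, run from the root of the MINIX tree.
# Assumptions (hypothetical names): <ref> is the new NetBSD branch and the
# remaining arguments are the directories to resync.

# Steps 2-4: take NetBSD's version of everything, then restore our patched files.
resync_sketch() {
    ref=$1; shift
    # Step 2: files mentioning "minix" (any case) are treated as patched by us.
    git grep -il minix -- "$@" | sort -u > modified_files
    # Step 3: overwrite the directories with the new NetBSD sources
    # (this updates both the index and the working tree).
    git checkout "$ref" -- "$@" || return 1
    # Step 4: restore our versions from HEAD, not from the index.
    while IFS= read -r f; do
        git checkout HEAD -- "$f"
    done < modified_files
}

# Step 5 helper: tracked files that no longer exist in the NetBSD branch,
# i.e. candidates for removal (or renames to follow up by hand).
stale_files() {
    ref=$1; shift
    ours=$(mktemp) theirs=$(mktemp)
    git ls-files -- "$@" | LC_ALL=C sort > "$ours"
    git ls-tree -r --name-only "$ref" "$@" | LC_ALL=C sort > "$theirs"
    # Lines only in our list.
    LC_ALL=C comm -23 "$ours" "$theirs"
    rm -f "$ours" "$theirs"
}
```

`git checkout <ref> -- <dir>` never deletes files, which is why the removed ones have to be listed separately with something like `stale_files netbsd/master bin`.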

Regards,

Lionel

Tookmund commented 9 years ago

I can see how that would be a lot of work to maintain some of this. What I was thinking of was more along the lines of a set of patches and a set of lists of directories known to work (a whitelist).

The initial set of patches will take a while to set up, but should only need to change if a program changes significantly or a new program is added. @boricj suggested they not be stored in Git, but we probably should store them, because I cannot reliably regenerate patches containing only MINIX-specific changes. This would also reduce maintenance cost.

I have already generated whitelists of all programs that work on MINIX, and those should only need updating if a new program is imported.

Since he will be the one maintaining it in the long term and there is a lot of work to be done upfront, I will await @sambuc's approval before continuing to work on this.

Jacob

On Jun 18, 2015, at 2:45 AM, Lionel Sambuc notifications@github.com wrote:


sambuc commented 8 years ago

Hi, @Tookmund,

During the last resync, I implemented some of the steps we spoke about as a small script, which dramatically lowered the overhead for files which are unpatched. In the long run these will be the vast majority, so this is a nice gain, as it lets me keep my efforts for the ones which need it.

It is in the source tree as releasetools/netbsd-resync.sh:

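#!/bin/sh
# Replace every file we have not patched with its ${NETBSD_BRANCH} version.
: ${BUILDSH=build.sh}

if [ ! -f "${BUILDSH}" ]
then
        echo "Please invoke me from the root source dir, where ${BUILDSH} is."
        exit 1
fi

if [ -z "${NETBSD_BRANCH}" ]
then
        echo "NETBSD_BRANCH is undefined."
        exit 1
fi

# Every file in the tree, minus Git metadata and our own minix/ directory.
# cut strips the leading "./", so minix/ has to be matched as "^minix/".
find . -type f    | cut -c 3-   | grep -v '\.git' | grep -v '^minix/' | sort -u > files.all
# Files mentioning minix (any case) are considered patched by us.
git grep -il minix | grep -v '\.git' | grep -v '^minix/' | sort -u > files.minix
# Lines only in files.all are unpatched files, safe to take from NetBSD.
diff files.all files.minix | grep '^<' | cut -c 3- > files.netbsd

while IFS= read -r file
do
    git checkout "${NETBSD_BRANCH}" -- "${file}"
done < files.netbsd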

This does not yet handle files from the NetBSD tree which were moved, removed or added. This use case is not so common, so I don't see any problem in reviewing those together with the files we have patched. The actions to take are also rather simple, so it doesn't take too much energy.

That said, if you can come up with a way of finding files which were moved, this would help, although I have no idea how to do this. Keep in mind that even if moved, a file with patches from our side should not be replaced by the NetBSD one, as manual review is required in that case.
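Git's own rename detection may be enough to find moved files without maintaining any list. A possible sketch, assuming the NetBSD revision we last synced with is still reachable as a ref (for example a tag; the name is hypothetical) next to the new branch: `git diff -M --name-status --diff-filter=R` lists the files NetBSD moved between the two, and each old path is flagged `patched` if our tree still mentions minix in it, so it is left for manual review, or `clean` otherwise. The function name and output format are hypothetical.

```shell
#!/bin/sh
# Sketch: list files NetBSD renamed between two revisions and flag the ones
# we have patched. Assumptions (hypothetical): <old-ref> is the NetBSD
# revision last synced with, <new-ref> the branch we are moving to, and we
# run from the root of the MINIX tree.

# netbsd_renames <old-ref> <new-ref>
# Output: one "old-path<TAB>new-path<TAB>state" line per rename, where state
# is "patched" (needs manual review) or "clean" (NetBSD version can be used).
netbsd_renames() {
    tab=$(printf '\t')
    # -M pairs deletions with additions as renames; --diff-filter=R keeps only those.
    git diff -M --name-status --diff-filter=R "$1" "$2" |
    while IFS="$tab" read -r score old new; do
        # $score is the similarity (e.g. R100); same minix heuristic as the script.
        if git grep -qi minix -- "$old" 2>/dev/null; then
            state=patched
        else
            state=clean
        fi
        printf '%s\t%s\t%s\n' "$old" "$new" "$state"
    done
}
```

Files flagged `clean` could then be handled like the unpatched files in netbsd-resync.sh, for instance by removing the old path and checking out the new one from the NetBSD branch.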

Regards,

Lionel

Tookmund commented 8 years ago

Sorry it's been so long; school and other stuff got in the way.

This script is awesome! I tested it with an automated NetBSD Git mirror I found ( https://github.com/jsonn/src ) and it seems to work great! (I really didn't want to take the time to set up the CVS stuff.)

That should really reduce the manpower required to resync the two, so maybe that could be done before 3.4.0? It looks like it was last synced in October 2015, which is kind of a long time if we want to follow the main branch of NetBSD, which is what we currently seem to be doing.

We could also sync to the stable or security branches and just patch things if there is a vulnerability. There hasn't been one since we last synced, but there has certainly been a lot of work done in NetBSD since then. Following release branches would require much less resyncing than constantly following main.

I'm just concerned that we use basically all of NetBSD for the userland but don't seem to sync up often. This could lead to vulnerabilities, bugs and other general badness as time wears on and the projects drift apart. I realize this is mostly a matter of time, because the project doesn't have many people, but that's why I opened this in the first place.

There is certainly still a non-negligible amount of work involved in resyncing, and I don't know enough yet to try the whole process myself. Is there any other major bottleneck to resyncing regularly, or to following stable branches?

I will try to look into finding moved files but I don't have a lot of time recently as can be seen from how long it took me to get back to this.

antoineL commented 8 years ago

Let me just add my experience with following the NetBSD sources. tl;dr: use Fossil.

Long story: for a while I intended to follow the NetBSD source tree. CVS was not an option because I wanted to have (some) access to history; also, I was not available enough to synchronize it every few hours, as it is supposed to be used. So I found Joerg's work on the Git repository linked above and started to use it. It was great for a while, but after a couple of months of irregular activity, fetching stopped working; the underlying reason was synchronization issues.

After investigating, I understood that the Git repository was a by-product of the conversion from CVS to the Fossil source code manager, a tool better suited to this purpose. So I switched my online reference copy of NetBSD from Git to Fossil (it even worked on my Windows machine, which is a net gain for me), and the synchronization issues disappeared. Obtaining on-the-fly Git copies of the Fossil repository is not cheap, but it is acceptable on modern hardware and connectivity if you do it from time to time. Alternatively, the scripts produced so far, most of them referenced in this thread, could certainly be adapted to use Fossil instead of Git.

I did not investigate producing a MINIX 3 port of the Fossil client, but I do not expect it to be a big problem; a starting point is already there: http://pkgsrc.se/devel/fossil

P.S.: regarding the script just above: Fossil has an "unsupported" command, test-grep, which could perhaps replace git grep.