dd010101 / vyos-jenkins

How to build packages from VyOS stable branches (1.3 equuleus/1.4 sagitta) with Jenkins (and then build an ISO from them)

equuleus / wide-dhcpv6 build error #23

Closed. Neboer closed this issue 3 months ago.

Neboer commented 3 months ago

Thank you for the great work. I managed to build a sagitta repo today in a virtual machine running Debian bookworm. The build was almost a success: all the Jenkins tasks succeeded except a single one, equuleus/wide-dhcpv6, which gave an error. The log is here.

Because of another issue I haven't built the ISO image yet, so I haven't tried to assemble one to see whether it works even without a successful build of this package.

dd010101 commented 3 months ago

The wide-dhcpv6 build for equuleus fails for me too, and then on another run it works. I don't understand why it works only sometimes: it fails during compilation, but the code is (or should be) the same every time.

As a temporary workaround, I guess you can rerun it until it passes...

Neboer commented 3 months ago

Okay, I see. Please keep me updated on this issue.

dd010101 commented 3 months ago

There is an issue where the first make breaks the second make. The first make builds cfparse.y, but the second make does so only sometimes, and that's why the second make then crashes: it doesn't have cfparse.y built. I guess the reason it happens only sometimes is some concurrency/race condition? I'm not sure why there are two makes executing the same targets; it seems redundant to me. I don't understand why it's happening, but I found a workaround: using -B|--always-make forces every make to run the build for cfparse.y, and now I can build wide-dhcpv6 10 times in a row without a failure. Before -B|--always-make, this failed more often than it succeeded. This is not really a fix, but it's a fully working workaround, so it's good enough for me.

I found an additional patch from Fedora that fixes a parallel-build issue with the cfparse.y build, but it doesn't fix this issue: the failure persists even with that patch applied.

If anyone understands this better and knows how to fix it properly, please let me know.

This requires a fork, so please use the https://github.com/dd010101/vyos-build.git Git repository for wide-dhcpv6 and let me know whether it fixes the issue for you.

GurliGebis commented 3 months ago

My guess is that make is being called with multiple jobs as a parameter, so depending on the order in which things are compiled, it breaks. Just guessing, but it would explain the randomness of it all. Can you try changing it so that make is called with -j1?

dd010101 commented 3 months ago

No, -j1 doesn't fix this; I tried. If anything it's the opposite: when they DO run in parallel, it succeeds? But those are independent makes, so how could they run in parallel? My make knowledge is non-existent, so I don't understand how make inside make behaves. If the first make builds, the second make doesn't care since "it's already done", so it skips the build, and this results in an incomplete file being compiled, hence lots of "undeclared" errors. That's how I understand it. The -B|--always-make workaround is fine because the build runs in a clean environment every time anyway, so there is little reason for make to build only changed files: they are all new every time. It also doesn't change the build time. Could -B|--always-make fix a concurrency race condition? I don't think so, which is why I think the make-after-make is the issue, not threading/concurrency.

Now it works fine.

GurliGebis commented 3 months ago

I'm not an expert on it either, but if that fixes the issue, then great 😃

dd010101 commented 3 months ago

Whether it does for you, you tell me! It does for me.

GurliGebis commented 3 months ago

We'll see when I get to my build script 🙂

Neboer commented 3 months ago

Hey so what should I do to make it pass the build in Jenkins?

Crushable1278 commented 3 months ago

Hey so what should I do to make it pass the build in Jenkins?

Did you attempt to pull in the fix and run that?

@dd010101 Did you try a simple sleep 1? At a high level: 1) git clone, 2) sleep 1, 3) build command.

Works every time for me.

dd010101 commented 3 months ago

Hey so what should I do to make it pass the build in Jenkins?

You need to replace the Git URL in the wide-dhcpv6 Multibranch Pipeline and then run the build again.

Go to Jenkins, find the wide-dhcpv6 job/package/pipeline, open the Configure action, and replace the Git URL https://github.com/vyos/vyos-build.git with https://github.com/dd010101/vyos-build.git. Save. Then select equuleus and run the build via the Build Now action.

Alternatively, get a fresh copy of vyos-jenkins and run ./seed-jobs.sh create; this also updates the definitions of all pipelines. Then use the Build Now action inside Jenkins on a specific job/branch to build only what you want rather than everything.

dd010101 commented 3 months ago

@dd010101 Did you guys attempt to use a simple sleep 1?

This example is a bit different. The Git clone is done by Jenkins. Then, some time later, Jenkins executes build.sh, build.sh runs dpkg-buildpackage, and dpkg-buildpackage runs an inner make, which passes; then another inner make runs, and this one sometimes fails without -B|--always-make. That's how I understand it, at least. Do you think adding an extra delay between the two inner makes would help? I didn't try that.

Crushable1278 commented 3 months ago

Well, it does seem a bit different. What failure do you see in the logs?

My script simply does:

git clone https://salsa.debian.org/debian/wide-dhcpv6
sleep 1
./build-wide.sh

And this works with equuleus. Perhaps it has something to do with the Jenkins environment.

I wonder if you could simply change

sed -i -E 's/\$\(MAKE\) -C/\$\(MAKE\) -B -C/' debian/rules

to

sleep 1
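For reference, the sed line quoted above appears to be how the fork adds -B (--always-make) to the inner make calls. As a minimal sketch of its effect, the script below applies the same expression to a made-up debian/rules fragment. The fragment is hypothetical: the real wide-dhcpv6 debian/rules is not reproduced in this thread, and the sketch only assumes it invokes "$(MAKE) -C <dir>" inside override_dh_auto_build (a target the log further down does show).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway directory holding a HYPOTHETICAL debian/rules fragment
# (the real file in the wide-dhcpv6 packaging may look different).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/debian"
printf 'override_dh_auto_build:\n\t$(MAKE) -C build\n' > "$work/debian/rules"

# Same expression as in the thread: turn every "$(MAKE) -C" into
# "$(MAKE) -B -C", so the inner make ignores timestamps and rebuilds all targets.
sed -i -E 's/\$\(MAKE\) -C/\$\(MAKE\) -B -C/' "$work/debian/rules"

cat "$work/debian/rules"
```

The second line of the printed file becomes the tab-indented "$(MAKE) -B -C build"; lines that don't contain "$(MAKE) -C" are left alone.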

When not run with sleep, my errors look like:

cftoken.l: In function ‘yylex’:
cftoken.l:147:35: error: ‘PROFILE’ undeclared (first use in this function); did you mean ‘PF_FILE’?
 <S_PROFILE>{string} {
                                   ^      
                                   PF_FILE
cftoken.l:147:35: note: each undeclared identifier is reported only once for each function it appears in
cftoken.l:152:10: error: ‘PROFILENAME’ undeclared (first use in this function); did you mean ‘POOLNAME’?
 }
          ^          
          POOLNAME
cftoken.l:248:18: error: ‘IFID’ undeclared (first use in this function); did you mean ‘IAID’?
 <S_CNF>ifid-random { DECHO; return (IFID_RAND); }
                  ^~~~
                  IAID
cftoken.l:249:18: error: ‘IFID_RAND’ undeclared (first use in this function)

                  ^        
cftoken.l:255:31: error: ‘CLIENT_ID’ undeclared (first use in this function)
 <S_CID>{duid} {
                               ^        
cftoken.l:260:10: error: ‘CLIENT_ID_DUID’ undeclared (first use in this function)
 }
          ^             
make[2]: *** [<builtin>: cftoken.o] Error 1
make[2]: Leaving directory '/vyos/packages/wide-dhcpv6/wide-dhcpv6/build'
make[1]: *** [debian/rules:27: override_dh_auto_build] Error 2
make[1]: Leaving directory '/vyos/packages/wide-dhcpv6/wide-dhcpv6'
make: *** [debian/rules:12: build] Error 2
dpkg-buildpackage: error: debian/rules build subprocess returned exit status 2

Perhaps I see a different error.

dd010101 commented 3 months ago

I found out why both --always-make and sleep work.

There is a chicken-and-egg issue in the Makefile: make will never build cfparse.y, because the target file y.tab.h already exists in the repository and the source file never changes, since this is a clean run. Both files therefore get identical timestamps from git clone (or y.tab.h ends up the newer one). This makes make think the target is up to date, but the committed y.tab.h isn't up to date. That would normally break the build outright, except that in this specific case it breaks only sometimes: dpkg-buildpackage applies various patches on top of the repository, and one of them happens to touch the source file. So if the run is fast enough, the patches are applied so quickly that, from make's point of view, the source file's timestamp doesn't change from the git clone timestamp, and the target is supposedly up to date. If the run is slower, or you add a delay such as sleep, the patches give the source file a newer timestamp and the target is rebuilt. --always-make works as well, since then the timestamps don't matter.

So the underlying issue is that the generated file y.tab.h is tracked in Debian's Git repository, although it should be generated by bison -y -d cfparse.y during the build and therefore not tracked by Git. That's why the correct workaround is to delete y.tab.h before the build to ensure a clean state; ideally, this file should be excluded via .gitignore and deleted from the Debian repository.
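A minimal sketch of that workaround, assuming y.tab.h sits next to cfparse.y at the root of the wide-dhcpv6 checkout (the thread doesn't show the exact path). The helper name clean_generated_parser is hypothetical, and the demo runs in a throwaway directory rather than a real clone; the commented lines only indicate where it would go and what the longer-term fix in the Debian repository could look like.

```shell
#!/usr/bin/env bash
set -euo pipefail

# HYPOTHETICAL helper: delete the stale, committed bison header so make has to
# regenerate it from cfparse.y (bison -y -d cfparse.y writes y.tab.c and y.tab.h).
clean_generated_parser() {
  local src=${1:-.}   # source tree root; ASSUMED location of y.tab.h
  rm -f "$src/y.tab.h"
}

# Demo: a throwaway directory standing in for the cloned repository.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
touch "$repo/cfparse.y" "$repo/y.tab.h"
clean_generated_parser "$repo"
ls "$repo"   # only cfparse.y is left

# In a real checkout, call it before dpkg-buildpackage runs, e.g.:
#   clean_generated_parser .
# Longer-term fix in the Debian repository (not done here):
#   git rm --cached y.tab.h && echo 'y.tab.h' >> .gitignore
```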

Today I couldn't reproduce the error with the original script, because my system was busy with other tasks and was therefore too slow to apply the patches quickly enough 😄.

Only equuleus shows the symptoms, because sagitta runs extra commands before dpkg-buildpackage applies the patches; these act as a delay, so the patches move the source file's modification time forward.
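The timestamp rule behind all of this can be checked without running make. The sketch below is a minimal bash model, assuming GNU touch (for -d): it uses the -nt (newer-than) test to mimic make's decision to rebuild y.tab.h only when cfparse.y is strictly newer, and fixed, arbitrary timestamps stand in for the git clone and for the patch that touches cfparse.y. The helper needs_rebuild is hypothetical and not part of any build script in the thread.

```shell
#!/usr/bin/env bash
set -euo pipefail

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# HYPOTHETICAL helper modelling make's rule: rebuild the target ($1) if it is
# missing or the prerequisite ($2) is strictly newer. Equal mtimes = up to date.
needs_rebuild() {
  [ ! -e "$1" ] || [ "$2" -nt "$1" ]
}

# "git clone": both files land with the same mtime (arbitrary fixed date).
touch -d '2024-01-01 12:00:00' "$dir/cfparse.y" "$dir/y.tab.h"
if needs_rebuild "$dir/y.tab.h" "$dir/cfparse.y"; then fast=rebuild; else fast=skip; fi

# Slow run (or sleep 1): the patch gives cfparse.y a later mtime.
touch -d '2024-01-01 12:00:01' "$dir/cfparse.y"
if needs_rebuild "$dir/y.tab.h" "$dir/cfparse.y"; then slow=rebuild; else slow=skip; fi

echo "cfparse.y not newer (fast run):      $fast"
echo "cfparse.y patched later (slow run):  $slow"
```

The fast case prints skip, which is exactly when the stale committed y.tab.h gets compiled; make -B bypasses this comparison entirely, and deleting y.tab.h makes the "target missing" branch apply.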

@Crushable1278

Perhaps I see a different error.

It's the same error; bash and Jenkins fail identically.

Did you know this was the issue and that's why you added sleep, or did you add sleep on a hunch and it just worked?

GurliGebis commented 3 months ago

@dd010101 Can you submit a PR upstream? It might make sense to try to push fixes like this to VyOS 😊

dd010101 commented 3 months ago

I doubt they would approve my account, since they deleted my account on the forum 😄.