cloudius-systems / osv

OSv, a new operating system for the cloud.
osv.io

Uploading application to base image fails #860

Closed rowlandwatkins closed 7 years ago

rowlandwatkins commented 7 years ago

Hi folks,

Got an interesting error when using capstan to add my application into a master-branch-built base image:

[[[
OSv v0.24-327-gd6caf00
Solaris: NOTICE: Cannot find the pool label for '/dev/vblk0.1'
eth0: 192.168.122.15
Waiting for connection from host...
warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
Uploading files...
target/pack/lib/joda-time-2.9.4.jar --> /lib/joda-time-2.9.4.jar
target/pack/lib/lucene-sandbox-4.5.1.jar --> /lib/lucene-sandbox-4.5.1.jar
target/pack/lib/play-datacommons_2.11-2.5.9.jar --> /lib/play-datacommons_2.11-2.5.9.jar
Adding /lib/joda-time-2.9.4.jar...
target/pack/lib/scala-parser-combinators_2.11-1.0.4.jar --> /lib/scala-parser-combinators_2.11-1.0.4.jar
terminate called after throwing an instance of 'std::system_error'
what(): chmod: Operation not permitted
Aborted

[backtrace]
0x00000000004875ac <__gnu_cxx::__verbose_terminate_handler()+364>
0x0000000000ae44df <???+11420895>
]]]

In Jenkins the upload hangs indefinitely. The error is quite recent: I do a daily build from GitHub, then automatically use the resulting successful build down the CI chain. I'm rolling back to an earlier base image for the time being. Not sure if that Solaris warning is related or not!

Cheers,

Rowland

nyh commented 7 years ago

Interesting, I don't remember any recent change in chmod() or the filesystem. Do you get this error every time now, or did it just happen once?

It's curious what could cause chmod() to return EPERM - it's fine to chmod() an unwritable file or a file whose parent is unwritable, and if the parent is unsearchable, the error would be EACCES, not EPERM. It would be good to debug or add printouts to zfs_setattr() to see where this error is coming from.
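As a rough illustration of the errno distinction described here, the following Python sketch runs the two "should succeed" cases on an ordinary POSIX host filesystem. It is an assumption-laden stand-in, not OSv or ZFS code: the helper name `chmod_errno` and the temporary file layout are made up for the example, and it only shows the generic POSIX behaviour that OSv would be expected to match.

```python
import errno
import os
import tempfile


def chmod_errno(path, mode):
    """Hypothetical helper: return None if chmod succeeds, else the errno name."""
    try:
        os.chmod(path, mode)
        return None
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))


with tempfile.TemporaryDirectory() as d:
    f = os.path.join(d, "file")
    open(f, "w").close()

    # Case 1: the file itself is unwritable. The owner may still chmod it.
    os.chmod(f, 0o444)
    unwritable_file = chmod_errno(f, 0o400)

    # Case 2: the parent directory is unwritable. chmod only needs search
    # permission on the parent, so this is expected to succeed as well.
    os.chmod(d, 0o555)
    unwritable_parent = chmod_errno(f, 0o644)

    # Restore write permission so the temporary directory can be cleaned up.
    os.chmod(d, 0o755)

print("unwritable file:", unwritable_file)
print("unwritable parent:", unwritable_parent)
# The text that the C library attaches to EPERM, as seen in the what() line.
print("EPERM text:", os.strerror(errno.EPERM))
```

On a typical Linux host both cases are expected to report no error, which is why an EPERM from chmod() during the upload points at something unusual, such as an ownership check failing, rather than at ordinary file or directory permissions.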

This "Solaris" message is coming from OSv? I also haven't seen it before and don't know what it means. It's not nice that OSv prints messages with the label "Solaris" :-)

rowlandwatkins commented 7 years ago

@nyh Yeah, this failure happens on every attempt with recent OSv builds. Indeed, the Solaris message is coming from OSv; it all showed up after I enabled verbose output in Capstan.

Failures appear to have started around 27th Feb, but I'd been having some issues with building OSv up to that point due to the refactoring that had occurred (end of last year?). This may be related. I've also had to re-apply my old patches for detecting the underlying hypervisor, but I'm not sure how that would affect the filesystem.

In any case, I'll back-test to see what else might be causing problems and finally post my patches to the mailing list for review.

Cheers,

Rowland

rowlandwatkins commented 7 years ago

@nyh OK, it looks like my crufty patches are causing the issue, although it's not clear why. I had to rebase them since there had been some refactoring of the Java mgt module, so it might be that I don't totally understand how to re-integrate them...

I'll have another stab and get back to you.

Cheers,

Rowland

rowlandwatkins commented 7 years ago

This doesn't appear to be an issue with recent builds, so I'll close it for now.