andsens / build-debian-cloud

This project has been superseded by andsens/bootstrap-vz and is no longer maintained - Script to create Debian Squeeze & Wheezy Amazon Machine Images (AMIs) and Google Compute Engine images

Grub install failing on Wheezy for VirtualBox/raw #116

Closed osallou closed 10 years ago

osallou commented 10 years ago

The GRUB install fails on Wheezy with the following error:

Executing: /usr/sbin/chroot /target/af963f8b/root /usr/sbin/grub-install /dev/dm-0

/usr/sbin/grub-setup: warn: Your embedding area is unusually small. core.img won't fit in it..

/usr/sbin/grub-setup: error: embedding is not possible, but this is required for cross-disk install.

Debug log at: https://gist.github.com/osallou/7805556

My manifest:

"volume": {
  "backing": "raw",
  "partitions": {
    "type": "mbr",
    "boot": { "size": 32, "filesystem": "ext2" },
    "root": { "size": 1000, "filesystem": "ext4" },
    "swap": { "size": 128 }
  }
},

I copied the VirtualBox example, just replacing vdi with raw. This may be a config issue with the MBR boot partition, but I see no documentation on this (and it is set the same way in the VirtualBox manifest).

osallou commented 10 years ago

A quick update to the code, increasing the MBR size from 1 MiB to 2 MiB in mbr.py (in partitions and partitionmaps), worked. I have not tested the image, but the generation was fine. However, I do not understand why 2 is required... and maybe another configuration or another day would require 1 or 3...

Maybe documenting this error case and making the MBR size configurable in the manifest would help avoid this problem?

andsens commented 10 years ago

However, I do not understand why 2 is required... and maybe another configuration or another day would require 1 or 3...

This definitely warrants some research. I'll look into it.

Maybe documenting this error case and making the MBR size configurable in the manifest would help avoid this problem?

That would require the user to figure out what size is needed. If the user can figure it out, so can the bootstrapper. Putting it in the manifest is just a way of making it someone else's problem.
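As a rough illustration of what "figuring it out" in the bootstrapper could look like, here is a minimal sketch that derives a partition offset from the size of GRUB's core.img. It is not code from this project: the function names are hypothetical, 512-byte sectors are assumed, and it only models the space between the MBR sector and the first partition. GRUB applies further checks, so this arithmetic alone does not explain the failure reported in this issue.

```python
SECTOR_SIZE = 512        # bytes per sector (assumption; typical for raw images)
MIB = 1024 * 1024


def embedding_area_bytes(first_partition_start_bytes, sector_size=SECTOR_SIZE):
    """Bytes available between the MBR (sector 0) and the first partition."""
    start_sector = first_partition_start_bytes // sector_size
    # Sector 0 holds the MBR itself, so the usable gap is sectors 1..start-1.
    return max(start_sector - 1, 0) * sector_size


def required_offset_mib(core_img_bytes, sector_size=SECTOR_SIZE):
    """Smallest whole-MiB partition offset whose gap can hold core.img."""
    needed = core_img_bytes + sector_size  # core.img plus the MBR sector
    return -(-needed // MIB)               # ceiling division
```

With these assumptions, a first partition starting at 1 MiB leaves 2047 sectors for embedding, and `required_offset_mib` only asks for 2 MiB once core.img no longer fits in that space.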

osallou commented 10 years ago

On 8 Dec 2013 at 11:26, "Anders Ingemann" notifications@github.com wrote:

However, I do not understand why 2 is required... and maybe another configuration or another day would require 1 or 3...

This definitely warrants some research. I'll look into it.

Maybe documenting this error case and making the MBR size configurable in the manifest would help avoid this problem?

That would require the user to figure out what size is needed. If the user can figure it out, so can the bootstrapper.

Some Google searches suggest that we can't know this size in advance; it seems that the size somehow grows with RAID.

Putting it in the manifest is just a way of making it someone else's problem.

I agree, but at least it would unblock the issue.

osallou commented 10 years ago

I tried to boot the generated image (with increased /boot partition size) but it fails after grub:

fsck.ext4: Operation not permitted while trying to open /dev/vda2
You must have r/w access to the filesystem or be root
fsck died with exit status 8

1) I wonder why it tries to run fsck. I think there is an option in fstab that can be set to skip fsck.

2) It looks strange that it mounts /dev/vda2 and not /dev/sda... Your partitioning code does sometimes refer to vda.
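On point 1), the fstab setting in question is the sixth field (fs_passno): a value of 0, or a missing field, tells fsck not to check that filesystem at boot. The sketch below is only an illustration of reading that field. The sample fstab content and the function name are made up and are not taken from the generated image.

```python
def fsck_settings(fstab_text):
    """Map each mount point to its fs_passno (the sixth fstab field)."""
    result = {}
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 2:
            continue  # not a usable fstab entry
        # fs_passno is treated as 0 (no check) when the field is absent.
        result[fields[1]] = int(fields[5]) if len(fields) >= 6 else 0
    return result


# Hypothetical fstab content, for illustration only.
SAMPLE_FSTAB = """\
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/sda2  /      ext4  errors=remount-ro  0  1
/dev/sda1  /boot  ext2  defaults           0  0
"""
```

Calling `fsck_settings(SAMPLE_FSTAB)` shows which mount points would be checked at boot: any entry with a non-zero value is passed to fsck.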

osallou commented 10 years ago

/boot is empty. Maybe increasing the partition size caused problems during image creation (though no error was raised).

osallou commented 10 years ago

Making progress with the analysis. At first I saw vda disks because virt-manager (on KVM) attached the disk as a virtio disk (I don't know why it chose that as the default). That caused failures, which is to be expected since the image is not created as a virtio disk. I forced virt-manager to attach the disk as an IDE disk, and now I see write errors: "Buffer I/O on device sda2" and many "failed command: WRITE MULTIPLE", and the disks are still read-only. The problem is that I cannot see the complete logs on the console, and of course no log is written since the disk is mounted read-only.

My disks are named sda... Parted shows /boot as a primary partition with the boot flag and the second partition as primary without the boot flag (which is fine).

andsens commented 10 years ago

Hmm, I am beginning to suspect that either (a) the image hasn't been unmounted properly or (b) the disk is attached in a wrong way or read-only.
raw is the simplest format there is, there shouldn't be any issues with the disk.

osallou commented 10 years ago

On 10 Dec 2013 at 19:42, "Anders Ingemann" notifications@github.com wrote:

Hmm, I am beginning to suspect that either (a) the image hasn't been unmounted properly or (b) the disk is attached in a wrong way or read-only.

raw is the simplest format there is, there shouldn't be any issues with the disk.

I will try to get the complete boot log from the console to get more info.

osallou commented 10 years ago

I tried to get the full log, but I can't record the console output. However, I tried running the image on another server (KVM on Wheezy; previously I used Ubuntu Maverick as the KVM host) and I see different behavior: there is no "read-only" error, but also no login prompt on the console.

1) I built the raw image on a non-virtualized Debian Wheezy (I built the previous one in VirtualBox) => same error on the core.img size for /boot. So the error does not depend on whether the build runs in VirtualBox or on a native Debian Wheezy.

2) I launched the image with virt-manager:
the GRUB menu etc. is fine
boot starts fine
the "checking filesystem" fsck step is OK this time, with "/dev/sda1: clean" (but I see no sda2 message)
DHCP fails, but this is OK because I have no IP for this one
the startpar failure on hostname.sh is OK too for cloud images
the last console message is "INIT: no more processes left in this runlevel", but I have no login prompt on the console

So we have one image with two behaviors, depending on which host runs it under KVM (old Ubuntu or new Debian). It could be that the generated image uses features not supported by older libvirt/KVM versions. This could be an issue, but I could deal with it. However, even on the recent host, we still have no prompt.

andsens commented 10 years ago

last console message is : "INIT: no more processes left in this runlevel" but I have no login prompt in console.

Yup, I know this one. It's fixed in the WIP branch: https://github.com/andsens/build-debian-cloud/commit/66a195aa52bd3f33d5864a1623cfdbe01ee1bbf1 The problem was that I disabled the getty processes for ec2 (since there is no real TTY, ssh spawns its own). It was copied to virtualbox by mistake.
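For anyone hitting the same symptom, a quick way to see whether an image has any getty left enabled is to look at its /etc/inittab. The sketch below assumes a sysvinit-style inittab (lines of the form id:runlevels:action:process), as used on Wheezy. It is not a description of what the linked commit changes; the function name and sample content are made up.

```python
def active_gettys(inittab_text):
    """Return the ids of non-commented inittab entries that start a getty."""
    ids = []
    for line in inittab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # disabled or empty line
        # inittab format: id:runlevels:action:process
        parts = line.split(":", 3)
        if len(parts) == 4 and "getty" in parts[3]:
            ids.append(parts[0])
    return ids


# Hypothetical inittab excerpt, for illustration only.
SAMPLE_INITTAB = """\
1:2345:respawn:/sbin/getty 38400 tty1
#2:23:respawn:/sbin/getty 38400 tty2
"""
```

An empty result would be consistent with the "no more processes left in this runlevel" message and a missing login prompt, since init then has nothing left to respawn on the console.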

osallou commented 10 years ago

I just tried the python-WIP branch and I get the parted error I mentioned in the other bug, when I tested on sid (but here I am on the latest Wheezy).

I applied the patch locally on the python branch and now have a working image running on a recent Wheezy (except for the MBR size error, of course).

The status is:
1) The MBR size issue remains (1 is not fine for me, 2 is, but I don't know why).
2) The generated image works fine on a recent Wheezy + KVM but not on an old Ubuntu + KVM (Maverick). I don't think we should focus on that. It would be nice, however, to validate the image on a recent KVM + Fedora or another distribution, so as not to be tied to the host system. I will see if I can test one.
3) Patch 66a195a needs to be merged into the python branch.

Thanks for your support!

osallou commented 10 years ago

Successfully tried the image on KVM + CentOS 5.8! So I think we can skip issue 2) and focus on the other points.

andsens commented 10 years ago

Wow, that is some thorough testing right there Olivier. Nice job! I will merge the WIP branch any moment, I only need to test ec2 bootstrapping.

Do you have some steps to reproduce 1)?

osallou commented 10 years ago

Well, for 1) there is nothing special. I have a server running Debian Wheezy with the latest updates, and I specified raw instead of vdi in the virtualbox manifest, that's all. It could be related to what is installed on the server, but I don't think so; this should be independent, as everything is done within debootstrap.

Regarding python-WIP, as I said, I faced the issue with the parted error, though I did not face it with the python branch, so there may be some differences that introduce this error.

andsens commented 10 years ago

Regarding python-WIP, as I said, I faced the issue with the parted error, though I did not face it with the python branch, so there may be some differences that introduce this error.

I think you ran into a completely unrelated parted error on the WIP branch, when swap was enabled the partition ordering was screwed up (it's fixed and pushed now).

osallou commented 10 years ago

2013/12/11 Anders Ingemann notifications@github.com

Regarding python-WIP, as I said, I faced the issue with the parted error, though I did not face it with the python branch, so there may be some differences that introduce this error.

I think you ran into a completely unrelated parted error on the WIP branch, when swap was enabled the partition ordering was screwed up (it's fixed and pushed now).

Okay, I will get the latest WIP and test it. That would leave only the MBR issue.

andsens commented 10 years ago

Is this fixed now?

osallou commented 10 years ago

I was stuck with the "user package" issue. I am going to test it soon.

osallou commented 10 years ago

Just tested, and I still get the same error with the MBR size:

Installing grub
/usr/sbin/grub-setup: warn: Your embedding area is unusually small. core.img won't fit in it..
/usr/sbin/grub-setup: error: embedding is not possible, but this is required for cross-disk install.
Command '/usr/sbin/chroot /target/73ac5ac4/root /usr/sbin/grub-install /dev/dm-0' returned non-zero exit status 1

andsens commented 10 years ago

I finally ran into the same error, meaning I could code, test and reproduce.
The Arch Linux wiki states that the post-MBR gap should be between 1 and 2 MB, so I simply increased it, problem solved :-)
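To check how large the post-MBR gap of a generated raw image actually is, the partition table in the first sector can be read directly. The sketch below is only an illustration, not code from this project. It assumes Python 3, a classic MBR layout (four 16-byte entries starting at byte 446, boot signature 0x55AA at byte 510) and 512-byte sectors; the function names are hypothetical.

```python
import struct

SECTOR_SIZE = 512  # assumed logical sector size


def first_partition_start(mbr):
    """Return the lowest starting LBA among the non-empty MBR entries."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    starts = []
    for i in range(4):
        offset = 446 + 16 * i
        partition_type = mbr[offset + 4]  # 0 means the entry is unused
        lba_start = struct.unpack_from("<I", mbr, offset + 8)[0]
        if partition_type != 0:
            starts.append(lba_start)
    if not starts:
        raise ValueError("no partitions defined")
    return min(starts)


def post_mbr_gap_bytes(mbr, sector_size=SECTOR_SIZE):
    """Bytes between the MBR sector and the first partition."""
    # Sector 0 is the MBR itself, so it is not counted as part of the gap.
    return (first_partition_start(mbr) - 1) * sector_size
```

To use it, read the first 512 bytes of the image file in binary mode and pass them to `post_mbr_gap_bytes`. A first partition starting at sector 4096 (2 MiB) gives a gap of 4095 sectors.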