Closed bengland2 closed 2 years ago
@mulbc FYI
In our documents:
vm_image: Whether to use a pre-defined VM image with pre-installed requirements. Necessary for disconnected installs.
Note: You can use my fedora image here: quay.io/mulbc/fed-fio
Note: Only applies when kind is set to vm
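For concreteness, a hypothetical Benchmark CR using these fields might be written like this (field names follow the docs quoted above, but the exact spec layout may differ between benchmark-operator versions, so treat this as a sketch):

```shell
# Emit a sketch of a fio Benchmark CR with kind: vm and vm_image set.
cat > fio-vm-cr.yaml <<'EOF'
apiVersion: ripsaw.cloudbulldozer.io/v1alpha1
kind: Benchmark
metadata:
  name: fio-vm-example
  namespace: benchmark-operator
spec:
  workload:
    name: fio_distributed
    args:
      # kind: vm makes the server side run as a CNV VM
      kind: vm
      # only honored when kind is vm; selects the server VM image
      vm_image: quay.io/mulbc/fed-fio
EOF
```

Then `kubectl apply -f fio-vm-cr.yaml` as usual.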
However, you mention that the vm_image is different for the server/client (this seems like an issue)...
I wonder: if you had set the image
to quay.io/mulbc/fed-fio
would that have fixed your issue... effectively pinning the server and client to the same container image?
quay.io/mulbc/fed-fio was a VM image, while the client is a pod (container) image, so they can't be the same. quay.io/mulbc/fed-fio is exactly what I set it to in the beginning, when I got the error. It used to work and now it doesn't, and Chris's image didn't change, so it must have been the version of fio used in the quay.io/cloud-bulldozer/fio image.
ack - now I see that, I didn't actually look at Chris's image. If we knew which fio version he was using, we might be able to pin that, but we need a longer-term solution...
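A pre-flight check along those lines could catch the mismatch before a run even starts. A sketch, with one assumption: in practice the two version strings would come from running `fio --version` in the client pod and in the server VM; here they are simply passed in as arguments.

```shell
# Compare the fio version reported by the client with the one reported
# by the server; refuse to start the benchmark on any mismatch, since
# fio requires client and server versions to match exactly.
check_fio_versions() {
  client_ver=$1  # e.g. "fio-3.19" from the client pod
  server_ver=$2  # e.g. "fio-3.7" from the server VM
  if [ "$client_ver" != "$server_ver" ]; then
    echo "fio version mismatch: client=$client_ver server=$server_ver" >&2
    return 1
  fi
  echo "fio versions match: $client_ver"
}

# Example: check_fio_versions "fio-3.19" "fio-3.19"
```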
The fio VM image is just an example... Ben pinged me about the problem yesterday... he got my image creation script, if he is able to fix it by pinning to a specific fio version, I'm happy to update quay.io/mulbc/fed-fio
to work again ;)
@mulbc exactly, not Chris's fault, but I would suggest we include VM image creation somehow as part of producing the benchmark image, so that the two stay in sync in the future. And I did get it working; it was not hard, it's documented here, it just needs some automation. Sorry, I can't do it right now, but that's why I wrote the issue, so this wouldn't get lost.
ack!
I was under the impression this was being maintained, since it was set as our default for fio_vm CR
Checking in on this @bengland2
I haven't gotten around to submitting a PR yet. @mulbc, if you get there first, that's fine with me; I should be able to take a look at a fix within 2 weeks.
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
I discussed with Russ, I think we need some kind of CI that has both OCS and CNV in it so that we can test things like this that are going to be used in the field with benchmark-operator (example: Goldman, Morgan-Stanley). It's non-trivial to implement from a dependency standpoint, but if we want to make benchmark-operator usable by a wider audience then that's probably what we have to do. Think of it as "productizing" benchmark-operator.
@bengland2 how do we proceed on this? Is this a task that you track in your team's backlog?
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
@mulbc the latest discussion I heard with the CNV P&S team (Jen's idea) was that we would attempt to make VMs run podman to invoke the same fio benchmark-operator image that is used by openshift, so as to avoid maintaining two fio images, one for pods and one for VMs. Not sure if this is feasible, but I think in theory it is possible: instead of having /mnt/pvc be configured by OpenShift, the VM script would have to bind /mnt/pvc to an RBD device that the script created. This could be done using podman -v /mnt/pvc:/mnt/rbd-container-X .
also Jen Abrams ( @jeniferh ) suggested --net host , the host networking would allow the image to connect to redis and elasticsearch outside the VM, assuming the VM's firewall lets it through.
We would still need some benchmark-operator magic to start the VMs and invoke the image from within the VMs, but at that point it's not a different image, just a different use of the same image, and we don't need a VM image tailored for fio, only a RHEL VM image with podman in it. This approach might also make it easier to get other benchmark-operator benchmarks working with CNV, right?
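The VM-side wrapper this would need might look roughly like the following. This is only a sketch: the device path, mount point, and image name are all illustrative, and DRY_RUN=1 just prints the commands instead of running them.

```shell
RBD_DEV=${RBD_DEV:-/dev/rbd0}         # block device the VM script attached
MOUNT_POINT=${MOUNT_POINT:-/mnt/pvc}  # path the fio job file expects
FIO_IMAGE=${FIO_IMAGE:-quay.io/cloud-bulldozer/fio}

run() {  # print instead of execute when DRY_RUN=1
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$@"; else "$@"; fi
}

prepare_and_run() {
  run mkfs.xfs -f "$RBD_DEV"
  run mkdir -p "$MOUNT_POINT"
  run mount "$RBD_DEV" "$MOUNT_POINT"
  # Bind the VM-side mount into the container at the same path, so the
  # same fio container image works unmodified for pods and VMs.
  run podman run --rm -v "$MOUNT_POINT:$MOUNT_POINT" "$FIO_IMAGE"
}

# Inside the VM:  prepare_and_run
# Print only:     DRY_RUN=1 prepare_and_run
```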
I don't understand this point - why do you think the VM could not connect to elasticsearch right now? Using the regular SDN does not prevent the VM from talking to anything in the cluster (last I checked)
If I understand your suggestion correctly, then you want to have a container, running a VM, running a container? :D This might work, but is maintaining a VM image with fio so much work? In other words - wouldn't it be the same amount of work to have a VM image with podman?
@mulbc, thank you for creating the capability to run fio inside CNV VMs in the first place. The question is how to make this more maintainable and easy to do going forward. In response:
You are correct, the VM can connect to elasticsearch right now, but the run_snafu.py inside a container inside the VM might not be able to do that unless we set up networking for the container to allow it. Hence the suggestion for podman --net host.
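For the networking piece, the invocation might look like the sketch below: with --net host the container shares the VM's network stack, so anything the VM can reach, the containerized run_snafu.py can reach too. The redis/elasticsearch endpoints and environment variable names here are placeholders, and the function only prints the command.

```shell
# Build (and here just print) a podman invocation with host networking;
# endpoint values and env var names are illustrative placeholders.
fio_podman_cmd() {
  echo podman run --rm --net host \
    -e REDIS_HOST="${REDIS_HOST:-redis.benchmark-operator.svc}" \
    -e ES_SERVER="${ES_SERVER:-http://es.example.com:9200}" \
    quay.io/cloud-bulldozer/fio
}

# Example: eval "$(fio_podman_cmd)"   # actually run it inside the VM
```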
the VM image with podman could be set up to run any of the benchmark-operator benchmarks, whereas the VM image with fio can only run that one benchmark. And the VM image with podman will not need to be updated when a new version of a benchmark comes out.
right, CNV VMs are just pods that run qemu-kvm, and inside the VM we're turning around and running a container, so it is a bit comical. But it all comes down to maintainability and extending benchmark-operator to support CNV better.
@ebattat what do you think? @jtaleric ?
Yes, this idea of using the same workload binaries provided in the container image that a 'kind: pod' run would use is something I am starting to work on. I'm not sure yet whether we can use a chroot-based solution or will actually need to run from within a carefully crafted container inside the VM, but the idea is that it will reduce the maintenance work on VM images for each workload, as @bengland2 mentioned, and we'd be reusing the exact same bits as pod workloads, which is beneficial for testing purposes and performance comparisons.
I think I understand this now - yes that might indeed cut down on the maintenance!
I still think that you wouldn't need the host networking for podman to connect to the elasticsearch, but we can check.
One thing we should make sure of is that we forward the raw disk device into the inner podman container, instead of mounting it in the VM and forwarding the mount.
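The distinction sketched below (device path and mount point are illustrative): passing the raw device through with --device lets fio inside the container drive the block device directly, instead of going through a filesystem mounted in the VM.

```shell
# Return the podman argument for exposing the disk to the inner
# container, either as a raw device or as a bind-mounted filesystem.
disk_args() {
  case "$1" in
    device) echo "--device /dev/vdb:/dev/vdb" ;;  # raw passthrough (preferred above)
    mount)  echo "-v /mnt/pvc:/mnt/pvc" ;;        # mount in the VM, then bind
    *)      return 1 ;;
  esac
}

# Example: podman run --rm $(disk_args device) quay.io/cloud-bulldozer/fio
```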
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
the fio benchmark with kind: vm will not work anymore. This is because the fio versions of the client and server must match EXACTLY, but the client's fio is fio-3.19-3, whereas the server's fio was frozen in time in the quay.io/mulbc/fed-fio VM image, so you'd see errors like this in the client log:
I fixed this by generating my own fio image using Chris Blum's handy script and modifying it slightly. The version I'm using right now is at
And I built the fio image by logging into one of the VMs:
and doing this:
and saved the fio program here.
Eventually someone should make a PR and incorporate all this into the fio benchmark, but for now this is a usable workaround, I think.
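One possible shape for that automation, as a hedged sketch: generate a Containerfile that pins fio to the exact version the client image ships, so the two can't drift apart. The base image, package spec, and version here are placeholders, not the actual workaround described above.

```shell
FIO_VERSION=${FIO_VERSION:-3.19}  # placeholder; use the client's fio version

# Write a Containerfile that pins fio; the base image is illustrative.
cat > Containerfile <<EOF
FROM registry.fedoraproject.org/fedora:latest
# Pin fio so server and client report identical versions.
RUN dnf install -y "fio-${FIO_VERSION}*" && dnf clean all
ENTRYPOINT ["fio"]
EOF

# Then: podman build -t fed-fio:${FIO_VERSION} .  and push it to quay.io
```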