dracutdevs / dracut

dracut the event driven initramfs infrastructure
https://github.com/dracutdevs/dracut/wiki
GNU General Public License v2.0

fix(zfcp_rules): use the right wwpn and lun vars and support new device format #2631

Open aafeijoo-suse opened 4 months ago

aafeijoo-suse commented 4 months ago

Commit c8e531239bf314ae532ca1bc820285250a3b35d7 introduced a regression that causes the parsed _wwpn and _lun from SCSI device nodes to not be passed to the create_udev_rule function.

Also, the format of these udev-created SCSI device nodes has changed [1]. It now looks like this: `ccw-<device_bus_id>-fc-<wwpn>-lun-<lun>-part<n>`

Fixes c8e531239bf314ae532ca1bc820285250a3b35d7

[1] https://www.ibm.com/docs/en/linux-on-systems?topic=nodes-udev-created
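
As a rough illustration of the new node format described above, the following sketch splits such a name into its device bus ID, WWPN, and LUN using only POSIX parameter expansion. The function name `parse_zfcp_by_path` and any sample values are made up for this example; it is not the code from this PR or from the zfcp_rules module.

```sh
#!/bin/sh
# Sketch only: parse_zfcp_by_path is a hypothetical helper, not dracut code.
# Input:  a name of the form ccw-<device_bus_id>-fc-<wwpn>-lun-<lun>[-part<n>],
#         optionally prefixed by a directory such as /dev/disk/by-path/.
# Output: "<device_bus_id> <wwpn> <lun>" on stdout; returns 1 on other names.
parse_zfcp_by_path() {
    name=${1##*/}                 # strip any leading directory
    case "$name" in
        ccw-*-fc-*-lun-*) ;;      # new format, continue
        *) return 1 ;;            # anything else is not handled here
    esac
    rest=${name#ccw-}
    ccw=${rest%%-fc-*}            # device bus ID, e.g. 0.0.1900
    rest=${rest#*-fc-}
    wwpn=${rest%%-lun-*}          # target port WWPN
    lun=${rest#*-lun-}
    lun=${lun%%-part*}            # drop an optional partition suffix
    printf '%s %s %s\n' "$ccw" "$wwpn" "$lun"
}
```

Called with a hypothetical name such as `ccw-0.0.1900-fc-0x5005076300c213e9-lun-0x5022000000000000-part1`, it prints the three fields separated by spaces. Names that do not match the pattern make it return non-zero, so a caller can fall back to other handling.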


aafeijoo-suse commented 4 months ago

Similar to #2630, this file will be removed by #2534. Now the zfcp rules would be created by https://github.com/ibm-s390-linux/s390-tools/blob/master/zdev/dracut/95zdev/parse-zfcp.sh, although I don't see support for root= there (only for rd.zfcp, but maybe I'm overlooking something).

steffen-maier commented 4 months ago

> Similar to #2630, this file will be removed by #2534. Now the zfcp rules would be created by https://github.com/ibm-s390-linux/s390-tools/blob/master/zdev/dracut/95zdev/parse-zfcp.sh, although I don't see support for root= there (only for rd.zfcp, but maybe I'm overlooking something).

It could very well be that I'm overlooking something.

My reasoning for not carrying over the code regarding root=, resume=, and the use of information about the boot (IPL) device was that users need to use (dm) multipathing:

https://www.ibm.com/docs/en/linux-on-systems?topic=know-scsi-disk-device-nodes#scsi_nodes__title__2
https://public.dhe.ibm.com/software/dw/linux390/lvc/zFCP_Best_Practices-BB-Webcast_201805.pdf#page=15

If root= or resume= contained a (persistent) storage device name that happens to represent a single (z)FCP path (such as a by-path symlink), it would be a wrong setup. The initial boot (IPL) does indeed happen over a single path. However, I would expect users to want redundancy even in early initrd, which means they need more than the one root-fs path activated. In addition, the boot record could be on a different disk than the root-fs, so the boot device does not necessarily provide information about paths to the root-fs. So initrd needs a way to know all paths to all dependencies in order to mount the root-fs.

What do you think?

aafeijoo-suse commented 4 months ago

> It could very well be that I'm overlooking something.
>
> My reasoning for not carrying over the code regarding root=, resume=, and the use of information about the boot (IPL) device was that users need to use (dm) multipathing:
>
> https://www.ibm.com/docs/en/linux-on-systems?topic=know-scsi-disk-device-nodes#scsi_nodes__title__2
> https://public.dhe.ibm.com/software/dw/linux390/lvc/zFCP_Best_Practices-BB-Webcast_201805.pdf#page=15
>
> If root= or resume= contained a (persistent) storage device name that happens to represent a single (z)FCP path (such as a by-path symlink), it would be a wrong setup. The initial boot (IPL) does indeed happen over a single path. However, I would expect users to want redundancy even in early initrd, which means they need more than the one root-fs path activated. In addition, the boot record could be on a different disk than the root-fs, so the boot device does not necessarily provide information about paths to the root-fs. So initrd needs a way to know all paths to all dependencies in order to mount the root-fs.
>
> * In hostonly mode, this information is stored inside initrd (this seems to be the default for typical distros). [ibm-s390-linux/s390-tools@f2c82bf](https://github.com/ibm-s390-linux/s390-tools/commit/f2c82bf2b5043313174d13ca464ee670b9a19681) does it in a distro-independent way (`chzdev --export`). Both that commit and 95zfcp_rules/module-setup.sh use the dracut helper function `for_each_host_dev_and_slaves_all` to walk the root-fs dependency graph. [ibm-s390-linux/s390-tools@06a30ae](https://github.com/ibm-s390-linux/s390-tools/commit/06a30ae529a5d6ad2369ed81da056bf3a6147bb6) extends the former commit further. Note that the support for hostonly-cmdline is somewhat redundant: the former commit already stores the device configuration in initrd with chzdev's persistent config mechanism (which currently happens to be udev rules), so rd.zfcp entries in a stored cmdline are not necessary. They don't hurt either, because parsing of rd.zfcp also uses chzdev, and chzdev merges input for the same devices, so there won't be duplicate device configuration.
>
> * Without hostonly mode, users could specify multiple rd.zfcp= on the boot cmdline.
>
> What do you think?
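
To make the non-hostonly case from the quoted list concrete, here is a minimal sketch of how several rd.zfcp= entries on one kernel command line map to individual FCP paths. It only handles the three-field form `rd.zfcp=<device bus ID>,<WWPN>,<LUN>` documented in dracut.cmdline(7). The function name `list_zfcp_paths` and the sample command line are assumptions made for illustration, and this is not how parse-zfcp.sh is implemented (that script hands the values to chzdev).

```sh
#!/bin/sh
# Sketch only: list_zfcp_paths is a hypothetical helper, not part of dracut
# or s390-tools. It prints one "<bus ID> <WWPN> <LUN>" line per rd.zfcp= entry.
list_zfcp_paths() {
    # Rely on word splitting to walk the space-separated cmdline arguments;
    # quoted values containing spaces are not handled by this sketch.
    for arg in $1; do
        case "$arg" in
            rd.zfcp=*,*,*)
                val=${arg#rd.zfcp=}
                ccw=${val%%,*}      # device bus ID of the FCP device
                rest=${val#*,}
                wwpn=${rest%%,*}    # target port WWPN
                lun=${rest#*,}      # FCP LUN
                printf '%s %s %s\n' "$ccw" "$wwpn" "$lun"
                ;;
        esac
    done
}

# Hypothetical cmdline: root= names a multipath device, and two rd.zfcp=
# entries activate two redundant paths to the same LUN.
cmdline='root=/dev/mapper/mpatha2 rd.zfcp=0.0.1900,0x5005076300c213e9,0x5022000000000000 rd.zfcp=0.0.1940,0x5005076300c213e9,0x5022000000000000'
list_zfcp_paths "$cmdline"
```

The point of the example is the shape of the setup: root= refers to the multipath device rather than to a single by-path symlink, and each redundant path gets its own rd.zfcp= entry.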

Thanks for this extensive explanation. It was definitely me who was missing the required expertise with this kind of setup.

The bugs fixed by this PR and #2630 showed up last week while we were testing our latest distro, but they are not critical, because nowadays the dracut modules shipped with s390-tools do the most important job. The problem is that these old upstream s390 modules are included by default and produce confusing errors in the system log.

Therefore, upstream needs to move to #2534 soon and remove obsolete s390 modules.

steffen-maier commented 4 months ago

> The bugs fixed by this PR and #2630 showed up last week while we were testing our latest distro, but they are not critical, because nowadays the dracut modules shipped with s390-tools do the most important job. The problem is that these old upstream s390 modules are included by default and produce confusing errors in the system log.
>
> Therefore, upstream needs to move to #2534 soon and remove obsolete s390 modules.

Any idea how to make progress on #2534 and get it integrated/merged in a timely manner?

Currently, I see 136 open pull requests in dracut, including your #2630 and #2631.

I'm not familiar with the requirements to merge dracut PRs. For #2534, it states "At least 2 approving reviews are required by reviewers with write access". Harald, Jóhann, and Lukáš are required reviewers. Dan kindly provided a review with the approved flag. But I'm not sure if that counts towards the required 2 reviews, because he might not have write access. The GitHub docs suggest it won't count, because Dan's review approval is shown in grey rather than green.

Maybe it would help if you (@aafeijoo-suse) and Thomas (@tblume) could also kindly provide a review (approval) for #2534?

Here are some examples of what's in it for you (see also SUSE bug 1210597 comment from 2023-10-13):

aafeijoo-suse commented 4 months ago

> The bugs fixed by this PR and #2630 showed up last week while we were testing our latest distro, but they are not critical, because nowadays the dracut modules shipped with s390-tools do the most important job. The problem is that these old upstream s390 modules are included by default and produce confusing errors in the system log. Therefore, upstream needs to move to #2534 soon and remove obsolete s390 modules.
>
> Any idea how to make progress on #2534 and get it integrated/merged in a timely manner?
>
> Currently, I see 136 open pull requests in dracut, including your #2630 and #2631.

Ah, don't worry about these 2 PRs; both were submitted for "public knowledge" only. The most important one, and the one we want to address, is #2534.

> I'm not familiar with the requirements to merge dracut PRs. For #2534, it states "At least 2 approving reviews are required by reviewers with write access". Harald, Jóhann, and Lukáš are required reviewers. Dan kindly provided a review with the approved flag. But I'm not sure if that counts towards the required 2 reviews, because he might not have write access. The GitHub docs suggest it won't count, because Dan's review approval is shown in grey rather than green.
>
> Maybe it would help if you (@aafeijoo-suse) and Thomas (@tblume) could also kindly provide a review (approval) for #2534?

As you noticed, dracut upstream is just a "bulletin board" nowadays. The admins and the distros have abandoned the project (sorry if this statement hurts any feelings, but it is what it is). The remaining non-admin people with write permissions are Thomas and me (both from SUSE). If upstream only consists of SUSE employees with limited rights, it's not a real upstream. So, we prefer to focus on our downstream GitHub fork and only merge things there (if they add some value to us, of course).

> Here are some examples of what's in it for you (see also SUSE bug 1210597 comment from 2023-10-13):

We will definitely review your PR. Both #2630 and #2631 are just examples of the bug reports we receive because dracut ships obsolete s390 modules (and documentation as well). Over the past few months, our data center moved to a new location, which disrupted our available resources. As soon as I have an instance to test it on real hw along with the latest version of s390-tools, I'll give you a review. Thanks for your work here!

As initial feedback, it would be great if you could take a look at the dracut man pages and remove the obsolete documentation in your PR.

steffen-maier commented 4 months ago

Hi Antonio,

thanks for the information. Much appreciated. I wasn't aware.

> As soon as I have an instance to test it on real hw along with the latest version of s390-tools, I'll give you a review.

FYI, I successfully function tested https://github.com/ibm-s390-linux/s390-tools/pull/158 with #2534 as well as with https://github.com/openSUSE/kdump/pull/40 on SLES15.

> We will definitely review your PR. Both #2630 and #2631 are just examples of the bug reports we receive because dracut ships obsolete s390 modules (and documentation as well).
>
> As initial feedback, it would be great if you could take a look at the dracut man pages and remove the obsolete documentation in your PR.

Could you please elaborate a bit more on the obsolete documentation?

https://github.com/dracutdevs/dracut/commit/a52b248cc55d0a8a45fe0fbca5b0082b7fa3da49 updates (and keeps) the documentation of rd.znet (and rd.znet_ifname).

I think we need a well-known central documentation place, which is why I intentionally kept rd.dasd and rd.zfcp in dracut's man/dracut.cmdline.7.asc. https://github.com/dracutdevs/dracut/pull/2534/commits/ff81fa5b0dbad0adf60ad680e1b5402109157356, https://github.com/dracutdevs/dracut/pull/2534/commits/963385745d93e934d20d8835548afed5c70d15e8, https://github.com/dracutdevs/dracut/pull/2534/commits/e229f0208acfaa30f58996335a2fb26f90236c50, https://github.com/dracutdevs/dracut/pull/2534/commits/012b26a2f87fb1b5a83b66b47f55984ff84ec289, and https://github.com/dracutdevs/dracut/pull/2534/commits/48cff12560710847f7a37e2147d554b95a1bbe84, contain the following commit description part:

> Even though this removes one implementation of parsing {rd.zfcp,rd.dasd} in dracut, above s390-tools change introduces another implementation of parsing the exact same {rd.zfcp,rd.dasd} syntax. Therefore, it would be good to keep the documentation in man/dracut.cmdline.7 of dracut as one central place describing all s390 device types that dracut handles.

s390-tools zdev (incl. its dracut modules part) is packaged such that it is part of any minimal distro installation on s390. Hence, dracut on s390 also has the zdev dracut modules available. If necessary and not done yet, dracut packaged for s390 could have an explicit Requires dependency on the (sub)package of s390-tools meant to be part of any minimal installation. Therefore, I think that we can still consider the functionality of rd.dasd and rd.zfcp (which have always been s390-specific) to be part of core dracut on s390.

I fear users would be confused if the long-standing documentation entries vanished from dracut. An alternative would be to move the two dracut cmdline option descriptions to something like the deprecated section (which does not really fit). Or we could just leave them where they are and replace the description text with a reference to somewhere else, but we don't currently have this "elsewhere" in s390-tools, and as a user / reader I would not like to suddenly have to read multiple independent docs in different places. @oberpar, @vneethv, we briefly touched on this documentation topic a while ago; what do you think?

rd.zfcp.conf will only become obsolete / deprecated (without functionality as it's no longer necessary) with https://github.com/dracutdevs/dracut/pull/2632/commits/f98d41eee27ac85f07daebf22ae266c39f955c70 from https://github.com/dracutdevs/dracut/pull/2632. I'm going to update man/dracut.cmdline.7.asc in that commit.

aafeijoo-suse commented 4 months ago

> I fear users would be confused if the long-standing documentation entries vanished from dracut.

Yes, that's right. But now dracut won't be responsible for handling these command line options anymore. Maybe we could keep the entries but add a "this is not supported by dracut anymore" message or something like that, along with a reference to the s390-tools man page that describes them, since that is where this is handled now.

johannbg commented 3 months ago

> As you noticed, dracut upstream is just a "bulletin board" nowadays. The admins and the distros have abandoned the project (sorry if this statement hurts any feelings, but it is what it is). The remaining non-admin people with write permissions are Thomas and me (both from SUSE). If upstream only consists of SUSE employees with limited rights, it's not a real upstream. So, we prefer to focus on our downstream GitHub fork and only merge things there (if they add some value to us, of course).

@aafeijoo-suse Not abandoned. I've been busy at my day job for the past 2 years, which is no longer a problem since I quit that job a few days back because I was pretty much overworked; I guess people these days would call it "burned out".

Besides me and Harald holding ownership of the project(s), you and I have maintainer permissions on the dracut repo, and tblume has write access along with 2 other RH employees whom I trust but who are not particularly active in the project outside of their line of work inside RH. No one from RH/BRNO will be granted elevated access after the access/release shenanigans. Any employee from RH will start at the same level as any other contributor to the project who is not employed there, as in on ground 0, having to work their way up just like anyone else. The only way that will change is if I have a direct discussion with someone I'm familiar with from RH/Raleigh.

Anyway, back to the s390-tools PRs: what's the current status of those, and which direction is upstream s390 taking with regard to init systems? Is it staying generic or moving towards systemd only?

That said, I'm not particularly fond of a "partial" upstream. By that I mean we should not support some aspects of s390 here upstream while other parts are maintained in the s390 upstream repo, which would double the load on everyone involved and quadruple it for any kind of support issue.

dtardon commented 3 months ago

> Besides me and Harald holding ownership of the project(s),

You mean you're holding ownership. Harald might still retain his, but he's not been involved for years.

> you and I have maintainer permissions on the dracut repo, and tblume has write access along with 2 other RH employees whom I trust but who are not particularly active in the project outside of their line of work inside RH

Inactive project members might just as well not exist at all. Any outside observer can see that the project is dysfunctional. Issues are ignored, PRs aren't being reviewed (just 2 trivial ones have been merged this year), questions aren't being answered. That PRs are being closed as "stale" by a bot after months of waiting for review must feel like ridicule to the submitters. A group of downstream maintainers just created a new release outside of the project (Fedora announcement). IOW, the way you, as the project owner, are managing the project is not working. The sooner you admit that, the better for dracut.

> No one from RH/BRNO will be granted elevated access after the access/release shenanigans. Any employee from RH will start at the same level as any other contributor to the project who is not employed there, as in on ground 0, having to work their way up just like anyone else.

I'm at a loss as to what you're expecting here. There isn't any queue of people eager to contribute to dracut inside Red Hat. The downstream maintainers are still the same people you kicked out.

stale[bot] commented 2 months ago

This issue is being marked as stale because it has not had any recent activity. It will be closed if no further activity occurs. If this is still an issue in the latest release of Dracut and you would like to keep it open please comment on this issue within the next 7 days. Thank you for your contributions.