LINBIT / linstor-gateway

Manages Highly-Available iSCSI targets, NVMe-oF targets, and NFS exports via LINSTOR
GNU General Public License v3.0

nvme create defaults to port_id=0, doesn't link in new subsystems. #12

Open Smithx10 opened 2 years ago

Smithx10 commented 2 years ago

Currently, any nvme create after the first does not get its subsystem linked into the port.

The reason is that the resource agent responsible for creating the port and linking the subsystems to that port never reaches that code, because of "nvmet_port_monitor()":
https://github.com/ClusterLabs/resource-agents/blob/main/heartbeat/nvmet-port#L137

The health check only checks for the existence of the port directory; if it already exists, the start function returns early and never iterates over the nqns: https://github.com/ClusterLabs/resource-agents/blob/main/heartbeat/nvmet-port#L148

nvmet_port_start() {
    # If the port directory already exists, this monitor call succeeds
    # and we return right away, so the symlink loop below never runs.
    nvmet_port_monitor
    if [ $? = $OCF_SUCCESS ]; then
        return $OCF_SUCCESS
    fi

    mkdir ${portdir}
    echo ${OCF_RESKEY_addr} > ${portdir}/addr_traddr
    echo ${OCF_RESKEY_type} > ${portdir}/addr_trtype
    echo ${OCF_RESKEY_svcid} > ${portdir}/addr_trsvcid
    echo ${OCF_RESKEY_addr_fam} > ${portdir}/addr_adrfam

    for subsystem in ${OCF_RESKEY_nqns}; do
        ln -s /sys/kernel/config/nvmet/subsystems/${subsystem} \
           ${portdir}/subsystems/${subsystem}
    done

    nvmet_port_monitor
}
nvmet_port_monitor() {
    [ -d ${portdir} ] || return $OCF_NOT_RUNNING
    return $OCF_SUCCESS
}

I noticed that we don't populate port_id in "/etc/drbd-reactor.d/linstor-gateway-nvmeof-$name.toml"

We only populate:

"ocf:heartbeat:nvmet-port port addr=10.91.230.214 nqns=linbit:nvme:zoo type=tcp"

Desired behavior? If the user provides a different service address it probably should just automatically take the next available port. This probably should be fixed in linstor-gateway.

If the user provides the same service address it should link it in. This probably should be fixed in resource-agents.

Potentially port_id could be exposed to a user, but probably not necessary.

chrboe commented 2 years ago

Thanks for the report.

If the user provides a different service address it probably should just automatically take the next available port

Hm, yes. We would have to read back all the already created targets and check for the highest port_id... Probably not impossible, but I don't think we have precedent for that kind of logic yet. I will look at it.
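
As a rough sketch of that lookup (not linstor-gateway code; it assumes the information is read back from the nvmet configfs tree on the target node, and the helper name is made up):

next_port_id() {
    # Port directories under configfs are named after their numeric port_id;
    # take the highest one and add 1. Falls back to 1 if none exist yet.
    last=$(ls /sys/kernel/config/nvmet/ports 2>/dev/null | sort -n | tail -n1)
    echo $(( ${last:-0} + 1 ))
}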

If the user provides the same service address it should link it in. This probably should be fixed in resource-agents.

I don't think I fully understand this point. Right now I guess it would create a new portdir and symlink the subsystem in there. Does the backend not accept this? How would we fix this in the resource agents?

I guess if anything linstor-gateway should look up whether or not there is already a target with the same addr and assign the same port_id if there is...

Smithx10 commented 2 years ago

Sorry if I wasn't clear.

I guess if anything linstor-gateway should look up whether or not there is already a target with the same addr and assign the same port_id if there is...

Even if we use the same port_id for the same service address, the current nvmet-port heartbeat code will never symlink in the subsystem.

nvmet_port_start() runs nvmet_port_monitor(), which only checks whether $portdir exists. Since we already created the port earlier, it does exist, so the function returns 0 and never reaches the following:

    for subsystem in ${OCF_RESKEY_nqns}; do
        ln -s /sys/kernel/config/nvmet/subsystems/${subsystem} \
           ${portdir}/subsystems/${subsystem}
    done

because the health check is just:

nvmet_port_monitor() {
    [ -d ${portdir} ] || return $OCF_NOT_RUNNING
    return $OCF_SUCCESS
}

Perhaps the start function should run the symlink loop even if the portdir already exists, as sketched below.
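
A minimal sketch of that idea for nvmet_port_start() (an illustration of the suggestion, not the actual resource-agents code):

nvmet_port_start() {
    # Create and configure the port only if it does not exist yet.
    if [ ! -d ${portdir} ]; then
        mkdir ${portdir}
        echo ${OCF_RESKEY_addr} > ${portdir}/addr_traddr
        echo ${OCF_RESKEY_type} > ${portdir}/addr_trtype
        echo ${OCF_RESKEY_svcid} > ${portdir}/addr_trsvcid
        echo ${OCF_RESKEY_addr_fam} > ${portdir}/addr_adrfam
    fi

    # Always run the symlink loop, skipping links that are already in place.
    for subsystem in ${OCF_RESKEY_nqns}; do
        [ -e ${portdir}/subsystems/${subsystem} ] || \
            ln -s /sys/kernel/config/nvmet/subsystems/${subsystem} \
               ${portdir}/subsystems/${subsystem}
    done

    nvmet_port_monitor
}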

Smithx10 commented 2 years ago

After going through a PoC implementation of this behavior, I discovered that with 1 VIP shared by 4 subsystems, it's possible for drbd-reactor to promote the resources, and thus the VIP, on separate Primaries.

For example:

    nvme create -r nvme_group linbit:nvme:demo0 10.91.230.214/32 10G
    nvme create -r nvme_group linbit:nvme:demo1 10.91.230.214/32 10G
    nvme create -r nvme_group linbit:nvme:demo2 10.91.230.214/32 10G

This can result in demo0 and demo1 running on NodeA and demo2 on NodeB, both with the VIP 10.91.230.214.

Is there a way to make sure that Reactor can co-locate things like this?

Perhaps preferred-nodes? https://github.com/LINBIT/drbd-reactor/blob/master/doc/promoter.md#preferred-nodes
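
If preferred-nodes is the right tool, each generated config file could, hypothetically, carry the same node order, something like this (sketch only; key name taken from the promoter.md document linked above, exact placement worth double-checking there, node names from the example):

[[promoter]]
[promoter.resources.demo0]
# ... start = [ ... ] entries generated by linstor-gateway as today ...
preferred-nodes = ["NodeA", "NodeB"]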