OpenFabrics / fsdp_docs

Need to update opafm.xml file on builder-00 #114

Closed dledford closed 1 year ago

dledford commented 1 year ago

While debugging some performance issues, I've been in contact with Cornelis Networks, and they have requested some changes to the /etc/opa-fm/opafm.xml file. An opafm.xml file with the changes already made is in my home dir on builder-00. Also, no OPA interface on builder-00 should use connected mode: the /etc/sysconfig/network-scripts/ifcfg-hfi1_opa0* files should either have no MTU or CONNECTED_MODE lines at all, or those lines should read CONNECTED_MODE=no and MTU=10236. Note: these settings won't work until the opafm.xml file has been updated and opafm has been restarted.
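The interface rule above can be checked mechanically. The following is a minimal sketch, not a tool that exists on builder-00: the function name `check_ifcfg` is hypothetical, and it reads the rule as "each of CONNECTED_MODE and MTU is either absent or set to the value given above".

```python
import glob

# Values taken from the comment above; a key may also be absent entirely.
REQUIRED = {"CONNECTED_MODE": "no", "MTU": "10236"}


def check_ifcfg(text):
    """Return a list of problems found in the contents of one ifcfg file.

    Hypothetical helper: a key listed in REQUIRED is accepted only if it is
    missing or set to the required value (quotes and case are ignored).
    """
    problems = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        if key in REQUIRED and value.lower() != REQUIRED[key]:
            problems.append(f"{key}={value} (expected {REQUIRED[key]})")
    return problems


if __name__ == "__main__":
    # Path pattern as given in the comment above.
    for path in glob.glob("/etc/sysconfig/network-scripts/ifcfg-hfi1_opa0*"):
        with open(path) as handle:
            for problem in check_ifcfg(handle.read()):
                print(f"{path}: {problem}")
```

Run on builder-00, this prints one line per offending setting and nothing when the files comply. It only inspects the files; it does not restart opafm or the interfaces.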

lylavoie commented 1 year ago

Changes were made to the opafm.xml and interface files, and applied.

However, there is another error in the opafm.xml file, in the per-PKey multicast group settings (line 1410 of the config). If I disable the create for each group, the service starts fine. If I enable them, it reports a duplicate MGID error for the first FM. I'm guessing it's because the groups are not being "assigned" to each virtual fabric manager, but I don't see how that is done, unless these should all be "pulled down" in the config to the FM instances below.

@dledford let me know if you have any recommendations.

dledford commented 1 year ago

@lylavoie Is there still something needed here? If so, find the MulticastGroup 0 definition and set the Created flag to 0. That will probably clear any errors.
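To see which group definitions collide before editing the file by hand, a rough sketch like the one below may help. It rests on assumptions not confirmed in this thread: that each group is a `MulticastGroup` element, that the create flag is a child element named `Create`, and that MGIDs are `MGID` child elements. The function name `duplicate_mgids` is hypothetical. It counts across the whole document and does not model which FM instance a group belongs to, so treat its output as a hint only.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_mgids(xml_text):
    """Return MGIDs that appear more than once among groups with Create=1.

    Assumed element names (not confirmed by the thread): MulticastGroup,
    Create, MGID. The count spans the whole document, not one FM instance.
    """
    root = ET.fromstring(xml_text)
    seen = Counter()
    for group in root.iter("MulticastGroup"):
        # Groups whose create flag is not 1 are ignored.
        if (group.findtext("Create") or "").strip() != "1":
            continue
        for mgid in group.findall("MGID"):
            value = (mgid.text or "").strip().lower()
            if value:
                seen[value] += 1
    return sorted(value for value, count in seen.items() if count > 1)


if __name__ == "__main__":
    # Tiny made-up sample, not taken from the real opafm.xml.
    sample = """
    <Config>
      <MulticastGroup><Create>1</Create><MGID>0xff12::1</MGID></MulticastGroup>
      <MulticastGroup><Create>1</Create><MGID>0xff12::1</MGID></MulticastGroup>
      <MulticastGroup><Create>0</Create><MGID>0xff12::2</MGID></MulticastGroup>
    </Config>
    """
    print(duplicate_mgids(sample))
```

The sketch is read-only by design: rewriting the file through `ElementTree` would drop its XML comments, so the actual change is better made in an editor.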

lylavoie commented 1 year ago

@dledford that looks to have fixed it; the config is running with all the original preconfigured groups and your changes. The changes have been applied to the networking interfaces as well.