Open rahullepakshi opened 8 months ago
@oritwas @rahullepakshi I believe that cephadm should not start the GW if the number of hugepages actually allocated is less than the number we try to allocate. Currently, in ceph\src\cephadm\cephadmlib\daemons\nvmeof.py, we ask to allocate 4096 hugepages.
@caroav but as you can see in the logs, at v1.0.0 we are requesting only 2048, right?
INFO:nvmeof:NVMeoF gateway Git repository: https://github.com/ceph/ceph-nvmeof
INFO:nvmeof:NVMeoF gateway Git branch: tags/1.0.0
INFO:nvmeof:NVMeoF gateway Git commit: d08860d3a1db890b2c3ec9c8da631f1ded3b61b6
INFO:nvmeof:SPDK Git repository: https://github.com/ceph/spdk.git
INFO:nvmeof:SPDK Git branch: undefined
INFO:nvmeof:SPDK Git commit: 668268f74ea147f3343b9f8136df3e6fcc61f4cf
INFO:nvmeof:Starting ceph nvmeof discovery service
INFO:nvmeof:Connected to Ceph with version "18.2.1-629-g329eaff9 (329eaff91af228c8469648b22c12e1f4608e7b45) reef (stable)"
INFO:nvmeof:Requested huge pages count is 2048
There are cases where we deploy the nvmeof service and start the GW even though the actual huge page count is lower than the count requested at GW startup. Although the GW may start successfully, it may error out at large scale due to memory pressure. So we need to decide whether to warn or to fail GW startup when the huge page count is lower than requested.
In the GW below, the nvmeof service was successfully deployed when the actual huge page count was 346, far below the requested 2048, yet the GW somehow started.
In another GW on the same setup, nvmeof service deployment failed when the huge page count was 149, also far below the requested count, and the GW did not start.
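To make the warn-or-fail decision concrete, here is a minimal Python sketch of a pre-start check. It is not the existing cephadm or gateway code: the function names `parse_hugepages` and `check_hugepages` and the `strict` flag are hypothetical, and comparing against `HugePages_Free` (rather than `HugePages_Total`) is an assumption about what "actual huge page count" should mean here.

```python
import re


def parse_hugepages(meminfo_text):
    """Collect the HugePages_* counters from /proc/meminfo-style text."""
    counters = {}
    for line in meminfo_text.splitlines():
        match = re.match(r"^(HugePages_\w+):\s+(\d+)", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def check_hugepages(requested, meminfo_text, strict=False):
    """Return "ok", "warn", or "fail" for a requested huge page count.

    Hypothetical policy: compare the request against HugePages_Free.
    With strict=False a shortfall only warns; with strict=True it fails.
    """
    free = parse_hugepages(meminfo_text).get("HugePages_Free", 0)
    if free >= requested:
        return "ok"
    return "fail" if strict else "warn"


def read_meminfo(path="/proc/meminfo"):
    """Read the kernel memory counters (Linux only)."""
    with open(path) as handle:
        return handle.read()
```

On a Linux host, `check_hugepages(2048, read_meminfo())` would return `"warn"` if fewer than 2048 huge pages are free, and passing `strict=True` would turn that shortfall into `"fail"`. Whether the default should be warn or fail is exactly the decision this issue asks for.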