Xilinx-CNS / onload

OpenOnload high performance user-level network stack
Other
562 stars 90 forks source link

Fix fdtable allocation on new Debian #230

Open okt-sergeyn opened 3 months ago

okt-sergeyn commented 3 months ago

The first commit fixes #229 The second one fixes initialization of citp_log_level.

ivatet-amd commented 3 months ago

Thank you for filing a PR. 👍

I cannot schedule it at the moment, but we are keeping it on our radar.

Presumably, in the absence of a fix in Debian, a workaround for #229 is passing EF_FDTABLE_SIZE to Onload. This PR makes this workaround go away. Is my understanding correct?

What obstacles do you anticipate for the dynamic approach, where the FD tables increase in runtime on demand dynamically (say, unless EF_FDTABLE_SIZE is set, which could prevent the dynamic behaviour to avoid jitter)?

okt-sergeyn commented 3 months ago

Presumably, in the absence of a fix in Debian, a workaround for #229 is passing EF_FDTABLE_SIZE to Onload. This PR makes this workaround go away. Is my understanding correct?

Yes, absolutely right.

What obstacles do you anticipate for the dynamic approach, where the FD tables increase in runtime on demand dynamically (say, unless EF_FDTABLE_SIZE is set, which could prevent the dynamic behaviour to avoid jitter)?

It'd be a major modification I guess. Maybe @ol-alexandra has some thoughts.

ol-alexandra commented 3 months ago

What obstacles do you anticipate for the dynamic approach, where the FD tables increase in runtime on demand dynamically (say, unless EF_FDTABLE_SIZE is set, which could prevent the dynamic behaviour to avoid jitter)?

It'd be a major modification I guess. Maybe @ol-alexandra has some thoughts.

I agree that it would be a major modification with unpleasent experience for some users. For example some users run a multithreaded application with EF_FDS_MT_SAFE=1. If one thread re-allocates fdtable, and other thread assumes lockless access to fdtable, it would result in a surprise. You can say that such users must set EF_FDTABLE_SIZE, but it is not an easy task to pass this message to all the users.

You can implement "dynamic approach" which does not include realloc() call (and does not change the user address), but again, it is a major rework of fdtable.

New Debian will probably release next summer; you have some months to decide. At the same time, as this change comes from systemd, the same issue may arise with other distros (Fedora and so on).

okt-sergeyn commented 2 months ago

Hello! Friendly asking. Are there any updates on this issue?

sianj-xilinx commented 2 months ago

Are there any updates on this issue?

Hi Sergey. We have been thinking about this, but haven't yet come to a conclusion. The problem you point out definitely needs addressing, but I've got a few concerns about this. The main issue is that this will change the rlimit by default, which has the potential to silently break (unusual for onload) applications. The benefit of doing this way is obviously that things Just Work for most applications. The new EF_FDS_MAX is also very similar to the existing EF_FDTABLE_SIZE. We're considering what the best approach is, with alternatives including this or a more explicit configuration, for example bailing out with a helpful error message if we fail to allocate the fdtable, in which case the user can either configure the app or system limit, or use the existing EF_FDTABLE_SIZE. In the medium term we'd like to move to a dynamic approach, but in practice this will probably become a problem before we get that work done, so we'll need an interim solution.

okt-sergeyn commented 2 months ago

Hi Siân! I got your point. Thanks for detailed answer.