Eventual-Inc / Daft

Distributed data engine for Python/SQL designed for the cloud, powered by Rust
https://getdaft.io
Apache License 2.0
2.34k stars 164 forks

Repeatable sampling by max rows #3332

Open ukclivecox opened 2 days ago

ukclivecox commented 2 days ago

Is your feature request related to a problem?

Generally, I know the max rows I would like to retrieve and need a sample of a given dataset for this rather than some percentage.

Describe the solution you'd like

Examples:

DuckDB allows specifying a max number of rows; ClickHouse has a fuzzy max rows ("at least, but not much more").

Describe alternatives you've considered

do something like

max_rows = 120000
rows = df.count_rows()
if max_rows < rows:
    # Convert the desired row cap into a sampling fraction, then cap the
    # (approximate) sample with limit() to guarantee at most max_rows.
    sample_fraction = max_rows / float(rows)
    df = df.sample(sample_fraction).limit(max_rows)

This may be less efficient than a sampling method that returns the given number of rows directly.
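The reason the workaround needs both a fraction and a limit can be sketched in plain Python (this is an illustration of the general technique, not Daft's API): fraction-based sampling only yields an approximate row count, so a trailing cap is needed to guarantee the bound.

```python
import random

# Sketch (plain Python, not Daft): fraction-based sampling returns an
# approximate number of rows, which is why the snippet above still
# needs .limit() to enforce the max-rows bound.
rows = 10_000
max_rows = 1_000
fraction = max_rows / rows

rng = random.Random(42)  # a seed makes the sample repeatable
sampled = [i for i in range(rows) if rng.random() < fraction]

# len(sampled) is only approximately max_rows; capping guarantees it.
capped = sampled[:max_rows]
assert len(capped) <= max_rows
```

A native "sample N rows" operation could skip the separate count and cap entirely.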

Additional Context

Ideally, this would be repeatable, i.e. allow one to set a seed. That would allow sampling one table, joining it, and then taking the matching rows from the other tables as needed, since the same seed reproduces the sampled rows.

Would you like to implement a fix?

No

desmondcheongzx commented 2 days ago

Sampling a number of rows makes sense to me.

Ideally, this would be repeatable, i.e. allow one to set a seed. That would allow sampling one table, joining it, and then taking the matching rows from the other tables as needed, since the same seed reproduces the sampled rows.

It's worth noting that we can already do this today via the seed parameter: https://www.getdaft.io/projects/docs/en/latest/api_docs/doc_gen/dataframe_methods/daft.DataFrame.sample.html
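What the seed buys you can be illustrated with plain Python's random module (Daft's DataFrame.sample(fraction, seed=...) from the linked docs is analogous; the helper below is hypothetical, for illustration only):

```python
import random

# Sketch: a seeded sample is deterministic, so re-running the same
# sample against joined tables selects the same rows each time.
def sample_ids(n, fraction, seed):
    rng = random.Random(seed)
    return [i for i in range(n) if rng.random() < fraction]

a = sample_ids(1_000, 0.1, seed=7)
b = sample_ids(1_000, 0.1, seed=7)
assert a == b  # same seed -> identical sample
```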