emabee / rust-serde_db

Apache License 2.0
9 stars 4 forks source link

The contracts of `DeserializableResultset` and `DeserializableRow` are ambiguous/underspecified #6

Open regexident opened 3 weeks ago

regexident commented 3 weeks ago

The contract of DeserializableResultset and its companion DeserializableRow feels somewhat under-specified in its current form.

Ambiguity: .fieldname(_)

Any use of a bare integer-based index without an explicitly specified ordering for the collection is ambiguous:

DeserializableResultSet:

/// Returns the name of the column at the specified index.
fn fieldname(&self, field_idx: usize) -> Option<&str>;

DeserializableRow:

/// Returns the name of the column at the specified index.
fn fieldname(&self, field_idx: usize) -> Option<&str>;

As an implementor this does not tell me how I should index my result set.

The docs are also conflating "column" and "field", which just adds to the confusion: columns usually imply an ordering, while field doesn't.

Ambiguity: .next()

Without an explicit queue order any notion of "next" is ambiguous:

DeserializableResultSet:

/// Removes the next row and returns it, or None if the result set is empty, or an error.
///
/// # Errors
///
/// E.g. fetching can fail.
fn next(&mut self) -> DeserializationResult<Option<Self::Row>>;

DeserializableRow:

/// Removes and returns the next value.
fn next(&mut self) -> Option<Self::Value>;

As an implementor this does not tell me if .next() should pop from the front or from the back of the result set (assuming it is ordered).


The name DeserializableResultSet implies set-semantics, which generally describes an unordered collection with a uniqueness constraint. Neither of which are common for database, despite "result set" being in common use for database query results.

A name with less unintentional implied semantics, such as DeserializableRows might thus be a safer choice as it avoids such ambiguities and implied semantics altogether, while also making the relation between DeserializableRows and DeserializableRow as clear as can be.

emabee commented 3 weeks ago

Thanks for this input - I'll take care within the next few days!

emabee commented 2 weeks ago

Once again, thanks for the provided changes!

Now to the discussion here:

Any use of a bare integer-based index without an explicitly specified ordering for the collection is ambiguous:

The ordering is usually specified by the SQL query, e.g. in select A, B, C from table_x the caller expects to get A first, B second, and C third.

The name DeserializableResultSet implies set-semantics

For result sets, the set-semantics implies that the order of the rows is by default undefined (in SQL, a well-defined ordering has to be explicitly requested by adding an order by clause to the query). It says nothing about the order within the rows.

regexident commented 2 weeks ago

For result sets, the set-semantics implies that the order of the rows is by default undefined (in SQL, a well-defined ordering has to be explicitly requested by adding an order by clause to the query).

It would be great for the above to be spelled out explicitly, in the trait's documentation.

Thanks for writing this, btw! I was about to work on this very kind of serde integration for cozo and was delighted to find that somebody —you— had already put in the work! Cozo's current API is rather unergonomic and boilerplate-y to use from Rust due to its need to manually map to/from its NamedRows (result set), Tuple (row) and DataValue (field) types. My hope is to fix this using serde_db.

emabee commented 2 weeks ago

I could extend the docu for the DeserializableResultSet like this:

/// Interface for a database result set to support deserialization.
///
/// A result set is expected to be a list of rows and define the structure of these rows.
/// Each row consists the same number and types of field values, according to the given structure.
///
/// The database server defines the order in which the rows appear in the list, and does this
/// in accordance to an optionally used ORDER BY clause in the query statement.
///
/// The values of the same field in the rows of a result set logically form a _column_, but we
/// avoid this term in this library because the column isn't expected to have a
/// direct physical representation.

Consequently, I would also purge the existing references to the term column.

emabee commented 2 weeks ago

My hope is to fix this using serde_db.

That would be great! Let me know if you see further things that would help in that endeavour!