[META] editoast: enhance the `Model` macro #4184

Closed · leovalais closed this 1 month ago

leovalais commented 1 year ago

Enhance the Model macro

Current state

Currently, when applied to a struct, the Model macro can generate a bunch of implementations that factor out common DB operations:

There are two approaches to using it:

Application on one struct (most common in editoast)

#[derive(Debug, Default, Identifiable, Insertable, Model, Queryable)]
#[model(table = "osrd_infra_document")]
#[model(create, delete, retrieve, update)]
#[diesel(table_name = osrd_infra_document)]
pub struct Document {
    #[diesel(deserialize_as = i64)]
    pub id: Option<i64>,
    #[diesel(deserialize_as = String)]
    pub content_type: Option<String>,
    #[diesel(deserialize_as = Vec<u8>)]
    pub data: Option<Vec<u8>>,
}

let mut d: Document = Document::retrieve(42, conn).await?.unwrap();
println!("ct: {}", d.content_type.as_deref().unwrap());
d.content_type = Some("x".to_owned());
d.update(conn).await?;
let d2 = Document { id: None, content_type: d.content_type, data: d.data }
    .create(conn)
    .await?; // clone doc 42
assert_eq!(d2.id, Some(43));

Pros:

Cons:

Application on two structs (for pathfinding and maybe train schedules)

https://github.com/DGEXSolutions/osrd/blob/c8a1770ca54769c13e51c60931707d364e1ec781/editoast/src/models/pathfinding.rs#L21-L100

Pros:

Cons:

Proposal

I'd like to write something like this instead (:santa:):

#[derive(Debug, Default, Identifiable, Model, Queryable)]
#[model(table = "osrd_infra_document")]
#[model(create, delete, retrieve, update)]
#[diesel(table_name = osrd_infra_document)]
pub struct Document {
    pub id: i64,
    pub content_type: String,
    pub data: Vec<u8>,
}

let mut d: Document = Document::retrieve(42, conn).await?.unwrap();
println!("ct: {}", d.content_type);
d.changeset()
 .content_type("x".to_owned())
 .update(conn)
 .await?;
assert_eq!(&d.content_type, "x"); // (mut d) should also be updated
let d2: Document = Document::changeset()
    .content_type(d.content_type)
    .data(d.data)
    .create(conn) // clone doc 42
    .await?;
assert_eq!(d2.id, 43);

While the actual API here is just a proposal, there are, I believe, a few essential points that an enhanced Model macro should at least cover:

Modelling infra objects

In the case of infra objects, only a few models are implemented right now (atm only catenaries, OPs and tracks), but more will come :mage: and the implementation can be made more satisfactory ⚙️.

E.g., for track sections, atm:

https://github.com/DGEXSolutions/osrd/blob/c8a1770ca54769c13e51c60931707d364e1ec781/editoast/src/models/infra_objects/track_section.rs#L9-L20

There are several problems:

Proposal

As far as I can tell, infra objects are only updated in schema::operation (osrd editor requests), which updates the DB directly. Hence, for now, I suppose there is no need to design a way to update infra models in a Rust fashion. But should we need to, the separation InfraModel (schema) <-> Model (DB row) could allow us to design an elaborate schema changeset builder where chained method calls would map to a set of (JSON path, update value) pairs, who knows?
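To make that last idea concrete, here is a purely speculative sketch of what such a builder could record; every name below is hypothetical:

use serde_json::{json, Value};

// Hypothetical: each chained call pushes a (JSON path, new value) pair,
// to be folded later into a single jsonb_set(...) UPDATE on the schema column.
#[derive(Debug, Default)]
pub struct SchemaChangeset {
    patches: Vec<(String, Value)>,
}

impl SchemaChangeset {
    pub fn set(mut self, json_path: &str, value: Value) -> Self {
        self.patches.push((json_path.to_owned(), value));
        self
    }
}

let cs = SchemaChangeset::default()
    .set("$.length", json!(25_000.0))
    .set("$.extensions.sncf.line_code", json!(42));
// cs.patches now holds the two (path, value) updates to apply atomically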

Likewise, I've identified two ways infra objects get created: with the persist_batch function (the current behaviour of InfraModel) during infra import, and in unit tests. persist_batch directly queries the DB, and I think it should stay that way (I can elaborate if necessary :smile:). In unit tests, it's convenient to have a create() method, but there is no need to handle autogenerated column values for infra objects (we only need to provide the whole schema). So here as well, we don't need to bother with changesets and elaborate builders.

// The declaration is left unchanged (InfraModel already exists)
#[derive(Debug, Derivative, Clone, Deserialize, Serialize, PartialEq, InfraModel)]
#[serde(deny_unknown_fields)]
#[infra_model(table = "crate::tables::osrd_infra_tracksectionmodel")]
#[derivative(Default)]
pub struct TrackSection {
    pub id: Identifier,
    #[derivative(Default(value = "100."))]
    pub length: f64,
    pub slopes: Vec<Slope>,
    pub curves: Vec<Curve>,
    #[serde(default)]
    pub loading_gauge_limits: Vec<LoadingGaugeLimit>,
    #[derivative(Default(value = "Geometry::new(LineString(vec![]))"))]
    pub geo: Geometry,
    #[derivative(Default(value = "Geometry::new(LineString(vec![]))"))]
    pub sch: Geometry,
    #[serde(default)]
    pub extensions: TrackSectionExtensions,
}

// API proposal:
// Management of TrackSectionModel (the mapping of the SQL table) becomes transparent here
let mut ts = TrackSection::retrieve((3, "TD0"), conn).await?.unwrap();
assert_eq!(ts.length, 25000.); // lol
ts.length *= 2.0;
let ts2 = ts.clone().create(conn).await?; // No builder to design (hopefully)
assert_eq!(ts2.length, 50000.); // lol^2
ts.delete(conn).await?;

More advanced operations

Copy

A copy function that generates a changeset with pre-filled values may prove useful. Any thoughts?

let d: Document = ...;
d.duplicate()
 .content_type("json".to_owned())
 .create(conn)
 .await?;

Batch operations

#[derive(Debug, Default, Identifiable, Model, Queryable)]
#[model(create, delete, retrieve, batch_retrieve, batch_delete, batch_update)] // <<<
pub struct Document { ... }

let docs: Vec<Document> = Document::batch_retrieve(conn, [1, 2, 3, 4].iter()).await?; // docs.len() may != 4
let _: HashMap<i64, Document> = Document::batch_retrieve_map(conn, [1, 2, 3, 4].iter()).await?;

// checks that all 4 documents could be retrieved or returns an error with the list
// of ids that do not exist
let docs: Vec<Document> = Document::batch_retrieve_all(conn, [1, 2, 3, 4].iter()).await?;
assert_eq!(docs.len(), 4);
let hdocs: HashMap<i64, Document> = Document::batch_retrieve_all_map(conn, [1, 2, 3, 4].iter()).await?;
assert_eq!(hdocs.len(), 4);

let _: Vec<Document> = Document::batch_delete(conn, docs.iter().map(|d| d.id)).await?;
let _: HashMap<i64, Document> = Document::batch_delete_map(conn, docs.iter().map(|d| d.id)).await?;
// no _all variant for delete

// locks infras 1, 2, 3
let infras: Vec<Infra> = Infra::changeset()
    .locked(true)
    .batch_update(conn, [1, 2, 3].iter())
    .await?;
// likewise for batch_update_map, batch_update_all, batch_update_all_map

Before starting to make a list of all the traits and fn prototypes that the macros described here should generate, I'd like to hear your thoughts on both the general ideas and the API proposal above.

@flomonster

flomonster commented 1 year ago

Proposal (Infra objects)

let track_schema: TrackSection = ...;
let infra_id = 42;
TrackSectionAsChangeSet::from_schema(track_schema, "my_track_id", infra_id).create(conn).await?;
TrackSectionAsChangeSet::from_schema(track_schema, "my_track_id", infra_id).update(conn).await?;
TrackSectionModel::delete(("my_track_id", infra_id), conn).await?;
TrackSectionModel::retrieve(("my_track_id", infra_id), conn).await?;

Idea

leovalais commented 1 year ago

Move persist_batch out of schema objects so that these structs do not embed any DB related behaviors

flomonster commented 1 year ago

Migration

leovalais commented 1 year ago

So that we don't forget: we could also add an attribute #[model(jsonb = true)] to wrap/unwrap diesel_json::Json<> automatically. That would make utopia's #[schema(value_type = T)] annotations for each DieselJson<T> disappear.

#[derive(Model)]
struct Foo {
  #[model(jsonb = true)]
  bar: Baz
}

// roughly equivalent to

struct Foo {
  bar: diesel_json::Json<Baz>
}

@laggron42 wdyt?

laggron42 commented 1 year ago

Definitely would be useful indeed, if it's an option!

leovalais commented 1 year ago

Add a setting #[model(additional_derives(row(...), changeset(...)))]

leovalais commented 1 year ago

Scratch that. That's just additional configuration for rows and changesets; this is probably better:

#[model(row(type_name = "LOL", derive(...)))]
#[model(changeset(type_name = "LOLZ", derive(...)))]
leovalais commented 1 year ago

TODO! Also auto-derive an Exists<Pk>: Retrieve<Pk> with a blanket impl, so that Model::exists(pk).await? <=> Model::retrieve(pk).await?.is_some()
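A minimal sketch of that blanket impl, with simplified signatures (assuming async_trait and diesel_async; the actual editoast traits may differ):

use async_trait::async_trait;
use diesel_async::AsyncPgConnection;

type Result<T> = std::result::Result<T, diesel::result::Error>; // placeholder error type

#[async_trait]
pub trait Retrieve<Pk: Send + 'static>: Sized {
    async fn retrieve(conn: &mut AsyncPgConnection, pk: Pk) -> Result<Option<Self>>;
}

#[async_trait]
pub trait Exists<Pk: Send + 'static>: Retrieve<Pk> {
    // Default method: existence is just a successful retrieval.
    async fn exists(conn: &mut AsyncPgConnection, pk: Pk) -> Result<bool> {
        Ok(Self::retrieve(conn, pk).await?.is_some())
    }
}

// Blanket impl: every retrievable model gets exists() for free.
impl<Pk: Send + 'static, T: Retrieve<Pk>> Exists<Pk> for T {}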

leovalais commented 11 months ago

Associations

Currently Models are (almost) a 1:1 mapping to their respective table schema. Therefore foreign references are i64s. This is problematic for several reasons:

There are three common kinds of associations (cf. Django's documentation for comprehensive examples):

  1. One to one (rolling stock <-> livery, infra object <-> search table, ...)
  2. One to many (scenario *-> study, study *-> project, livery *-> document, ...). One side of this association is provided by diesel through the #[diesel(belongs_to(Parent))] annotation. But that forces us to use diesel's DSL and to manually issue a second request, it's impossible to use belonging_to with anything other than the id, etc.
  3. Many to many (I believe we don't have any?)

Since the models only model the ends of the relations, we only have to focus on implementing what "One" and "Many" mean, on both sides of the association.

Additionally, depending on the context we might want to resolve these references either eagerly or lazily.

Example with Study

#[derive(Clone, Debug, Serialize, Deserialize, Derivative, Model, ToSchema)]
#[derivative(Default)]
#[model(table = "study")]
pub struct Study {
    pub id: i64,
    #[derivative(Default(value = "Some(String::new())"))]
    pub name: String,
    #[derivative(Default(value = "Some(String::new())"))]
    pub description: String,
    #[derivative(Default(value = "Some(String::new())"))]
    pub business_code: String,
    #[derivative(Default(value = "Some(String::new())"))]
    pub service_code: String,
    pub creation_date: NaiveDateTime,
    #[derivative(Default(value = "Utc::now().naive_utc()"))]
    pub last_modification: NaiveDateTime,
    pub start_date: Option<NaiveDate>,
    pub expected_end_date: Option<NaiveDate>,
    pub actual_end_date: Option<NaiveDate>,
    #[derivative(Default(value = "Some(0)"))]
    pub budget: i32,
    #[derivative(Default(value = "Some(Vec::new())"))]
    pub tags: Vec<String>,
    #[derivative(Default(value = "Some(String::new())"))]
    pub state: String,
    #[derivative(Default(value = "Some(String::new())"))]
    pub study_type: String,

    // lock_ref = false allows race conditions; it defaults to true, in which case LazyRef locks self's row in the transaction (if it still exists)
    #[model(lazy_ref, lock_ref = false)] // , key = project_id)]
    pub project: LazyRef<Project>,

    // since it's not declared lazy, they will ALL be loaded within the same transaction as the retrieved study
    #[model(foreign_ref_many, inverse = Scenario::study_id)] // or inverse = (Obj::infra_id, Obj::obj_id)
    pub scenarios: Vec<Scenario>, 

    pub project_id: i64, // we can still access the project id directly
}

let s = Study::retrieve(42, conn).await?.unwrap();
let p: &Project = s.project.get(conn).await?; // the project is loaded once at usage
assert_eq!(p.id, s.project_id);
println!("{}", s.scenarios.first().unwrap().name); // since scenarios are loaded eagerly, they're directly available

Lazy refs could be wrapped in a LazyRef<> type which would be an enhanced OnceLock<> that performs additional operations before querying, such as locking rows. For the eager reference, I couldn't find a reason why we would need a wrapper.

For relations with "many" objects, I'd like to leave it to .collect() to infer the type of the collection, and to allow maps using an attribute include_key. (We can make that an issue on its own afterwards.)
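For instance (hypothetical syntax):

// With include_key, the generated loader collects the association into a map
// keyed by the child's primary key instead of a Vec:
#[model(foreign_ref_many, inverse = Scenario::study_id, include_key)]
pub scenarios: HashMap<i64, Scenario>,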

Retrieval

  1. New transaction
  2. Retrieve the model, locking its row (https://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-ROWS)
  3. Recursively retrieve the eager references
  4. Create a LazyRef, embedding the ids to fetch and the data to check, for each lazy ref
  5. Commit and load
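A sketch of what this flow could look like with diesel_async, reusing the Study example above (the generated StudyRow type, the table paths and LazyRef::new are assumptions, not the actual generated code):

use diesel::prelude::*;
use diesel_async::{AsyncConnection, RunQueryDsl};
use scoped_futures::ScopedFutureExt;

let (study, scenarios, project) = conn
    .transaction::<_, diesel::result::Error, _>(|conn| {
        async move {
            // 2. retrieve the model, locking its row
            let study: StudyRow = study::table
                .find(42)
                .for_share()
                .get_result(conn)
                .await?;
            // 3. recursively retrieve the eager references
            let scenarios: Vec<Scenario> = scenario::table
                .filter(scenario::study_id.eq(study.id))
                .load(conn)
                .await?;
            // 4. the lazy ref only embeds the id to fetch later
            let project: LazyRef<Project> = LazyRef::new(study.project_id);
            Ok((study, scenarios, project))
        } // 5. the transaction commits once the future resolves to Ok
        .scope_boxed()
    })
    .await?;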

Update / Create

  1. New transaction
  2. Lock the row (update only)
  3. Update the row and load the update (RETURNING *) / Create the row
  4. Recursively retrieve the eager references
  5. Create a LazyRef, embedding the ids to fetch and the data to check, for each lazy ref
  6. Commit and load

Delete

No changes: cascading deletions should be handled by the database, as the Model system is not designed to maintain the bijection.

Loading a LazyRef<>

  1. If the model has already been loaded, return it
  2. Otherwise, new transaction
  3. Unless configured by lock_ref = false, check that the parent model still exists, locking it
  4. Retrieve the reference content
  5. Commit, load and cache
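Following these steps, LazyRef<> could be sketched on top of tokio's OnceCell, reusing the Retrieve trait and Result alias sketched earlier (names and error handling are placeholders):

use diesel_async::AsyncPgConnection;
use tokio::sync::OnceCell;

pub struct LazyRef<M> {
    id: i64,
    cell: OnceCell<M>,
}

impl<M: Retrieve<i64>> LazyRef<M> {
    pub fn new(id: i64) -> Self {
        Self { id, cell: OnceCell::new() }
    }

    pub async fn get(&self, conn: &mut AsyncPgConnection) -> Result<&M> {
        // 1. OnceCell returns the cached model if it was already loaded
        self.cell
            .get_or_try_init(|| async {
                // 2-3. transaction + parent row lock elided for brevity
                // 4-5. retrieve, commit, and let the OnceCell cache the value
                M::retrieve(conn, self.id)
                    .await?
                    .ok_or(diesel::result::Error::NotFound)
            })
            .await
    }
}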

"Many" references

Many references are roughly a batch_retrieve invocation.

Inverse associations

Inverse associations are trickier to parse, but since the inverse annotation is supposed to be an Identifier of the referred Model, we should still be able to call {,batch_}retrieve as usual.

Changesets / Patches

Obviously, inverse associations should be excluded from changesets and patches. But for direct associations, we have two options:

I prefer the first option.

Save

Recursive by default? Opt-out option?

API

#[model(foreign_ref, key, inverse)]
#[model(lazy_ref, lock_ref = true, key, inverse)]
#[model(foreign_ref_many, include_key = false, inverse)]
#[model(lazy_ref_many, include_key = false, lock_ref = true, inverse)]

// All annotations accept a key and/or inverse attribute
key = <identifier>
inverse = <qualified identifier>
// where
<identifier> = <field> | (<field>,*)
<qualified identifier> = <model>::<field> | (<Model>::<field>,*)
leovalais commented 11 months ago

Add a configuration option for both row and changeset to include additional diesel derive configuration.

leovalais commented 9 months ago

API evolutions

leovalais commented 9 months ago

Listing

API proposal to integrate ORDER BY queries into ModelV2.

#[derive(Debug, Default, Clone, ModelV2)]
#[model(table = crate::tables::document)]
pub struct Document {
    pub id: i64,
    pub content_type: String,
    pub data: Vec<u8>,
}

let docs/*: Vec<Document> */ = Document::list(
    &mut conn,
    ListSettings::new() // <query> WHERE content_type = 'plain/text' ORDER BY content_type DESC, id ASC LIMIT 25 OFFSET 50
        .filter(Fields::<Document>::ContentType.eq("plain/text")) // Box<dyn BoxableExpression<table, Pg, SqlType = Bool>>
        .order_by(Fields::<Document>::ContentType.desc()) // Box<dyn BoxableExpression<table, Pg, SqlType = NotSelectable>>
        .order_by(Fields::<Document>::Id.asc())
        .limit(25)
        .offset(50)
).await?;

let ops: HashMap<(i64, String), OperationalPointModel> = OPM::list_with_key(...).await?;

That would almost replace all the manual modelv1::List implementations. Breaking changes would include the name sorting criterion for projects, which translates to lower(p.name) in SQL; by using the new API above we would slightly change the ordering. I'd say it doesn't really matter, as we can always (as another issue, if deemed necessary):

Best to do that before splitting models into its own crate, so that we don't need to keep PaginatedResponse<> in it (it belongs in the views crate).

leovalais commented 9 months ago

Checking for missing defaults

Prevent applying changesets that lack values for non-default columns in Create.

#[derive(Debug, Default, Clone, ModelV2)]
#[model(table = crate::tables::document)]
pub struct Document {
    pub id: i64,
    #[model(required)]
    pub content_type: String,
    pub data: Vec<u8>,
}

Document::changeset().data(vec![]).create(...).await.is_err() // true
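A sketch of the guard the macro could generate from #[model(required)] (all names hypothetical):

#[derive(Debug)]
pub enum ModelError {
    MissingRequiredField(&'static str),
}

impl Changeset<Document> {
    // Generated: called at the beginning of create() to fail fast.
    fn ensure_required(&self) -> std::result::Result<(), ModelError> {
        if self.content_type.is_none() {
            return Err(ModelError::MissingRequiredField("content_type"));
        }
        Ok(())
    }
}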

Not necessarily useful, since diesel will complain anyway in create(). Up for discussion.

leovalais commented 8 months ago

We need to be able to forward derive annotations to the changeset (and maybe the row) at both struct- and field-level. We hit this issue while trying to implement Validator for Changeset<RollingStockModel> with a custom validator function.

[!NOTE] Whether we choose to keep Validator or not is another problem. We might also encounter this problem with Derivative for instance.

Implementation

One can take inspiration from how strum_macros forwards annotations to the enum generated by EnumDiscriminants (https://docs.rs/strum_macros/latest/strum_macros/derive.EnumDiscriminants.html).
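For illustration, the forwarding could look like this (hypothetical syntax mirroring strum's; the attr(...) wrapper would be pasted verbatim onto the generated type):

#[derive(Model)]
#[model(table = crate::tables::rolling_stock)]
// struct-level: forwarded to the generated Changeset<RollingStockModel>
#[model(changeset(derive(Validate)))]
pub struct RollingStockModel {
    pub id: i64,
    // field-level: forwarded to the corresponding changeset field
    #[model(changeset(attr(validate(length(min = 1)))))]
    pub name: String,
}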

leovalais commented 7 months ago

Model locking

let mut doc = Document::retrieve(conn, 42).await?.unwrap();

// Locking for update (other queries can only read the pk) FOR NO KEY UPDATE
doc.exclusive_lock(conn, |conn, doc: &mut Document| async { // lock_for_update: &mut Self, &mut Self -> Result<()>
    doc.data = vec![1, 2, 3];
    doc.save(conn).await?;
}).await?;

// ---

let infra = Infra::retrieve(conn, 42).await?.unwrap();

// Shared lock to prevent modification (other queries can read the whole row, but cannot update it) FOR SHARE
infra.lock(conn, |conn, infra: &Infra| async { // lock: &mut Self, &Self -> Result<()>
    infra.import_railjson(...).await?;
}).await?;

Each lock opens a new transaction (required for the locking system to work), locks the row and runs the closure. We don't need FOR UPDATE because model PKs are not supposed to change. I don't think we need to consider FOR KEY SHARE, but I may be wrong.

More info on locking here: https://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-ROWS
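For reference, a rough sketch of the flow exclusive_lock would wrap, specialized to Document for readability (assuming diesel_async and the generated document table; the closure plumbing of the proposed API is elided):

use diesel::prelude::*;
use diesel_async::{AsyncConnection, AsyncPgConnection, RunQueryDsl};
use scoped_futures::ScopedFutureExt;

async fn with_document_locked(
    conn: &mut AsyncPgConnection,
    id: i64,
) -> Result<(), diesel::result::Error> {
    conn.transaction(|conn| {
        async move {
            // FOR NO KEY UPDATE: concurrent writers block, plain readers don't
            let mut doc: Document = document::table
                .find(id)
                .for_no_key_update()
                .get_result(conn)
                .await?;
            // ... the user closure would run here with conn and &mut doc ...
            doc.data = vec![1, 2, 3];
            diesel::update(document::table.find(id))
                .set(document::data.eq(&doc.data))
                .execute(conn)
                .await?;
            Ok(())
        }
        .scope_boxed()
    })
    .await // the row lock is released on commit/rollback
}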

leovalais commented 7 months ago

Dependencies and order of implementation

No need to wait for locking to migrate the Infra model (there are currently locking queries for infras, but they're not run in a transaction, so I'm not sure they work at all).