[META] editoast: enhance the `Model` macro #4184

Closed · leovalais closed this 1 month ago

leovalais commented 1 year ago

Enhance the Model macro

Current state

Currently, when applied to a struct, the Model macro can generate a bunch of implementations that factor out common DB operations:

There are two approaches to using it:

Application on one struct (most common in editoast)

#[derive(Debug, Default, Identifiable, Insertable, Model, Queryable)]
#[model(table = "osrd_infra_document")]
#[model(create, delete, retrieve, update)]
#[diesel(table_name = osrd_infra_document)]
pub struct Document {
    #[diesel(deserialize_as = i64)]
    pub id: Option<i64>,
    #[diesel(deserialize_as = String)]
    pub content_type: Option<String>,
    #[diesel(deserialize_as = Vec<u8>)]
    pub data: Option<Vec<u8>>,
}

let mut d: Document = Document::retrieve(42, conn).await?.unwrap();
println!("ct: {}", d.content_type.as_deref().unwrap());
d.content_type = Some("x".to_owned());
d.update(conn).await?;
let d2 = Document { id: None, content_type: d.content_type, data: d.data }
    .create(conn)
    .await?; // clone doc 42
assert_eq!(d2.id, Some(43));

Pros:

Cons:

Application on two structs (for pathfinding and maybe train schedules)

https://github.com/DGEXSolutions/osrd/blob/c8a1770ca54769c13e51c60931707d364e1ec781/editoast/src/models/pathfinding.rs#L21-L100

Pros:

Cons:

Proposal

I'd like to write something like this instead (:santa:):

#[derive(Debug, Default, Identifiable, Model, Queryable)]
#[model(table = "osrd_infra_document")]
#[model(create, delete, retrieve, update)]
#[diesel(table_name = osrd_infra_document)]
pub struct Document {
    pub id: i64,
    pub content_type: String,
    pub data: Vec<u8>,
}

let mut d: Document = Document::retrieve(42, conn).await?.unwrap();
println!("ct: {}", d.content_type);
d.changeset()
 .content_type("x".to_owned())
 .update(conn)
 .await?;
assert_eq!(&d.content_type, "x"); // (mut d) should also be updated
let d2: Document = Document::changeset()
    .content_type(d.content_type)
    .data(d.data)
    .create(conn) // clone doc 42
    .await?;
assert_eq!(d2.id, 43);

While the actual API here is just a proposal, there are, I believe, a few essential points that an enhanced Model macro should at least cover:

Modelling infra objects

In the case of infra objects, only a few models are implemented right now (atm only catenaries, OPs and tracks), but more will come :mage: and the implementation can be made more satisfactory ⚙️.

E.g., for track sections, atm:

https://github.com/DGEXSolutions/osrd/blob/c8a1770ca54769c13e51c60931707d364e1ec781/editoast/src/models/infra_objects/track_section.rs#L9-L20

There are several problems:

Proposal

As far as I can tell, infra objects are only updated in schema::operation (osrd editor requests), which updates the DB directly. Hence, for now, I suppose there is no need to design a way to update infra models in a Rust fashion. But should we need to, the separation InfraModel (schema) <-> Model (DB row) could allow us to design an elaborate schema changeset builder where chained method calls would map to a set of (JSON path, update value) pairs, who knows?
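To make that last idea concrete, here is a purely speculative sketch of what such a builder could record; every name below is hypothetical:

use serde_json::{json, Value};

// Hypothetical: each chained call pushes a (JSON path, new value) pair,
// to be folded later into a single jsonb_set(...) UPDATE on the schema column.
#[derive(Debug, Default)]
pub struct SchemaChangeset {
    patches: Vec<(String, Value)>,
}

impl SchemaChangeset {
    pub fn set(mut self, json_path: &str, value: Value) -> Self {
        self.patches.push((json_path.to_owned(), value));
        self
    }
}

let cs = SchemaChangeset::default()
    .set("$.length", json!(25_000.0))
    .set("$.extensions.sncf.line_code", json!(42));
// cs.patches now holds the two (path, value) updates to apply atomically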

Likewise, I've identified two ways infra objects get created: with the persist_batch function (the current behaviour of InfraModel) during infra import, and in unit tests. persist_batch directly queries the DB, and I think it should stay that way (I can elaborate if necessary :smile:). In unit tests, it's convenient to have a create() method, but there is no need to handle autogenerated column values for infra objects (we only need to provide the whole schema). So here as well, we don't need to bother with changesets and elaborate builders.

// The declaration is left unchanged (InfraModel already exists)
#[derive(Debug, Derivative, Clone, Deserialize, Serialize, PartialEq, InfraModel)]
#[serde(deny_unknown_fields)]
#[infra_model(table = "crate::tables::osrd_infra_tracksectionmodel")]
#[derivative(Default)]
pub struct TrackSection {
    pub id: Identifier,
    #[derivative(Default(value = "100."))]
    pub length: f64,
    pub slopes: Vec<Slope>,
    pub curves: Vec<Curve>,
    #[serde(default)]
    pub loading_gauge_limits: Vec<LoadingGaugeLimit>,
    #[derivative(Default(value = "Geometry::new(LineString(vec![]))"))]
    pub geo: Geometry,
    #[derivative(Default(value = "Geometry::new(LineString(vec![]))"))]
    pub sch: Geometry,
    #[serde(default)]
    pub extensions: TrackSectionExtensions,
}

// API proposal:
// Management of TrackSectionModel (the mapping of the SQL table) becomes transparent here
let mut ts = TrackSection::retrieve((3, "TD0"), conn).await?.unwrap();
assert_eq!(ts.length, 25000.); // lol
ts.length *= 2.0;
let ts2 = ts.clone().create(conn).await?; // No builder to design (hopefully)
assert_eq!(ts2.length, 50000.); // lol^2
ts.delete(conn).await?;

More advanced operations

Copy

A copy function that generates a changeset with pre-filled values may prove useful. Any thoughts?

let d: Document = ...;
d.duplicate()
 .content_type("json".to_owned())
 .create(conn)
 .await?;

Batch operations

#[derive(Debug, Default, Identifiable, Model, Queryable)]
#[model(create, delete, retrieve, batch_retrieve, batch_delete, batch_update)] // <<<
pub struct Document { ... }

let docs: Vec<Document> = Document::batch_retrieve(conn, [1, 2, 3, 4].iter()).await?; // docs.len() may != 4
let _: HashMap<i64, Document> = Document::batch_retrieve_map(conn, [1, 2, 3, 4].iter()).await?;

// checks that all 4 documents could be retrieved or returns an error with the list
// of ids that do not exist
let docs: Vec<Document> = Document::batch_retrieve_all(conn, [1, 2, 3, 4].iter()).await?;
assert_eq!(docs.len(), 4);
let hdocs: HashMap<i64, Document> = Document::batch_retrieve_all_map(conn, [1, 2, 3, 4].iter()).await?;
assert_eq!(hdocs.len(), 4);

let _: Vec<Document> = Document::batch_delete(conn, docs.iter().map(|d| d.id)).await?;
let _: HashMap<i64, Document> = Document::batch_delete_map(conn, docs.iter().map(|d| d.id)).await?;
// no _all variant for delete

// locks infras 1, 2, 3
let infras: Vec<Infra> = Infra::changeset()
    .locked(true)
    .batch_update(conn, [1, 2, 3].iter())
    .await?;
// likewise for batch_update_map, batch_update_all, batch_update_all_map

Before starting to make a list of all the traits and fn prototypes that the macros described here should generate, I'd like to hear your thoughts on both the general ideas and the API proposal above.

@flomonster

flomonster commented 1 year ago

Proposal (Infra objects)

let track_schema: TrackSection = ...;
let infra_id = 42;
TrackSectionAsChangeSet::from_schema(track_schema, "my_track_id", infra_id).create(conn).await?;
TrackSectionAsChangeSet::from_schema(track_schema, "my_track_id", infra_id).update(conn).await?;
TrackSectionModel::delete(("my_track_id", infra_id), conn).await?;
TrackSectionModel::retrieve(("my_track_id", infra_id), conn).await?;

Idea

leovalais commented 1 year ago

Move persist_batch out of schema objects so that these structs do not embed any DB related behaviors

flomonster commented 1 year ago

Migration

leovalais commented 1 year ago

So that we don't forget: we could also add an attribute #[model(jsonb = true)] to wrap/unwrap diesel_json::Json<> automatically. That would make utopia's #[schema(value_type = T)] annotations for each DieselJson<T> disappear.

#[derive(Model)]
struct Foo {
  #[model(jsonb = true)]
  bar: Baz
}

// roughly equivalent to

struct Foo {
  bar: diesel_json::Json<Baz>
}

@laggron42 wdyt?

laggron42 commented 1 year ago

Definitely would be useful indeed, if it's an option!

leovalais commented 1 year ago

Add a setting #[model(additional_derives(row(...), changeset(...)))]

leovalais commented 1 year ago

Scratch that. That's just additional configuration for rows and changesets; this is probably better:

#[model(row(type_name = "LOL", derive(...)))]
#[model(changeset(type_name = "LOLZ", derive(...)))]
leovalais commented 1 year ago

TODO! Also auto-derive an Exists<Pk>: Retrieve<Pk> with a blanket impl, so that Model::exists(pk).await? <=> Model::retrieve(pk).await?.is_some()
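A minimal sketch of that blanket impl, with simplified signatures (assuming async_trait and diesel_async; the actual editoast traits may differ):

use async_trait::async_trait;
use diesel_async::AsyncPgConnection;

type Result<T> = std::result::Result<T, diesel::result::Error>; // placeholder error type

#[async_trait]
pub trait Retrieve<Pk: Send + 'static>: Sized {
    async fn retrieve(conn: &mut AsyncPgConnection, pk: Pk) -> Result<Option<Self>>;
}

#[async_trait]
pub trait Exists<Pk: Send + 'static>: Retrieve<Pk> {
    // Default method: existence is just a successful retrieval.
    async fn exists(conn: &mut AsyncPgConnection, pk: Pk) -> Result<bool> {
        Ok(Self::retrieve(conn, pk).await?.is_some())
    }
}

// Blanket impl: every retrievable model gets exists() for free.
impl<Pk: Send + 'static, T: Retrieve<Pk>> Exists<Pk> for T {}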

leovalais commented 11 months ago

Associations

Currently Models are (almost) a 1:1 mapping to their respective table schema. Therefore foreign references are i64s. This is problematic for several reasons:

There are three common kinds of associations (cf. Django's documentation for comprehensive examples):

  1. One to one (rolling stock <-> livery, infra object <-> search table, ...)
  2. One to many (scenario *-> study, study *-> project, livery *-> document, ...). One side of this association is provided by diesel through the #[diesel(belongs_to(Parent))] annotation. But that forces us to use diesel's DSL and to manually issue a second request, it's impossible to use belonging_to with anything other than the id, etc.
  3. Many to many (I believe we don't have any?)

Since the models only model the ends of the relations, we only have to focus on implementing what "One" and "Many" mean, on both sides of the association.

Additionally, depending on the context we might want to resolve these references either eagerly or lazily.

Example with Study

#[derive(Clone, Debug, Serialize, Deserialize, Derivative, Model, ToSchema)]
#[derivative(Default)]
#[model(table = "study")]
pub struct Study {
    pub id: i64,
    #[derivative(Default(value = "Some(String::new())"))]
    pub name: String,
    #[derivative(Default(value = "Some(String::new())"))]
    pub description: String,
    #[derivative(Default(value = "Some(String::new())"))]
    pub business_code: String,
    #[derivative(Default(value = "Some(String::new())"))]
    pub service_code: String,
    pub creation_date: NaiveDateTime,
    #[derivative(Default(value = "Utc::now().naive_utc()"))]
    pub last_modification: NaiveDateTime,
    pub start_date: Option<NaiveDate>,
    pub expected_end_date: Option<NaiveDate>,
    pub actual_end_date: Option<NaiveDate>,
    #[derivative(Default(value = "Some(0)"))]
    pub budget: i32,
    #[derivative(Default(value = "Some(Vec::new())"))]
    pub tags: Vec<String>,
    #[derivative(Default(value = "Some(String::new())"))]
    pub state: String,
    #[derivative(Default(value = "Some(String::new())"))]
    pub study_type: String,

    // lock_ref = false allows race conditions; it defaults to true, in which case LazyRef locks self's row in the transaction (if it still exists)
    #[model(lazy_ref, lock_ref = false)] // , key = project_id)]
    pub project: LazyRef<Project>,

    // since it's not declared lazy, they will ALL be loaded within the same transaction as the retrieved study
    #[model(foreign_ref_many, inverse = Scenario::study_id)] // or inverse = (Obj::infra_id, Obj::obj_id)
    pub scenarios: Vec<Scenario>, 

    pub project_id: i64, // we can still access the project id directly
}

let s = Study::retrieve(42, conn).await?.unwrap();
let p: &Project = s.project.get(conn).await?; // the project is loaded once at usage
assert_eq!(p.id, s.project_id);
println!("{}", s.scenarios.first().unwrap().name); // since scenarios are loaded eagerly, they're directly available

Lazy refs could be wrapped in a LazyRef<> type which would be an enhanced OnceLock<> that performs additional operations before querying, such as locking rows. For the eager reference, I couldn't find a reason why we would need a wrapper.

For relations with "many" objects, I'd like to leave it to .collect() to infer the type of the collection, and to allow maps using an attribute include_key. (We can make that an issue on its own afterwards.)
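For instance (hypothetical syntax):

// With include_key, the generated loader collects the association into a map
// keyed by the child's primary key instead of a Vec:
#[model(foreign_ref_many, inverse = Scenario::study_id, include_key)]
pub scenarios: HashMap<i64, Scenario>,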

Retrieval

  1. New transaction
  2. Retrieve the model, locking its row (https://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-ROWS)
  3. Recursively retrieve the eager references
  4. Create a LazyRef, embedding the ids to fetch and the data to check, for each lazy ref
  5. Commit and load
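A sketch of what this flow could look like with diesel_async, reusing the Study example above (the generated StudyRow type, the table paths and LazyRef::new are assumptions, not the actual generated code):

use diesel::prelude::*;
use diesel_async::{AsyncConnection, RunQueryDsl};
use scoped_futures::ScopedFutureExt;

let (study, scenarios, project) = conn
    .transaction::<_, diesel::result::Error, _>(|conn| {
        async move {
            // 2. retrieve the model, locking its row
            let study: StudyRow = study::table
                .find(42)
                .for_share()
                .get_result(conn)
                .await?;
            // 3. recursively retrieve the eager references
            let scenarios: Vec<Scenario> = scenario::table
                .filter(scenario::study_id.eq(study.id))
                .load(conn)
                .await?;
            // 4. the lazy ref only embeds the id to fetch later
            let project: LazyRef<Project> = LazyRef::new(study.project_id);
            Ok((study, scenarios, project))
        } // 5. the transaction commits once the future resolves to Ok
        .scope_boxed()
    })
    .await?;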

Update / Create

  1. New transaction
  2. Lock the row (update only)
  3. Update the row and load the update (RETURNING *) / Create the row
  4. Recursively retrieve the eager references
  5. Create a LazyRef, embedding the ids to fetch and the data to check, for each lazy ref
  6. Commit and load

Delete

No changes: cascading deletions should be handled by the database, as the Model system is not designed to maintain the bijection.

Loading a LazyRef<>

  1. If the model has already been loaded, return it
  2. Otherwise, new transaction
  3. Unless configured by lock_ref = false, check that the parent model still exists, locking it
  4. Retrieve the reference content
  5. Commit, load and cache
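Following these steps, LazyRef<> could be sketched on top of tokio's OnceCell, reusing the Retrieve trait and Result alias sketched earlier (names and error handling are placeholders):

use diesel_async::AsyncPgConnection;
use tokio::sync::OnceCell;

pub struct LazyRef<M> {
    id: i64,
    cell: OnceCell<M>,
}

impl<M: Retrieve<i64>> LazyRef<M> {
    pub fn new(id: i64) -> Self {
        Self { id, cell: OnceCell::new() }
    }

    pub async fn get(&self, conn: &mut AsyncPgConnection) -> Result<&M> {
        // 1. OnceCell returns the cached model if it was already loaded
        self.cell
            .get_or_try_init(|| async {
                // 2-3. transaction + parent row lock elided for brevity
                // 4-5. retrieve, commit, and let the OnceCell cache the value
                M::retrieve(conn, self.id)
                    .await?
                    .ok_or(diesel::result::Error::NotFound)
            })
            .await
    }
}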

"Many" references

Many references are roughly a batch_retrieve invocation.

Inverse associations

Inverse associations are trickier to parse, but since the inverse annotation is supposed to be an Identifier of the referred Model, we should still be able to call {,batch_}retrieve as usual.

Changesets / Patches

Obviously, inverse associations should be excluded from changesets and patches. But for direct associations, we have two options:

I prefer the first option.

Save

Recursive by default? Opt-out option?

API

#[model(foreign_ref, key, inverse)]
#[model(lazy_ref, lock_ref = true, key, inverse)]
#[model(foreign_ref_many, include_key = false, inverse)]
#[model(lazy_ref_many, include_key = false, lock_ref = true, inverse)]

// All annotations accept a key and/or inverse attribute
key = <identifier>
inverse = <qualified identifier>
// where
<identifier> = <field> | (<field>,*)
<qualified identifier> = <model>::<field> | (<Model>::<field>,*)
leovalais commented 11 months ago

Add a configuration option for both row and changeset to include additional diesel derive configuration.

leovalais commented 9 months ago

API evolutions

leovalais commented 9 months ago

Listing

API proposal to integrate ORDER BY queries into ModelV2.

#[derive(Debug, Default, Clone, ModelV2)]
#[model(table = crate::tables::document)]
pub struct Document {
    pub id: i64,
    pub content_type: String,
    pub data: Vec<u8>,
}

let docs/*: Vec<Document> */ = Document::list(
    &mut conn,
    ListSettings::new() // <query> WHERE content_type = 'plain/text' ORDER BY content_type DESC, id ASC LIMIT 25 OFFSET 50
        .filter(Fields::<Document>::ContentType.eq("plain/text")) // Box<dyn BoxableExpression<table, Pg, SqlType = Bool>>
        .order_by(Fields::<Document>::ContentType.desc()) // Box<dyn BoxableExpression<table, Pg, SqlType = NotSelectable>>
        .order_by(Fields::<Document>::Id.asc())
        .limit(25)
        .offset(50)
).await?;

let ops: HashMap<(i64, String), OperationalPointModel> = OPM::list_with_key(...).await?;

That would almost replace all the manual modelv1::List implementations. Breaking changes would include the name sorting criterion for projects, which translates to lower(p.name) in SQL; by using the new API above we would slightly change the ordering. I'd say it doesn't really matter, as we can always (as another issue, if deemed necessary):

Best to do that before splitting models into its own crate, so that we don't need to keep PaginatedResponse<> in it (it belongs in the views crate).

leovalais commented 9 months ago

Checking for missing defaults

Prevent applying changesets that lack values for non-default columns in Create.

#[derive(Debug, Default, Clone, ModelV2)]
#[model(table = crate::tables::document)]
pub struct Document {
    pub id: i64,
    #[model(required)]
    pub content_type: String,
    pub data: Vec<u8>,
}

Document::changeset().data(vec![]).create(...).await.is_err() // true
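A sketch of the guard the macro could generate from #[model(required)] (all names hypothetical):

#[derive(Debug)]
pub enum ModelError {
    MissingRequiredField(&'static str),
}

impl Changeset<Document> {
    // Generated: called at the beginning of create() to fail fast.
    fn ensure_required(&self) -> std::result::Result<(), ModelError> {
        if self.content_type.is_none() {
            return Err(ModelError::MissingRequiredField("content_type"));
        }
        Ok(())
    }
}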

Not necessarily useful, since diesel will complain anyway in create(). Up for discussion.

leovalais commented 8 months ago

We need to be able to forward derive annotations to the changeset (and maybe the row) at both struct- and field-level. We hit this issue while trying to implement Validator for Changeset<RollingStockModel> with a custom validator function.

[!NOTE] Whether we choose to keep Validator or not is another problem. We might also encounter this problem with Derivative for instance.

Implementation

One can take inspiration from how strum_macros forwards annotations to the enum generated by EnumDiscriminants (https://docs.rs/strum_macros/latest/strum_macros/derive.EnumDiscriminants.html).
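For illustration, the forwarding could look like this (hypothetical syntax mirroring strum's; the attr(...) wrapper would be pasted verbatim onto the generated type):

#[derive(Model)]
#[model(table = crate::tables::rolling_stock)]
// struct-level: forwarded to the generated Changeset<RollingStockModel>
#[model(changeset(derive(Validate)))]
pub struct RollingStockModel {
    pub id: i64,
    // field-level: forwarded to the corresponding changeset field
    #[model(changeset(attr(validate(length(min = 1)))))]
    pub name: String,
}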

leovalais commented 7 months ago

Model locking

let mut doc = Document::retrieve(conn, 42).await?.unwrap();

// Locking for update (other queries can only read the pk) FOR NO KEY UPDATE
doc.exclusive_lock(conn, |conn, doc: &mut Document| async { // lock_for_update: &mut Self, &mut Self -> Result<()>
    doc.data = vec![1, 2, 3];
    doc.save(conn).await?;
}).await?;

// ---

let infra = Infra::retrieve(conn, 42).await?.unwrap();

// Shared lock to prevent modification (other queries can read the whole row, but cannot update it) FOR SHARE
infra.lock(conn, |conn, infra: &Infra| async { // lock: &mut Self, &Self -> Result<()>
    infra.import_railjson(...).await?;
}).await?;

Each lock opens a new transaction (required for the locking system to work), locks the row and runs the closure. We don't need FOR UPDATE because model PKs are not supposed to change. I don't think we need to consider FOR KEY SHARE, but I may be wrong.

More info on locking here: https://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-ROWS
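For reference, a rough sketch of the flow exclusive_lock would wrap, specialized to Document for readability (assuming diesel_async and the generated document table; the closure plumbing of the proposed API is elided):

use diesel::prelude::*;
use diesel_async::{AsyncConnection, AsyncPgConnection, RunQueryDsl};
use scoped_futures::ScopedFutureExt;

async fn with_document_locked(
    conn: &mut AsyncPgConnection,
    id: i64,
) -> Result<(), diesel::result::Error> {
    conn.transaction(|conn| {
        async move {
            // FOR NO KEY UPDATE: concurrent writers block, plain readers don't
            let mut doc: Document = document::table
                .find(id)
                .for_no_key_update()
                .get_result(conn)
                .await?;
            // ... the user closure would run here with conn and &mut doc ...
            doc.data = vec![1, 2, 3];
            diesel::update(document::table.find(id))
                .set(document::data.eq(&doc.data))
                .execute(conn)
                .await?;
            Ok(())
        }
        .scope_boxed()
    })
    .await // the row lock is released on commit/rollback
}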

leovalais commented 7 months ago

Dependencies and order of implementation

No need to wait for locking to migrate the Infra model (there are currently locking queries for infras, but they're not run in a transaction, so I'm not sure they work at all).