Bring the service duplication operation server side

tahini commented 1 month ago

First and easiest step towards #297, but it prepares the approach:

Add an argument to the database collection function to fetch a partial collection of services
Add database queries to fetch all service names starting with a certain prefix, to generate unique names
Add operations to duplicate single or multiple services, with a separate function for saving. This allows longer-running duplicate operations (like agency or line duplication) to postpone saving until all objects are ready, which reduces the transaction time.
Add socket route to duplicate services
Let the service duplication button use this new socket route to duplicate the service. Other callers, like the line duplicator, are not yet ready to call the new socket route. They will themselves be moved to the backend later.
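
For illustration, here is a minimal sketch of how the names returned by the prefix query could be turned into unique names. The function names and the "(copy N)" naming scheme are assumptions made for this example, not the code in this PR:

```typescript
// Hypothetical helper: pick a name that is not already taken.
// `existingNames` stands in for the result of the "names starting with a prefix" query.
function generateUniqueName(baseName: string, existingNames: Iterable<string>, suffix = 'copy'): string {
    const taken = new Set(existingNames);
    if (!taken.has(baseName)) {
        return baseName;
    }
    // Try "name (copy)", then "name (copy 2)", "name (copy 3)", ... (assumed scheme)
    let candidate = `${baseName} (${suffix})`;
    for (let i = 2; taken.has(candidate); i++) {
        candidate = `${baseName} (${suffix} ${i})`;
    }
    return candidate;
}

// When duplicating several services at once, each generated name must also be
// reserved so that two copies in the same batch do not collide.
function generateUniqueNames(baseNames: string[], existingNames: Iterable<string>): string[] {
    const taken = new Set(existingNames);
    return baseNames.map((baseName) => {
        const unique = generateUniqueName(baseName, taken);
        taken.add(unique);
        return unique;
    });
}
```

The batch variant keeps a single set of taken names, so one prefix query would be enough for a multi-service duplication.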

greenscientist commented 1 month ago

Add operations to duplicate single or multiple services, with a separate function for saving. This allows longer-running duplicate operations (like agency or line duplication) to postpone saving until all objects are ready, which reduces the transaction time.

I haven't read the code yet, but with a duplication operation, the reading and the writing should probably be part of the same transaction to avoid inconsistency.

tahini commented 1 month ago

What bugs me is that duplication operations can be quite long (like duplicating a complete agency such as the STM), so if reading is part of the transaction, the transaction will be long and may block other operations. I was thinking of reading the current data during duplication and only using a transaction at the end, once all the data is ready. Though maybe once it is in the backend, it will turn out to be not so bad after all... we can benchmark then.

(Note that as other operations are migrated to the backend, the ServiceUtils functions will receive an additional transaction argument.)

greenscientist commented 1 month ago

So technically, the operation could be quite fast and done in a single SQL command. I don't think that having a "longer" running transaction should be that bad. Our application is not write heavy, so it should not block that much. (I would need to review the transaction documentation to be sure.)

tahini commented 1 month ago

I'll read more about locking.

My first idea was to read all the data, duplicate it (hence the duplicate-without-save functions), then open a transaction that (I expected) would lock the tables while the records are being written, as fast as possible.

But maybe it's possible to fine-tune locking so that it can all be done in a transaction. Duplication does not update any records, it just inserts new ones, so technically it should not block any operation by other users. If we can ensure that we don't block other operations, then we could put the whole duplication into a single transaction and do the duplicate and save operations at once.

tahini commented 1 month ago

Everything is in a transaction now. Only the inserted rows will be locked, so even if the duplication operation takes time, it won't block any other operation.
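
Roughly, the shape is the following. This is only a sketch: the `Transaction` and `RunInTransaction` types, the function name, and the table and column names are hypothetical stand-ins, not the actual code or the database library's API:

```typescript
// Hypothetical minimal transaction handle; a real project would use its database library's own type.
interface Transaction {
    query(sql: string, params: unknown[]): Promise<Record<string, unknown>[]>;
}

// Hypothetical runner: opens a transaction, commits if `work` resolves, rolls back if it rejects.
type RunInTransaction = <T>(work: (trx: Transaction) => Promise<T>) => Promise<T>;

// Reads the original and inserts the copy through the same transaction handle,
// so the read and the write are part of one transaction.
async function duplicateServiceInTransaction(
    runInTransaction: RunInTransaction,
    serviceId: string,
    newId: string,
    newName: string
): Promise<string> {
    return runInTransaction(async (trx) => {
        const rows = await trx.query('SELECT * FROM services WHERE id = $1', [serviceId]);
        const original = rows[0];
        if (original === undefined) {
            // Rejecting here is what would trigger the rollback in the runner.
            throw new Error(`Service ${serviceId} not found`);
        }
        // Table and column names are illustrative only.
        await trx.query('INSERT INTO services (id, name, data) VALUES ($1, $2, $3)', [
            newId,
            newName,
            original.data
        ]);
        return newId;
    });
}
```

Since the function only selects and inserts, it never updates or deletes existing rows.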

tahini commented 1 month ago

The one-query SQL copy will not give you a unique name for the service; you'll still have to do selects to compute that name. That done, we'll have to do SQL insert/select queries one record at a time to set the unique name. Unless Postgres has a feature for that which I do not know about.

greenscientist commented 1 month ago

Yes, it's possible to do a query that will copy the data while updating only a few select fields.

Also, going back and updating the name transfers much less data than sending all the data over the pipe.
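
As a sketch of what I mean, a PostgreSQL `INSERT ... SELECT` can copy a row server side while overriding only some columns. The table and column names below are assumptions for the example, and depending on the column types explicit casts (such as `$1::uuid`) may be needed:

```typescript
// Builds a parameterized PostgreSQL statement that copies one row server side,
// overriding only `id` and `name`. Table and column names are hypothetical.
function buildCopyServiceQuery(
    originalId: string,
    newId: string,
    newName: string
): { text: string; values: string[] } {
    const text = [
        'INSERT INTO services (id, name, color, description)',
        // The overridden columns come from parameters, the rest from the original row.
        'SELECT $1, $2, color, description',
        'FROM services',
        'WHERE id = $3'
    ].join('\n');
    return { text, values: [newId, newName, originalId] };
}
```

Only the two IDs and the new name cross the wire; the other columns never leave the database.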

tahini commented 1 month ago

I know we can insert/select while updating only a few fields, but we still have to know (locally in TypeScript) the values of those fields, so we still have to fetch the original names/ids, find their unique equivalent, then insert/select each service individually. There will be no fewer queries this way, though a bit less data transferred between DB and client. But services aren't that big anyway.

But for paths, this might be non-negligible. I'll try to keep the data exchanged minimal for the other objects, but we still have to keep the mapping between original IDs and new ones... anyway, just to say that it's not just a dump/simple copy of the original object.

greenscientist commented 1 month ago

That was one of my questions: why do you need the mapping anyway? That does not sound really useful.

tahini commented 1 month ago

Foreign key references. You need the new service ID to copy the schedules, you need the new path IDs to copy the trips, etc.
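
A minimal sketch of the idea, with simplified hypothetical types (the real objects have more fields, and `nextId` stands in for whatever ID generator is used):

```typescript
// Simplified, hypothetical shapes for the example.
interface Service { id: string; name: string }
interface Schedule { id: string; serviceId: string }

interface DuplicationResult {
    services: Service[];
    schedules: Schedule[];
    serviceIdMap: Map<string, string>; // original service ID -> new service ID
}

function duplicateServicesWithSchedules(
    services: Service[],
    schedules: Schedule[],
    nextId: () => string
): DuplicationResult {
    const serviceIdMap = new Map<string, string>();
    const newServices = services.map((service) => {
        const newId = nextId();
        serviceIdMap.set(service.id, newId);
        return { ...service, id: newId };
    });
    const newSchedules: Schedule[] = [];
    for (const schedule of schedules) {
        // The copied schedule must reference the copied service, not the original one.
        const newServiceId = serviceIdMap.get(schedule.serviceId);
        if (newServiceId === undefined) {
            continue; // schedule belongs to a service that was not duplicated
        }
        newSchedules.push({ ...schedule, id: nextId(), serviceId: newServiceId });
    }
    return { services: newServices, schedules: newSchedules, serviceIdMap };
}
```

The same pattern would repeat one level down: a path ID map is needed before the trips can be copied.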

chairemobilite / transition

Bring the service duplication operation server side #946