FCO / Red

A WiP ORM for Raku
Artistic License 2.0

Make a better way to use multiple databases at once #400

Closed FCO closed 5 years ago

FCO commented 5 years ago

https://colabti.org/irclogger/irclogger_log/perl6?date=2019-09-17#l157

FCO commented 5 years ago

Make it possible to run different code in parallel on different (or the same) databases

FCO commented 5 years ago
% perl6 -I. -e '
use Red <red-do>;
model Bla { has UInt $.id is serial; has Str $.name is column }
red-defaults "SQLite", :database<test.db>;

say red-do { Bla.^load: 1 }, { Bla.^load: 2 }, { Bla.^load: 3 }                       

'
(Bla.new(id => 1, name => "bla") Bla.new(id => 2, name => "ble") Bla.new(id => 3, name => "bli"))
% perl6 -I. -e '
use Red <red-do>;
model Bla { has UInt $.id is serial; has Str $.name is column }
red-defaults "SQLite", :database<test.db>;

say await red-do { start Bla.^load: 1 }, { start Bla.^load: 2 }                        

'
(Bla.new(id => 1, name => "bla") Bla.new(id => 2, name => "ble"))
% perl6 -I. -e '
use Red <red-do>;
model Bla { has UInt $.id is serial; has Str $.name is column }
red-defaults "SQLite", :database<test.db>;

say await red-do { start Bla.^load: 1 }, { start Bla.^load: 2 }, { start Bla.^load: 3 }

'
An operation first awaited:
  in block <unit> at -e line 6

Died with the exception:
        Unknown Error!!!
        Please, copy this backtrace and open an issue on https://github.com/FCO/Red/issues/new
        Driver: Red::Driver::SQLite
        Original error: X::DBDish::DBError.new(driver-name => "DBDish::SQLite", native-message => "not an error", code => 1, why => "Error")

    Original error:
    DBDish::SQLite: Error: not an error (1)
      in method handle-error at /Users/fernando/.rakudobrew/versions/moar-2019.03.1/install/share/perl6/site/sources/9FB62DC76EFA166DFBA147ED75C743F9BE8BA042 (DBDish::SQLite::Connection) line 17
      in method prepare at /Users/fernando/.rakudobrew/versions/moar-2019.03.1/install/share/perl6/site/sources/9FB62DC76EFA166DFBA147ED75C743F9BE8BA042 (DBDish::SQLite::Connection) line 26
      in method prepare at /Users/fernando/Red/lib/Red/Driver/SQLite.pm6 (Red::Driver::SQLite) line 47
      in code  at /Users/fernando/Red/lib/Red/Driver.pm6 (Red::Driver) line 22
      in code  at /Users/fernando/Red/lib/Red/Driver.pm6 (Red::Driver) line 21
      in method prepare at /Users/fernando/Red/lib/Red/Driver.pm6 (Red::Driver) line 18
      in submethod TWEAK at /Users/fernando/Red/lib/Red/ResultSeq/Iterator.pm6 (Red::ResultSeq::Iterator) line 14
      in method iterator at /Users/fernando/Red/lib/Red/ResultSeq.pm6 (Red::ResultSeq) line 72
      in method Seq at /Users/fernando/Red/lib/Red/ResultSeq.pm6 (Red::ResultSeq) line 78
      in method do-it at /Users/fernando/Red/lib/Red/ResultSeq.pm6 (Red::ResultSeq) line 83
      in method head at /Users/fernando/Red/lib/Red/ResultSeq.pm6 (Red::ResultSeq) line 316
      in method load at /Users/fernando/Red/lib/MetamodelX/Red/Model.pm6 (MetamodelX::Red::Model) line 381
      in code  at -e line 6

Actually thrown at:
  in block  at /Users/fernando/Red/lib/Red/Driver/SQLite.pm6 (Red::Driver::SQLite) line 43
  in any  at /Users/fernando/Red/lib/Red/Driver/SQLite.pm6 (Red::Driver::SQLite) line 41
  in method prepare at /Users/fernando/.rakudobrew/versions/moar-2019.03.1/install/share/perl6/site/sources/9FB62DC76EFA166DFBA147ED75C743F9BE8BA042 (DBDish::SQLite::Connection) line 39
  in method prepare at /Users/fernando/Red/lib/Red/Driver/SQLite.pm6 (Red::Driver::SQLite) line 47
  in code  at /Users/fernando/Red/lib/Red/Driver.pm6 (Red::Driver) line 22
  in code  at /Users/fernando/Red/lib/Red/Driver.pm6 (Red::Driver) line 21
  in method prepare at /Users/fernando/Red/lib/Red/Driver.pm6 (Red::Driver) line 18
  in submethod TWEAK at /Users/fernando/Red/lib/Red/ResultSeq/Iterator.pm6 (Red::ResultSeq::Iterator) line 14
  in method iterator at /Users/fernando/Red/lib/Red/ResultSeq.pm6 (Red::ResultSeq) line 72
  in method Seq at /Users/fernando/Red/lib/Red/ResultSeq.pm6 (Red::ResultSeq) line 78
  in method do-it at /Users/fernando/Red/lib/Red/ResultSeq.pm6 (Red::ResultSeq) line 83
  in method head at /Users/fernando/Red/lib/Red/ResultSeq.pm6 (Red::ResultSeq) line 316
  in method load at /Users/fernando/Red/lib/MetamodelX/Red/Model.pm6 (MetamodelX::Red::Model) line 381
  in code  at -e line 6
FCO commented 5 years ago

But it works with Pg:

% perl6 -I. -e '
use Red <red-do>;
model Bla is table<not_bla> { has UInt $.id is serial; has Str $.name is column }
red-defaults "Pg";
Bla.^create-table: :if-not-exists;
say await red-do { start Bla.^load: 1 }, { start Bla.^load: 2 }, { start Bla.^load: 3 }, { start Bla.^load: 4 }, { start Bla.^load: 5 }

'
(Bla.new(id => 1, name => "bla") Bla.new(id => 2, name => "ble") Bla.new(id => 3, name => "bli") Bla.new(id => 4, name => "blo") Bla.new(id => 5, name => "blu"))
FCO commented 5 years ago

I'm thinking of accepting both Positional and non-Positional arguments: when given a Positional, it will respect the order; otherwise, it will try to run everything in parallel...

FCO commented 5 years ago
% perl6 -I. -e '
use Red <red-do>;
model Bla is table<not_bla> { has UInt $.id is serial; has Str $.name is column }
red-defaults pg => \("Pg", :default), sqlite => \("SQLite");
red-do <pg sqlite> => { Bla.^create-table: :if-not-exists };
say red-do (<pg sqlite> => { Bla.^load: 6 }), "sqlite" => { Bla.^load: 7 }, (pg => { Bla.^load: 8 })
'
(Bla.new(id => 6, name => "b") Nil Nil Bla.new(id => 8, name => "c"))
vrurg commented 5 years ago

This question would also involve red-do syntax. This is, perhaps, a good time to specify what is expected from it. In my view:

  1. The good old default semantics: red-do { ... }
  2. Multi-database sequential: red-do 'db1' => { ... }, 'db2' => { ... };
  3. Multi-database parallel: similar to the previous.

For readability purposes red-do might accept a :with named parameter:

red-do :with<db1>, { ... }, { ... };

:async or :parallel named parameters would result in code blocks being executed asynchronously:

red-do :async, :with<db1>, { ... }, { ... }, ...;
red-do :parallel, 'db1' => { ... }, 'db2' => { ... }, ...;

A convenience alias for the above, red-do-async, might be considered too.

The internal implementation of the asynchronicity is not relevant. Perhaps race would work well enough, or each block could simply be run within a dedicated start – any option would be good. But what is important is that I expect parallelizing to be heavily used in cases when data coming from one (or a few) databases should be processed and re-dispatched into other database(s). A user can always create their own means of inter-block communication, but Red could provide the most basic approach out of the box for maximum simplicity. I haven't thought out the actual syntax and semantics well enough, but here is what I may propose.

Routines red-emit and red-tap are to be provided with the following signature:

sub red-emit(Str:D $name, *%named --> Supplier) { ... }
sub red-tap(Str:D $name, &block, *%named --> Supply) { ... }

Calling any of them would result in creating a new Supplier under the specified $name unless one already exists. Then this supplier could be used to transfer data between asynchronous blocks. All created suppliers are valid within the execution time of their enclosing red-do and will cease to exist when the routine finishes.

For example:

red-do-async
    'db1' => { red-emit "mytest", fetch-data },
    'db2' => { red-tap "mytest", { process-data $_ } },
    'db3' => { red-tap "mytest", { store-duplicate $_ } },
;

For convenience and syntax sugar red-emit "mytest" => { ... } could also be supported.

Similarly, routines red-send, red-receive could be provided to support queuing via Channel.
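
The named-supplier idea behind red-emit and red-tap can be sketched in plain Raku, without Red. This is only an illustration under assumptions: my-emit, my-tap, supplier-for and the %suppliers registry are made-up names, and the final cleanup line stands in for whatever an enclosing red-do would do when it finishes.

```raku
# Hypothetical stand-ins for the proposed red-emit / red-tap (not Red's API).
my %suppliers;   # stream name => Supplier, created on first use

sub supplier-for(Str:D $name) {
    %suppliers{$name} //= Supplier.new
}

sub my-emit(Str:D $name, $value) {
    supplier-for($name).emit: $value
}

sub my-tap(Str:D $name, &block) {
    supplier-for($name).Supply.tap: &block
}

my @seen;
my-tap  "mytest", -> $v { @seen.push: "process $v" };
my-tap  "mytest", -> $v { @seen.push: "store $v" };
my-emit "mytest", 42;   # both taps run before emit returns

%suppliers = ();        # what an enclosing red-do would do when it finishes
```

Because the registry is a plain Hash, this sketch is only safe when the taps are set up from a single thread; a real implementation would need to protect it.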

A side note: red-do can also support more advanced syntax for implicit asynchronous operation via the following signature:

sub red-do (*%named) { ... }

Then if used like:

red-do db1 => { ... }, db2 => { ... };

It would result in running the blocks in parallel. The point is that leaving the quotes off the database names turns them into named parameters instead of a list of Pairs. In this case manual handling of the named parameters would be required, but generally this could be implemented as simply as:

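sub red-do (*@pos, *%named) {
    my @blocks;   # named arguments carrying code become db => block Pairs
    my %opts;     # everything else stays a real named option
    for %named.pairs {
        if .value ~~ Code {
            @blocks.push: $_
        }
        else {
            %opts.append: $_
        }
    }
    # re-dispatch with the Pairs passed positionally and :async switched on
    samewith(|@pos, |@blocks, |%opts, :async)
}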
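
The quoting rule this relies on is easy to check in isolation. A minimal sketch, where count-args is a made-up helper and not part of Red:

```raku
# Illustrative helper (not part of Red): report how the arguments were bound.
sub count-args(*@pos, *%named) {
    "positional={@pos.elems} named={%named.elems}"
}

say count-args('db1' => { 1 }, 'db2' => { 2 });   # quoted keys: positional Pairs
say count-args(db1 => { 1 }, db2 => { 2 });       # bare keys: named arguments
say count-args((db1 => { 1 }), 'db2' => { 2 });   # extra parens keep a bare-key Pair positional
```

The third call is the same trick used earlier in the thread with (pg => { Bla.^load: 8 }).
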
vrurg commented 5 years ago

I seem to be out of sync here, but that's because I started writing the comment about 7 hours ago and finished just now. :)

FCO commented 5 years ago

Should these supplies/suppliers be shared between red-dos, or should they exist only in the red-do where they were created? Should red-do-async block, or should it return a promise? Why not use a react block? Thank you


vrurg commented 5 years ago

Suppliers and channels are only for the red-do which creates them. Otherwise it'd be prone to memory leaks. One would have to create their own, controllable, means of communication for cross-red-do things.

I think red-do must block by default on both sequential and async operations. This is the part of behavior which must not change. But as to red-do-async – it's a good question! Perhaps it's a good idea to have it return a Promise and send its blocks into threads.
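
The difference between the two behaviours can be sketched without Red. The helpers below are hypothetical (run-async and run-blocking are made-up names): one returns a Promise right away, the other awaits it so the caller blocks.

```raku
# Hypothetical helpers illustrating the two behaviours (names are made up).
sub run-async(*@blocks) {
    my @promises = @blocks.map: -> &block { start block() };
    # Kept once every block has finished; its result is the list of block results.
    Promise.allof(@promises).then: { @promises.map(*.result).List }
}

sub run-blocking(*@blocks) {
    await run-async(|@blocks)
}

say run-blocking({ 1 + 1 }, { "two" }, { 3 });   # blocks until all three are done
my $p = run-async({ 42 });                       # returns a Promise immediately
say await $p;
```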

FCO commented 5 years ago

@vrurg would you mind writing a usage example for that supply/channel? I couldn't "see" that yet...

FCO commented 5 years ago

I mean, I keep coming back to something like this in my head:

supply {
   red-do <db1 db2> => { whenever start { Model.^all.grep: {...} } -> $result { .emit for $result } }
}
vrurg commented 5 years ago

No, that's too much! Say, we need to duplicate records from one db into many:

red-do :parallel,
db-source => {
    red-emit "dup", $_ for Model.^all
},
db-dest1 => {
    red-tap "dup", -> $record {
        Model.^create: $record
    }
},
db-dest2 => {
    red-tap "dup", -> $record {
        Model.^create: $record
    }
}

And that's all. For this to work you would need to create a Supplier for dup and then re-use it.

BTW, note that tap is an async co-routine. So, basically, the above example doesn't even need :parallel unless receivers plan to do some additional work outside of the tap.

FCO commented 5 years ago
red-do :parallel,
db-source => {
    red-emit "dup", $_ for Model.^all
},
<db-dest1 db-dest2> => {
    red-tap "dup", -> $record {
        Model.^create: $record
    }
}

Maybe this way we avoid duplicating code...

vrurg commented 5 years ago

I don't think this would be often needed. I've done it this way just to make the example clean. In real life it would be something like:

sub store-record ($record) { ... }
red-do :parallel,
...
db-dest1 => &store-record,
db-dest2 => &store-record,

or, if there happen to be many destinations:

my %p = @all-dests.map: -> $dest { $dest => &store-record };
red-do :parallel, |%p
...

I see no need to overcomplicate in this case.

FCO commented 5 years ago

@vrurg now this is possible (the pg database was pre-populated):

% perl6 -I. -e '                                          
use Red <red-do>;
model Bla is table<not_bla> { has UInt $.id is serial; has Str $.name is column }
red-defaults pg => \("Pg", :default), sqlite => (my $sqlite = database("SQLite"));
red-do <pg sqlite> => { Bla.^create-table: :if-not-exists };

red-do :async,
    {
         red-emit "dup", $_ for Bla.^all
    },
    :sqlite{
        red-tap "dup", -> $record {
            $record.^save: :insert
        }
    },
;
red-do
   $sqlite => { say "sqlite => ", $_ for Bla.^all.batch(3).head },
   "pg"    => { say "pg     => ", $_ for Bla.^all.batch(3).head },
'
sqlite => Bla.new(id => 1, name => "bla")
sqlite => Bla.new(id => 2, name => "ble")
sqlite => Bla.new(id => 3, name => "bli")
pg     => Bla.new(id => 1, name => "bla")
pg     => Bla.new(id => 2, name => "ble")
pg     => Bla.new(id => 3, name => "bli")
FCO commented 5 years ago

Maybe we should have a supply on the Driver that would emit everything that happens on that driver... and the user could grep what it wants.

red-do {
    .events.grep({ .event-type eq "create" and .model === MyModel }).tap: { $logger.debug: "Created: { .obj }" }
}

and it would create a log entry ($logger is just an example) for every newly created MyModel object.

Maybe the event class could be something like:

class Event {
    has enum <create delete update> $.event-type;
    has Red::Model:U $.model;
    has Red::Model $.obj;
    has %.change;
}

or maybe it should be a model, to make it possible for the user to store it in a database...

vrurg commented 5 years ago

It's something I would expect to be much appreciated in big production setups. I would add that perhaps it also makes sense to have a supply of events for all active drivers, allowing for unified processing of everything. Whoever needs just one stream subscribes to an individual driver. Whoever needs everything could use $*RED-EVENTS or the like.

With regard to the Event class – I don't like the idea of limiting the event types in any way. Perhaps it'd make more sense to pass the AST node bound to the event? Or, let's try to generalize it this way: unless I'm mistaken, any event has an object associated with it. It doesn't matter what kind of object it is: the system could emit an AST for a pre-execute event, and a model object for post-execute, or a Failure if the execute fails. In this case it'd make more sense to replace the first three attributes with just $.object, and by introspecting the object we could determine what kind of event was received.

A downside of this approach is a possible performance cost, though it shouldn't be very significant in a parallel model, as event emission and processing could take place in their own, likely shared, thread. But a way to set what kind of events a consumer is interested in would undoubtedly be reasonable to have.

No idea what %.change is responsible for.

PS. It feels like loading you with extra work. ;)

FCO commented 5 years ago

I really like the idea of passing the AST. I’m thinking of calling the unified supply Red.events. Maybe red-emit should emit on the driver’s supply, and red-tap $tap-name, ... should tap Red.events.grep({ .db === $driver && .name eq $tap-name }). But I think we need a standard event class on it to make it easier to .grep for what you want, and red-emit would be the way to create objects of that class automatically... so now I’m thinking of:

class Event {
    has Red::Driver $.db;
    has Str         $.db-name;
    has Str         $.driver-name;
    has Str         $.name;
    has             $.data;
}

And on automatic events .data would contain the AST, and on red-emit ones it would contain anything passed.

About %.change: it would contain the previous and the new values when the event was an update... and that’s something I couldn’t find a way to add in this new approach (the update AST has only the new values)

FCO commented 5 years ago

Or maybe:

class Event {
    has Red::Driver $.db;
    has Str         $.db-name;
    has Str         $.driver-name;
    has Str         $.name;
    has             $.data;
    has Red::AST    $.ast;
    has Red::Model  $.object;
    has X::Red      $.error
}
FCO commented 5 years ago

And red-tap "bla", {...} should tap:

Red.events.grep(*.name eq “bla”).map(*.data)

And that would not need to be restricted to a single red-do

vrurg commented 5 years ago

I may not have been clear enough, but I really didn't mean to get rid of the Event (eh, Red::Event, I guess?) class. I was only talking about removing redundant attrs. From this point of view the last class variant is overloaded. In my view $.data should be sufficient. Upon receiving an event one would just:

given $event.data {
    when Red::AST { ... }
    when Failure { ... } # or X::Red
    ...
}

The original model object which was the source for an UPDATE could be passed as an additional attribute. Something like $.origin. The final event could then be:

class Event {
    has Red::Driver $.db;
    has Str         $.db-name;
    has Str         $.driver-name;
    has Str         $.name;
    has             $.data;
    has Model       $.origin;
}

Using the system-wide event supplier for red-emit and red-tap makes it actually unnecessary to enclose named data streams within one red-do. And perhaps it's even better this way because it would then be possible to start a separate thread with its own red-do and attach to a data stream fulfilled by the mainline, for example.

The original point of limiting data streams by enclosing red-do was only to prevent memory hog by multiplying Supply/Supplier objects. In your approach it's not needed because a single Supplier provides service for all subscribers.
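
A single shared Supplier with name-based filtering can be sketched in plain Raku. Everything here is illustrative: BusEvent, bus-emit and bus-tap are made-up names, and the real Red::Event carries more attributes than this.

```raku
# Illustrative only: BusEvent, bus-emit and bus-tap are made-up names, not Red's API.
class BusEvent {
    has Str $.name;
    has     $.data;
}

my $bus = Supplier.new;   # the single, shared Supplier

sub bus-emit(Str:D $name, $data) {
    $bus.emit: BusEvent.new(:$name, :$data)
}

sub bus-tap(Str:D $name, &block) {
    # Each subscriber filters the shared stream by name and unwraps the payload.
    $bus.Supply.grep(*.name eq $name).map(*.data).tap: &block
}

my @got;
bus-tap  "dup", -> $record { @got.push: $record };
bus-emit "dup",   "first";
bus-emit "other", "ignored";
bus-emit "dup",   "second";
say @got;
```

Each extra subscriber only adds a tap on the same stream, so no per-name Supplier has to be created or cleaned up.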

One thing which worries me a bit is the use of a shared $.name. I.e. if Red uses events for something and binds to a name, then that name must be reserved and not available to a user. To minimize possible collisions I'd propose reserving the whole Red:: namespace. I.e.:

red-emit "Red::MyStream", $data;
FCO commented 5 years ago

Started:

fernando@MBP-de-Fernando Red % perl6 -Ilib -MRed -e '
red.events.tap: -> $ev { say "red => ", $ev }
my $*RED-DB = database "SQLite";

model Bla { has UInt $.id is id }
Bla.^create-table: :if-not-exists

'
red => Red::Event.new(db => Red::Driver::SQLite.new(database => ":memory:", events => Supply.new), db-name => "Red::Driver::SQLite", driver-name => Str, name => Str, data => Red::AST::CreateTable.new(name => "bla", temp => Bool, columns => Array[Red::Column].new(Red::Column.new(attr => bla.id, attr-name => "id", id => Bool::True, auto-increment => Bool::False, references => Callable, nullable => Bool::False, name => "id", name-alias => "id", type => Str, inflate => { ... }, deflate => { ... }, computation => Any, model-name => Str, column-name => Str, require => Str)), constraints => Array[Red::AST::Constraint].new(), comment => Red::AST::TableComment), model => Bla, origin => Red::Model, error => Exception)
fernando@MBP-de-Fernando Red % perl6 -Ilib -MRed -e '
red.events.tap: -> $ev { say "red => ", $ev }
my $*RED-DB = database "SQLite";

model Bla is table<---invalid---> { has UInt $.id is id }
Bla.^create-table: :if-not-exists

'
red => Red::Event.new(db => Red::Driver::SQLite.new(database => ":memory:", events => Supply.new), db-name => "Red::Driver::SQLite", driver-name => Str, name => Str, data => Any, model => Bla, origin => Red::Model, error => X::Red::InvalidTableName.new(table => "---invalid---", driver => "Red::Driver::SQLite"))
'---invalid---' is an invalid table name for driver Red::Driver::SQLite
  in method create-table at /Users/fernando/Red/lib/MetamodelX/Red/Model.pm6 (MetamodelX::Red::Model) line 258
  in method create-table at /Users/fernando/Red/lib/MetamodelX/Red/Model.pm6 (MetamodelX::Red::Model) line 253
  in block <unit> at -e line 6

I've added the error field because it would be good to have the AST and the error in case of error.

FCO commented 5 years ago

I've removed red and now Red has the events method

FCO commented 5 years ago

We'll need a place to add the bind information of the AST...

vrurg commented 5 years ago

As far as I've thought about it, Event requires another attribute for this.

FCO commented 5 years ago

Maybe a Red::Event::AST and a Red::Event::Generic? That way we could use the type to filter...

class Red::Event {
    has Red::Driver $.db;
    has Str         $.db-name;
    has Str         $.driver-name;
}
class Red::Event::AST is Red::Event {
    has Red::AST   $.ast;
    has            @.bind;
    has Red::Model $.orig;
}

class Red::Event::Generic is Red::Event {
    has $.data;
}


vrurg commented 5 years ago

Does it really make sense to multiply entities here? @.bind would just remain uninitialized for non-AST events. My point is that with your proposal it would be necessary to do a two-level check: first – for AST/Generic type; then, for Generic, test for data type.

Or, if you don't want to pollute pure Event with extra attributes, perhaps it would be more elegant to have a Red::Event::Bindings role and mix it into a new Event object for AST. I.e. like:

my $event = Red::Event.new(data => $ast, ...) but Red::Event::Bindings[@binds];

Though it's a bit of a complication on the inside, for a user it'd be clear that if $ev.data ~~ Red::AST then $ev.bind is there for inspection because $ev ~~ Red::Event::Bindings too.

Besides, this approach is easy to extend for other data types if necessary.
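
The mix-in idea can be tried out without Red. A minimal sketch, assuming a plain (non-parameterized) role that is filled after mixing in, which is simpler than the Bindings[@binds] form suggested above; Ev and Bindings are stand-in names:

```raku
# Illustrative only: Ev and Bindings stand in for Red::Event and Red::Event::Bindings.
class Ev       { has $.data }
role  Bindings { has @.bind }

my $plain = Ev.new(data => "generic payload");
my $bound = Ev.new(data => "ast placeholder") but Bindings;
$bound.bind.append: "test", 1;   # fill the mixed-in attribute

say $plain ~~ Bindings;   # the plain event does not do the role
say $bound ~~ Bindings;   # the mixed-in one does, so .bind is available
say $bound.bind;
```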

FCO commented 5 years ago
use Red;
Red.events.tap: -> $ev { say $ev.data }
my $*RED-DB = database "SQLite";

model Bla {
    has UInt $.id is id;
    has Str $.name is column is rw
}

Bla.^create-table: :if-not-exists;
my $bla = Bla.^create: :name<test>;
$bla.name = "bla";
$bla.^save;
Bla.^all.grep({ .id < 10 and .name.starts-with: "bla" }).Seq;
sleep 5
Red::AST::CreateTable:
    bla.id
    bla.name
Red::AST::Insert:
    name
    id
Red::AST::Select:
    (Bla)
    Red::AST::Eq:
        bla.id
        1
Red::AST::Update:
    name
Red::AST::Select:
    (Bla)
    Red::AST::AND:
        Red::AST::Lt:
            (bla.id)::num
            10
        Red::AST::Like:
            bla.name
            bla%
vrurg commented 5 years ago

BTW, how hard is it to get actual SQL from AST for an end-user?

FCO commented 5 years ago

Just a question of calling the driver's translate method, passing the AST.

And I was wrong... We do not store the bindings... it's just a thing after the AST translation.

FCO commented 5 years ago
% perl6 -Ilib -e 'use Red <red-do>;
Red.events.tap: { dd .db.translate: .data }
red-defaults :default(database "SQLite");
red-do {
model Bla { has UInt $.id is id; has Str $.name is column is rw }
 Bla.^create-table: :if-not-exists;
 my $bla = Bla.^create: :name<test>;
 $bla.name = "bla";
 $bla.^save;
}
sleep 5'
("CREATE TABLE bla(\n   id integer NOT NULL primary key ,\n   name varchar(255) NOT NULL \n)" => [],)
"INSERT INTO bla(\n   name\n)\nVALUES(\n   ?\n)" => ["test"]
"SELECT\n   bla.id , bla.name \nFROM\n   bla\nWHERE\n   bla.id = 1\nLIMIT 1" => []
"UPDATE bla SET\n   name = 'bla'\nWHERE bla.id = 1\n" => []
vrurg commented 5 years ago

Just perfect for human-readable logging and possibly additional debugging!

FCO commented 5 years ago

Maybe I should change the $*RED-DEBUG to use that...


vrurg commented 5 years ago

Perhaps. But debug could require more than just dumping database events. Debug stream might include intermediate messages of state changes, dumps of local variables or whatever else could be considered useful. Also, file names and line numbers for the purpose of distinguishing identical messages.

I can't say much in this area because I don't really know what exactly debugging does in Red. But my guess would be that an additional debug event would be useful. The event could be emitted by a debug method. Something like:

role Red::Event::Debug {
    has Str $.file;
    has Int $.line;
    has $level; # debug level, could be of an enum type
}
method debug($level, Str:D $msg) {
    Red.emit: Red::Event.new(:data($msg)) but Red::Event::Debug[$caller-file, $caller-line, $level]
}

The good thing about incorporating debugging into event subsystem is that it allows a user to plug in any reporting mechanism he/she likes. For example, events could be redirected to a WebSocket as JSON and be used by a front-end developer to observe what's going on in the backend.

PS. That's funny that I also thought about events partially duplicating what debug does. But didn't want to overload you with something beyond the current task. But it was too much on the surface to get around... ;)

FCO commented 5 years ago

Now every db meta-method can receive :with<db>, passing the database to be used, and ResultSeq has a .with method to set the database that should be used.