RobThree / MongoRepository

Repository abstraction layer on top of Official MongoDB C# driver
https://www.nuget.org/packages/MongoRepository/
MIT License
307 stars 141 forks source link

Inner Join or EF Include equivalente #15

Closed EnderXXX closed 9 years ago

EnderXXX commented 9 years ago

Hi, I'm wondering how to be able to join two or more Repositories result by common properties (normally foreign pks) using this wonderful library. As fas as I know it could be done with EF and SQL but I am very happy with NoSQL.

As MongoRepository seems to implement IQueryable, do you think that something like the next code could work? (if in a fluent way were possible even better^^) thanks!

"from w in repo.Query() join e in repo.Query() on.."

RobThree commented 9 years ago

Just as with other (you could say, "more classical") databases you need to optimize the way data is stored to your needs but also in such a way that it is not an unmaintainable mess of (redundant) data everywhere. So it really depends on your situation on what the best (or better: best balanced) solution would be. Sometimes a bit of redundancy is good either for performance, flexibility or other reasons and sometimes reduncancy is actually required (when you change a customer's address you don't want (histrical) invoices to change address too).

You can imagine a collection of customers and a collection of products and a collection of shopping carts that contains a link between a customer and one or more products. This is fairly close to what you'd do in an RDBMS and, indeed, use "joins" for to gather the information.

In Mongo's world, however, you would more likely have a collection of products and customers and as part of the customer you'd store a shoppingcart (or: N-products in a cart group) as part of the customer object itself. This would only have 'references' to the products in the respective collection. That would mean "one less join".

Mongo does have concepts similar to joins (and the, to be released, 3.2 version promises even more "alike" concepts) but the general idea is that an "extra roundtrip to the server" isn't that bad in document oriented systems like Mongo as it is for "classic RDBMS'es" (and please don't quote me on this, I'm trying very hard not to over/understate things or make it too black-and-white; it's actually in all database models very much more of a grey area than it may sound in what I'm saying).

So you can, already, do joins as Linq provides on IQueryables, however, they may not do exactly what you think they do (but that al very much depends on the implementation in, amongst others, the underlying driver).

I have tried to demonstrate below:

using MongoRepository;
using System.Collections.Generic;
using System.Linq;

class Customer : Entity
{
    public string Name { get; set; }
    public List<string> Cart { get; set; }
}

class Product : Entity
{
    public string Description { get; set; }
    public decimal Price { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        var custrepo = new MongoRepository<Customer>();
        var prodrepo = new MongoRepository<Product>();

        // Set up some entities:
        var soap = new Product { Description = "Hand soap", Price = 1.23m };
        var milk = new Product { Description = "Milk", Price = 1.99m };
        var chips = new Product { Description = "Chips", Price = .99m };
        var bread = new Product { Description = "Bread", Price = 2.10m };
        prodrepo.Add(new[] { soap, milk, chips, bread });

        // Now a customer comes along...
        var rob = new Customer { Name = "Rob Janssen" };

        // ...and adds some products into his cart
        rob.Cart = new List<string>() { soap.Id, bread.Id };
        custrepo.Add(rob);

        // Essentially, now, when we retrieve a customer we also get the contents of the cart; 
        // no join required... but the products are stored as "references"
        var cust = custrepo.Where(c => c.Name == "Rob Janssen").First();

        // So let's retrieve all products in the customer's cart using the references:
        var products = from cartitem in cust.Cart
                       join product in prodrepo on cartitem equals product.Id
                       select product;
    }
}

This demonstrates a 1:N stored as part of the customer (namely the cart's contents) but also a 1:N stored separately (requiring "a join", namely the products).

You could go as far as storing the products (or a copy of a product) in the cart:

// Assume same customer/product classes as above...
class Program
{
    static void Main(string[] args)
    {
        // Assume same setup code as above...

        // Now a customer comes along...
        var rob = new Customer { Name = "Rob Janssen" };

        // ...and adds some products into two carts. 
        // Notice we store actual (copies of) entities, not references/id's
        rob.Cart = new List<Product>() { soap, bread };
        custrepo.Add(rob);

        // Essentially, now, when we retrieve a customer we also get the contents of the cart;
        // no join required.
        var cust = custrepo.Where(c => c.Name == "Rob Janssen").First();
    }
}

However, you can see that this results in some 'duplication' you probably don't want. Then again; maybe you do. This totally depends on the situation, use-case etc.

As to the "they may not do exactly what you think they do": the above join(s) result in the join being executed client-side (e.g.: the contents of the collection is pulled over the wire and the join is executed in your application). This is not very effective, especially if you have a large dataset.

What you probably want is something like:

var products = prodrepo.Where(p => cust.Cart.Contains(p.Id)).ToArray();

This (maybe rather counter-intuitive) query results in a much more efficient "in" query that only retrieves the desired objects:

{
  "op" : "query",
  "ns" : "DemoDB.Product",
  "query" : {
    "_id" : {
      "$in" : [ObjectId("563c5b546c39371eb84f2c80"), ObjectId("563c5b546c39371eb84f2c83")]
    }
  },
  //...
}

You may want to read:

Long story short: you probably need to get used to thinking in a "document store" fashion rather than "RDBMS" fashion.

Wether you write joins as:

var products = from cartitem in cust.Cart
               join product in prodrepo on cartitem equals product.Id
               select product;

or

var products = cust.Cart.Join(prodrepo, 
                    cartitem => cartitem, product => product.Id, 
                    (cartitem, product) => product);

Is just a matter of personal taste and both are nothing else than different flavours of syntactic sugar that boils down to the same code and query executed.

RobThree commented 9 years ago

You're welcome!