goldmansachs / reladomo

Reladomo is an enterprise grade object-relational mapping framework for Java.
Apache License 2.0

Caching with Relationships #173

Closed: spudmcq closed this issue 6 years ago

spudmcq commented 6 years ago

I want to completely disable caching. I know this will be a performance hit, but our table sizes are small, and we'll re-enable caching once we have configured the TCP Notification Server and tested it in our environment (with redundancy, etc.).

So, to disable caching I set cacheType="none" as follows, and sure enough I see all application finds going to the db (the SQL logs and profiler show that):

 <MithraObjectConfiguration
  className="com.greenhedges.Account" cacheType="none"/>

 <MithraObjectConfiguration
  className="com.greenhedges.AccountAgreement" cacheType="none"/>

My issue is that one-to-many relationships are not refetched/queried from the db each time. For example, say an Account can have any number of AccountAgreements:

<MithraObject objectType="transactional">
<PackageName>com.greenhedges</PackageName>
    <ClassName>Account</ClassName>
    <DefaultTable>account</DefaultTable>
...
    <Relationship name="aggreements" relatedObject="AccountAgreement" cardinality="one-to-many" relatedIsDependent="true">
         AccountAgreement.id = this.id
    </Relationship>
</MithraObject>

Is there a way to force the relatedObjects to be reloaded always?

mohrezaei commented 6 years ago

There is no such thing as "completely disable caching". A cache is defined as a temporary copy. As soon as the bytes leave the database server, you're facing a temporary copy, aka a cache. So the only valid outlook is "what is the contract of the cache?" (because, again, for emphasis, there is always a temporary copy, aka cache).

The various cache configurations change the contract. It sounds like you want a rather odd contract: if you have an object with an attribute "foo" and another attribute "bar", with corresponding getFoo()/getBar() methods, for some foos ("relationships"), you want to hit the database but for some bars (not "relationships") you don't. This inconsistency will cause your code to do really bad things. How many database calls do you want in this piece of code?

int sum = 0;
for(int i=0;i<order.getItems().size();i++) {
    sum += order.getItems().get(i).getCount();
}

cacheType="none" is doing what you said: "all application finds going to the db". Calling a get method is not a find. cacheType="none" is also meant as a last resort and a temporary measure. Every time you use it, you incur technical debt.
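To make the question about the loop concrete, here is a minimal sketch of what would happen if every relationship traversal went to the database. The classes `Item`, `UncachedOrder`, and `RoundTripDemo` are hypothetical stand-ins, not Reladomo classes; they only count simulated round trips.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a related object; not a Reladomo class.
class Item {
    private final int count;

    Item(int count) {
        this.count = count;
    }

    int getCount() {
        return count;
    }
}

// Hypothetical order whose relationship getter "queries" on every call.
class UncachedOrder {
    private final List<Item> rows = new ArrayList<>();
    int databaseCalls = 0;

    UncachedOrder(int... counts) {
        for (int c : counts) {
            rows.add(new Item(c));
        }
    }

    // Assumption for illustration: each traversal is one simulated round trip.
    List<Item> getItems() {
        databaseCalls++;
        return new ArrayList<>(rows);
    }
}

class RoundTripDemo {
    // Same shape as the loop in the comment above.
    static int sum(UncachedOrder order) {
        int sum = 0;
        for (int i = 0; i < order.getItems().size(); i++) {
            sum += order.getItems().get(i).getCount();
        }
        return sum;
    }

    public static void main(String[] args) {
        UncachedOrder order = new UncachedOrder(1, 2, 3);
        System.out.println("sum = " + sum(order));                               // sum = 6
        System.out.println("simulated database calls = " + order.databaseCalls); // 7
    }
}
```

With n items, the loop condition runs n + 1 times and the body n times, so this model makes 2n + 1 simulated round trips for one small loop.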

Setting up the notification server is by far the easiest way to deal with this, because of the consistency it brings to your code in treating method calls on your objects.

If you really want to shoot yourself in your foot, have a look at the cacheTimeToLive and relationshipCacheTimeToLive configuration options.

P.S. The performance hit has little to do with the table size. It's the wire latency that's the real issue.

spudmcq commented 6 years ago

Perhaps I didn't explain my issue clearly, or I'm not understanding the timing of query issuing in Reladomo.

I want the full graph of Account loaded each time Account is found from the db, just like the first time I do a find, when it loads both the Accounts and the AccountAgreements by issuing two queries.

Whenever an Account instance is retrieved from the db, the list of AccountAgreements should also be fetched again. This is what happens the first time I find the instance from the database. On subsequent finds, the second query to the related AccountAgreement table is not issued.

For your snippet of code I would not expect any queries to the db, but for

  Account account = AccountFinder.findByPrimaryKey(id, validDateTimestamp,
            transactionDateTimestamp);

I'd expect a find from the Account Table and the AccountAgreement table to be run in order to retrieve the Account Object.

Am I thinking about this the wrong way? If I do the find in a transaction the Account Agreements would have to be fetched wouldn't they?

I will set up the TCP Notification Service, but I would like to understand what's wrong with the way I'm thinking about the find and the relationship.

mohrezaei commented 6 years ago

Eager graph loading in Reladomo is done via deep fetch on the list object, which means you can't really use the findByPrimaryKey convenience method if you want to load a graph eagerly. It's not hard to use a list even if you want to load one object with some of its graph:

AccountList accountList = AccountFinder.findMany(someOp);
accountList.deepFetch(AccountFinder.agreements());
Account a = accountList.get(0);

When you use the list to eagerly fetch part of the graph, you're effectively forcing a refresh in scenarios where the data may have changed in the db.

Deep fetching is always local to the piece of code: there is no way to globally change the behavior of calling a find (e.g. always forcing a certain relationship to be loaded eagerly). We believe forced graph loading in all contexts is an anti-pattern.

Most of the time, when you want to use a single object, there is no difference between eager and lazy loading with some exceptions:

Just to complete the picture, when you have an account object (regardless of whether you got it from a list with a deep fetch or findByPrimaryKey), a relationship traversal (calling account.getAgreements()) will hit the cache if it can, and then the database if it must. In other words, relationships will lazily fetch data from the database if they must, but they are not affected by the cacheType="none" configuration (because then the piece of code I pasted would do very bad things). Another way to think about this is that the eager fetching via deep fetch is just priming the cache, not sewing a graph.
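The "cache if it can, database if it must" behavior, and deep fetch as cache priming, can be modeled roughly as below. This is a sketch under stated assumptions, not Reladomo's implementation: `RelationshipCache`, its methods, and the map standing in for the database are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of a relationship cache; not Reladomo's implementation.
class RelationshipCache {
    private final Map<Integer, List<String>> agreementsByAccountId = new HashMap<>();
    private final Function<Integer, List<String>> database;
    int databaseCalls = 0;

    RelationshipCache(Function<Integer, List<String>> database) {
        this.database = database;
    }

    // Lazy traversal: answer from the cache if possible, otherwise query.
    List<String> getAgreements(int accountId) {
        List<String> cached = agreementsByAccountId.get(accountId);
        if (cached != null) {
            return cached;
        }
        return load(accountId);
    }

    // Eager load in this model: always query and overwrite the cached entry.
    void deepFetch(int accountId) {
        load(accountId);
    }

    private List<String> load(int accountId) {
        databaseCalls++;
        List<String> fresh = database.apply(accountId);
        agreementsByAccountId.put(accountId, fresh);
        return fresh;
    }
}

class RelationshipCacheDemo {
    public static void main(String[] args) {
        Map<Integer, List<String>> db = new HashMap<>(); // stands in for the database
        db.put(1, Arrays.asList("A"));
        RelationshipCache cache = new RelationshipCache(db::get);

        System.out.println(cache.getAgreements(1)); // first traversal queries: [A]
        db.put(1, Arrays.asList("A", "B"));         // another process changes the data
        System.out.println(cache.getAgreements(1)); // served from cache: [A]
        cache.deepFetch(1);                         // primes the cache with fresh rows
        System.out.println(cache.getAgreements(1)); // [A, B]
        System.out.println(cache.databaseCalls);    // 2
    }
}
```

In this model the stale read in the middle is exactly the case a notification server is meant to address: something has to tell the cache that the row changed.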

As you've said, transactions don't trust the cache at all until they load something from the database in that transaction (which is really part of the locking scheme, with data refresh as a side effect). As far as eager/lazy fetching goes, the transactional context doesn't make a difference (other than knowing when the cache is trustworthy).

A lot of this behavior is also predicated on the "uniquing" feature of Reladomo: each persistent object, identified by its PK, is guaranteed to have reference stability across the JVM.
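Reference stability of this kind is commonly built as an identity map keyed by primary key. The sketch below shows only that general idea; `AccountStub` and `UniquingRegistry` are hypothetical names, and nothing here is taken from Reladomo's internals.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical persistent object identified by an integer primary key.
class AccountStub {
    final int id;

    AccountStub(int id) {
        this.id = id;
    }
}

// Hypothetical identity map: one instance per primary key for the whole JVM.
class UniquingRegistry {
    private final ConcurrentMap<Integer, AccountStub> byPrimaryKey = new ConcurrentHashMap<>();

    // Returns the existing instance for this key, creating it only once.
    AccountStub findByPrimaryKey(int id) {
        return byPrimaryKey.computeIfAbsent(id, AccountStub::new);
    }
}

class UniquingDemo {
    public static void main(String[] args) {
        UniquingRegistry registry = new UniquingRegistry();
        AccountStub first = registry.findByPrimaryKey(42);
        AccountStub second = registry.findByPrimaryKey(42);
        System.out.println(first == second); // true: same reference for the same PK
    }
}
```

Because every caller holds the same instance, refreshing that one object's state is visible everywhere, which is why a per-call "reload the whole graph" contract fits poorly with this design.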

spudmcq commented 6 years ago

Thanks for the detailed answer. You've convinced me that my best option is to set up the notification server. It looks pretty straightforward - I've reviewed the docs here.

Can different applications backed by totally different databases share the same notification service if they set their DatabaseIdentifier differently or must I run a notification server for each separate application?

mohrezaei commented 6 years ago

Sharing the notification server is fine.

kutabale commented 6 years ago

Can the notification server run as part of a JVM that is using the database? Can each instance run with its own notification server?

kutabale commented 6 years ago

Notification doesn't help when one or more of your processes are not Reladomo JVM based? If I have a Spring Boot application that is using Reladomo, do I still need the notification server?

mohrezaei commented 6 years ago

All instances that read/write to the same database must share the same server. The server can run in the same JVM as one of the instances, so long as that instance is up all the time.

kutabale commented 6 years ago

Have you seen Reladomo being used in a microservice architecture? If so, what are the most common deployment topologies that you have seen with regard to the notification server and Reladomo in general?

Thanks.

mohrezaei commented 6 years ago

Whether using a microservices architecture or not, you should think of your domain as a singular, coherent API. That means defining your domain in one place. Avoid defining the same thing in multiple places and avoid splitting your domain (especially where you'd be breaking natural relationships).

Your services can then use that consistent domain to serve their logic.

From a topology perspective, think of the domain as a single jar that you deploy to all your instances/services and have a single notification server for that cluster of services that use the same objects.