No change *whatsoever* has been made to the query use of the cache in GAE
plugin v2.x (using DataNucleus v3) relative to GAE plugin v1.x (using
DataNucleus v1). No idea where DataNucleus 2 comes into this since it was never
used by any GAE plugin. The L2 cache is obviously also used on getObjectById
calls, its primary use-case, giving significant benefit.
Obviously the alternate route you propose could check whether the query has no
ordering/result clause (anything else needed?) and, if so, do a keys-only
query, try the returned keys in the cache, and go back to the datastore only
for those not present. That would be doing 2 datastore ops relative to the
current 1, though it *may* be quicker in some cases. You could contribute such
a thing, since it is seemingly a priority for your project.
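As a rough sketch of that route against the raw low-level datastore and
memcache APIs (illustrative only: the kind name is taken from later in this
thread, and plain datastore Keys stand in here for whatever key scheme the
plugin's L2 cache actually uses):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Entity;
    import com.google.appengine.api.datastore.Key;
    import com.google.appengine.api.datastore.Query;
    import com.google.appengine.api.memcache.MemcacheService;
    import com.google.appengine.api.memcache.MemcacheServiceFactory;

    public class KeysOnlyCacheRoute {
        static Map<Key, Entity> fetchMisses() {
            DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
            MemcacheService cache = MemcacheServiceFactory.getMemcacheService();

            // Datastore op 1: keys-only query.
            List<Key> keys = new ArrayList<Key>();
            for (Entity e : ds.prepare(new Query("PlanCompletion").setKeysOnly()).asIterable()) {
                keys.add(e.getKey());
            }

            // Try every key in the cache first.
            Map<Key, Object> cached = cache.getAll(keys);

            // Datastore op 2: one batch get for whatever the cache missed.
            List<Key> misses = new ArrayList<Key>();
            for (Key k : keys) {
                if (!cached.containsKey(k)) {
                    misses.add(k);
                }
            }
            return misses.isEmpty() ? Collections.<Key, Entity>emptyMap() : ds.get(misses);
        }
    }

Merging the cached and freshly fetched entities back into one ordered result
is left out of the sketch.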
Original comment by googleco...@yahoo.co.uk
on 1 Dec 2012 at 7:01
Thanks for the quick response, and thank you for the clarification.
First off, sorry about the confusion with DataNucleus versions; I was thinking
the old GAE plugin used DataNucleus v2, when of course it was v1. From what you
say, there must be something else I've done that has caused the increased
number of reads.
I could probably make changes to my application to encourage better cache usage
- such as using the built-in unmanaged relationships - though there are a lot
of occasions where I only want a subset of the children (for example, filtered
by date).
I do think a keys-only fetch could be beneficial for queries when the L2 cache
is employed. While it does result in an extra datastore op, surely the
underlying actions are the same - just split over two calls. I may be wrong,
but I would guess that the full fetch gets the keys from the index first,
before then going ahead and fetching the entities - all we'd be doing is
breaking this into two separate calls in order to check the cache in between.
Perhaps a little more thought is needed, as I know that the "Next" calls (which
I presume are where a lot of the entity fetching happens) happen concurrently
while the cache is being checked and the client is processing records.
I'm not sure that ordering would pose a problem. Wouldn't the keys be fetched
in the defined order, providing the appropriate index is present?
Of course, it would introduce an additional small operation. However, if it
saved at least 1 read for every 7 additional small ops, then it pays off. This
depends on workflow, so I suppose it would ideally be configurable.
I might have a look at implementing it. I'm completely unfamiliar with
DataNucleus, so it will be a steep learning curve, but I'd like to contribute
this if I can.
Thanks again for the helpful response.
Original comment by mar...@longhome.co.uk
on 1 Dec 2012 at 6:20
Firstly, you're right about ordering not being important here; I was thinking
of something else.
Secondly, the easiest way to try it out is to add a query extension to enable a
"keys-only" query, and then update the code to check for the extension (so by
default people won't see this, and the default can be changed in the future if
it gets to the point of being universally applicable). You only really need to
know the code under com.google.appengine.datanucleus.query to do this. In
DatastoreQuery, look at DatastoreManager.QUERYEXT_EXCLUDE_FROM_TXN for an
example of a query extension, and do the same for yours. Then under "normal
query" you enable keysOnly when that extension is set, and finally, when your
extension is set, you create an alternative to StreamingQueryResult that takes
in the query result.
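As a usage sketch, such an extension would be switched on per query through the
standard JDO extension mechanism. The extension name and the filter below are
invented for illustration; only the QUERYEXT_EXCLUDE_FROM_TXN naming pattern is
real:

    import java.util.List;
    import javax.jdo.PersistenceManager;
    import javax.jdo.Query;

    public class KeysOnlyQueryUsage {
        // Hypothetical name, following the DatastoreManager QUERYEXT_* pattern.
        static final String QUERYEXT_KEYS_ONLY_WITH_CACHE =
                "datanucleus.appengine.query.keysOnlyWithCache";

        @SuppressWarnings("unchecked")
        static List<PlanCompletion> run(PersistenceManager pm, Object ownerKey) {
            Query q = pm.newQuery(PlanCompletion.class, "owner == :owner");
            q.addExtension(QUERYEXT_KEYS_ONLY_WITH_CACHE, "true");
            return (List<PlanCompletion>) q.execute(ownerKey);
        }
    }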
As you saw, the iteration process is complicated since it currently only loads
objects when next is called on the query result iterator, hence it caters for
large query results. A simple alternative would be to load all query results up
front: first get all objects present in the cache, then for all those not
present do a single call to ds.get(keys) and convert each one-by-one to a Pojo
(EntityUtils.entityToPojo). Obviously loading everything is "non-optimum" but
gives a chance to see if the idea warrants further work. The method
"ExecutionContext.getObjectsFromCache(ids)" is only present in DataNucleus core
3.1.4-SNAPSHOT and later (not in earlier versions). Once you have all objects
you just return a query result wrapping them.
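A sketch of that eager shape, with caveats: the return type of
getObjectsFromCache and the ExecutionContext package location are assumptions
for 3.1, and the full parameter list of EntityUtils.entityToPojo isn't quoted
in the thread, so a placeholder stands in for it:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.Entity;
    import com.google.appengine.api.datastore.Key;
    import org.datanucleus.store.ExecutionContext; // package location assumed for 3.1

    public class EagerCachedResult {
        static List<Object> resolve(ExecutionContext ec, DatastoreService ds,
                                    List<Object> ids, List<Key> keys) {
            List<Object> pojos = new ArrayList<Object>();
            List<Key> misses = new ArrayList<Key>();

            // 1. Whatever the L2 cache already holds (core 3.1.4-SNAPSHOT+);
            //    assumed here to return a map from id to cached pojo.
            Map<Object, Object> cached = ec.getObjectsFromCache(ids);
            for (int i = 0; i < ids.size(); i++) {
                Object pojo = cached.get(ids.get(i));
                if (pojo != null) {
                    pojos.add(pojo);
                } else {
                    misses.add(keys.get(i));
                }
            }

            // 2. A single batch get for the misses, converted one-by-one.
            Map<Key, Entity> fetched = ds.get(misses);
            for (Key k : misses) {
                pojos.add(toPojo(ec, fetched.get(k)));
            }
            return pojos; // wrap in a query result instead of StreamingQueryResult
        }

        // Placeholder: delegate to EntityUtils.entityToPojo(...) in the plugin.
        private static Object toPojo(ExecutionContext ec, Entity entity) {
            throw new UnsupportedOperationException("EntityUtils.entityToPojo goes here");
        }
    }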
Original comment by googleco...@yahoo.co.uk
on 2 Dec 2012 at 9:04
Thanks for the pointers. I've been having a look around and what you say makes
sense. I see that the asynchronous fetches are handled by the datastore async
API, so that is all dealt with.
I thought I'd just try implementing this in my application to start with, to
see what benefit I get, and I seem to have hit a problem (and possibly found
the reason that my reads have jumped up). I do an async keys-only datastore
query in my code, and then loop through the results calling
pm.getObjectById(PlanCompletion.class, e.getKey().getId()), which should take
advantage of the cache if the object is in it. The code seems to be calling
memcache.Get(), but then datastore_v3.Get() immediately after for each entity.
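Reconstructed from that description (filters omitted; PlanCompletion is the
application's entity class), the pattern is roughly:

    import javax.jdo.PersistenceManager;
    import com.google.appengine.api.datastore.AsyncDatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Entity;
    import com.google.appengine.api.datastore.Query;

    public class KeysOnlyThenGetById {
        static void load(PersistenceManager pm) {
            AsyncDatastoreService ads = DatastoreServiceFactory.getAsyncDatastoreService();
            for (Entity e : ads.prepare(new Query("PlanCompletion").setKeysOnly()).asIterable()) {
                // Expected to come from the L2 cache, but the log below shows
                // each memcache.Get() followed by a datastore_v3.Get().
                PlanCompletion pc = pm.getObjectById(PlanCompletion.class, e.getKey().getId());
                // ... use pc ...
            }
        }
    }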
The log for each item looks like this:
0:1354487700.542000 org.datanucleus.ObjectManagerImpl getObjectFromLevel1Cache: Object with id "com.rundouble.rpc.PlanCompletion:1191412" not found in Level 1 cache [cache size = 3]

0:1354487700.600000 org.datanucleus.ObjectManagerImpl putObjectIntoLevel1Cache: Object "com.rundouble.rpc.PlanCompletion@1ad4feb" (id="com.rundouble.rpc.PlanCompletion:1191412") added to Level 1 cache (loadedFlags="[NNNNNNNNNYNNNNNNNNNNNNNNNNN]")

0:1354487700.600000 org.datanucleus.state.JDOStateManager wrapSCOField: Object "com.rundouble.rpc.PlanCompletion@1ad4feb" (id="com.rundouble.rpc.PlanCompletion:1191412") is having the value in field "completed" replaced by a SCO wrapper

0:1354487700.601000 org.datanucleus.ObjectManagerImpl getObjectFromLevel2Cache: Object with id="com.rundouble.rpc.PlanCompletion:1191412" taken from Level 2 cache (fields="[0, 1, 3, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 17, 16, 19, 18, 21, 23, 22, 25, 24, 26]", version="") - represented as "com.rundouble.rpc.PlanCompletion@1ad4feb"

0:1354487700.601000 org.datanucleus.state.LifeCycleState changeState: Object "com.rundouble.rpc.PlanCompletion@1ad4feb" (id="com.rundouble.rpc.PlanCompletion:1191412") has a lifecycle change : "P_CLEAN"->"P_NONTRANS"

0:1354487700.601000 org.datanucleus.ObjectManagerImpl evictFromTransaction: Object "com.rundouble.rpc.PlanCompletion@1ad4feb" (id="com.rundouble.rpc.PlanCompletion:1191412") being evicted from transactional cache

0:1354487700.601000 org.datanucleus.ObjectManagerImpl evictFromTransaction: Object "com.rundouble.rpc.PlanCompletion@1ad4feb" (id="com.rundouble.rpc.PlanCompletion:1191412") is not transactional

0:1354487700.601000 org.datanucleus.store.connection.ConnectionManagerImpl allocateConnection: Connection found in the pool : com.google.appengine.datanucleus.DatastoreConnectionFactoryImpl$DatastoreManagedConnection@cc3e8 for key=org.datanucleus.ObjectManagerImpl@96e599 in factory=ConnectionFactory:nontx[com.google.appengine.datanucleus.DatastoreConnectionFactoryImpl@11da53]

0:1354487700.601000 com.google.appengine.datanucleus.EntityUtils getEntityFromDatastore: Getting entity of kind PlanCompletion with key PlanCompletion(1191412)
I was wondering if it was being thrown out because of a transaction or
something; I did open a transaction earlier in the HTTP request. So I've tried
both creating a new PersistenceManager and calling
pm.currentTransaction().setNontransactionalRead(true).
I'm baffled why this keeps going to the datastore. Could this be a bug?
I'm using DataNucleus 3.1.1 with the appengine plugin 2.1.1 (as shipped with
the GAE API 1.7.3). My jdoconfig.xml is:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<jdoconfig xmlns="http://java.sun.com/xml/ns/jdo/jdoconfig"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:noNamespaceSchemaLocation="http://java.sun.com/xml/ns/jdo/jdoconfig">
  <persistence-manager-factory name="transactions-optional">
    <property name="javax.jdo.PersistenceManagerFactoryClass" value="org.datanucleus.api.jdo.JDOPersistenceManagerFactory"/>
    <property name="javax.jdo.option.ConnectionURL" value="appengine"/>
    <property name="javax.jdo.option.NontransactionalRead" value="true"/>
    <property name="javax.jdo.option.NontransactionalWrite" value="true"/>
    <property name="javax.jdo.option.RetainValues" value="true"/>
    <property name="datanucleus.appengine.singletonPMFForName" value="true"/>
    <property name="datanucleus.appengine.autoCreateDatastoreTxns" value="false"/>
    <property name="datanucleus.cache.level2.type" value="jcache"/>
    <property name="datanucleus.cache.level2.cacheName" value="jdocache"/>
    <property name="datanucleus.appengine.datastoreEnableXGTransactions" value="true"/>
  </persistence-manager-factory>
</jdoconfig>
I've been stuck all evening trying to figure this out, but I'm at a loss. I
don't really want to try implementing this in the plugin until I get a good
idea how much difference it will make.
Thanks
Original comment by mar...@longhome.co.uk
on 2 Dec 2012 at 11:29
Suggest you familiarise yourself with the JDO spec; getObjectById(id) is the
same as getObjectById(id, true), which requires the implementation to validate
the existence of that object in the datastore (regardless of whether it came
from some cache). You can remove that validation check by calling
getObjectById(id, false) or by setting
"datanucleus.findObject.validateWhenCached" to false.
Original comment by googleco...@yahoo.co.uk
on 3 Dec 2012 at 9:44
D'oh! The worst thing is I actually read that a few months back. Sorry for the
confusion.
Indications are that it doesn't make a huge difference in query time: the
keys-only version seems to be marginally faster when the cache is hot - I've
yet to do tests with a cold cache.
Cost-wise, there is a huge benefit when the cache is hot, but then that was
calculable and I think we knew that. This particular test, which fetches about
100 entities, goes from a cost of 9450 to 1650 micropennies, which could be a
big deal to someone whose biggest cost area is datastore ops and who gets a
high rate of memcache hits.
Original comment by mar...@longhome.co.uk
on 3 Dec 2012 at 10:17
Original issue reported on code.google.com by
mar...@longhome.co.uk
on 30 Nov 2012 at 9:53