snazy opened this issue 11 years ago:

When persisting entities, PlayORM executes a bunch of "multiget_slice" requests against Cassandra. Each request consists of two IP packets to the server and two IP packets back from the server, i.e. four packets for a single key. If a lot of rows are being persisted, this results in a big penalty from the huge number of server roundtrips. Please consider checking key existence for multiple keys at once, instead of checking the existence of keys one after another.
Are you sure this is not an astyanax ticket? mgr.flush eventually calls CassandraSession.sendChangesImpl, which calls persist, remove, persistIndex, removeIndex, and removeColumn. Those methods do nothing except build up a mutation, which is executed on line 137 of CassandraSession and is only a write. There should be no reads whatsoever. Can you put a debug point at 137, step over, and see if that line is causing it? If so, I believe that is an astyanax bug and needs to be filed with them. (They fixed the other bug I reported, actually, but it takes them a bit sometimes, though this one should be a high priority I would think.)
oh, are you talking about when persist is called or when flush is called? persist should neither write to nor read from the server at all. If you are talking about persist, that is definitely a playorm bug. If you are talking about flush, I suspect astyanax, as I see no reads taking place.
It happens when flush is called. I'll retry that tomorrow and debug into the code. I just saw that performance degraded when the code ran against a remote cluster instead of the local machine (which naturally has nearly no latency).
It is PlayORM.
The stack trace is like this (from top to bottom), in put(Object):
so basically on the put, not on the flush. Hmmmm, I had not completely thought that one through. You are just inserting entities, correct? (not updating). There is a put(Object, boolean) where put(entity, true) should not do a read. I am thinking we should reverse this so the default behavior of put is never to read, while put(entity, false) will incur a read ONLY if needed (i.e. if the entity was already read, it would not incur another read).
ps. this is all for indexing... to properly update indexes on an update, we have to run one delete of the old index value column and one add of the new index value column. If anyone does an update without reading the entity first, the index will only add the new value and not delete the old one.
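A minimal sketch of the two call shapes under discussion (the User entity is made up for illustration; the put(Object, boolean) semantics are as described in the comment above):

```java
// Made-up entity for illustration.
@NoSqlEntity
public class User {
    @NoSqlId
    private String id;

    @NoSqlIndexed
    private String email; // indexed fields are what force the read
    // getters/setters omitted
}

// put(Object): may read the existing row first so the index update can
// remove the old column value before adding the new one.
mgr.put(user);

// put(Object, boolean): per the thread, put(entity, true) skips that
// read, which is safe for pure inserts where no old index value exists.
mgr.put(user, true);

mgr.flush(); // mutations are only sent to Cassandra on flush
```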
I just call put(Object) (not put(Object,boolean)) because I do not know whether the row existed before. There are no indexes in these entities.
The "needRead==true" is caused, because I just used the @NoSqlId annotation without any parameters. This implies that the "NoSqlId.usegenerator" is true (the default). It's nice, that the ID is auto-generated by default. But the find-call doesn't make sense, if the ID is set and there are no indexes to update.
I've changed my @NoSqlId annotations to use "usegenerator=false" and the roundtrips are gone.
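For reference, the change is just the annotation attribute (the field shown is illustrative):

```java
// Before: defaults to usegenerator=true, which triggers the
// existence read on put(Object).
@NoSqlId
private String id;

// After: the application assigns the ID itself, so PlayORM no
// longer needs to check whether the row already exists.
@NoSqlId(usegenerator = false)
private String id;
```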
hmmm, this if statement might be wrong in the put method. Let me talk out loud and together we can probably work this out. The idea is that a simple if (!(entity instanceof NoSqlProxy)) check breaks a lot of tests that expect an autogenerated id (the id would be null during a read); in other words, IF the id is autogenerated AND it is null, we know this must be an insert, not an update.
A decent test case to step through is TestManyToOne.testIndexWithToOne; if you remove the "idField.isAutoGen() && id != null" check, that test case breaks.
There is another way, though. If you do a batch read first, passing in 20 entity ids, it reads in parallel rather than in series, which is way faster. A list is returned where N of the 20 entities exist and M do not (N + M = 20); as you iterate over the 20, you can grab the ones that were read, create new ones for the rest, and then do a batch put, which avoids all the serialized reads that can be a penalty. A sketch of that pattern follows.
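A rough sketch of that batch-read-then-batch-put pattern; findAll and KeyValue are assumptions about PlayORM's batch-read API, so adjust the names to the real signatures:

```java
// given: NoSqlEntityManager mgr, List<String> ids (e.g. 20 keys)
// findAll is assumed to read all keys in parallel in one round.
List<KeyValue<User>> results = mgr.findAll(User.class, ids);

for (KeyValue<User> kv : results) {
    User u = kv.getValue();            // null when that row does not exist
    if (u == null) {
        u = new User();                // create the missing entity
        u.setId((String) kv.getKey());
    }
    u.setEmail("updated@example.com"); // apply whatever changes you need
    mgr.put(u);                        // queued only; no read, no write yet
}
mgr.flush(); // one batched mutation instead of N serialized reads
```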
oh, and let me know what you think before I proceed with anything here.
The point is what you said: "that might be an insert AND might be an update... we don't know which".
That's the reason why JPA/Hibernate have both a persist() and a merge() method: merge() allows you to persist the state of an unmanaged object to the database.
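For comparison, the standard JPA shape being referenced here:

```java
import javax.persistence.EntityManager;

void example(EntityManager em, User newUser, User detachedUser) {
    // persist(): insert-only; throws if the entity already exists.
    em.persist(newUser);

    // merge(): copies the detached (unmanaged) object's state into a
    // managed instance and returns it, i.e. insert-or-update semantics.
    User managed = em.merge(detachedUser);
}
```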
- usegenerator == false
  - ID in object set
    - instanceof NoSqlProxy -> must be an update
    - ! instanceof NoSqlProxy -> seems to be an insert, but might be an update...
  - null ID in object
    - instanceof NoSqlProxy -> must fail
    - ! instanceof NoSqlProxy -> must fail
- usegenerator == true
  - ID in object set
    - instanceof NoSqlProxy -> must be an update
    - ! instanceof NoSqlProxy -> seems to be an insert, but might be an update...
  - null ID in object
    - instanceof NoSqlProxy -> must fail
    - ! instanceof NoSqlProxy -> generate ID, insert
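The same table as a compact piece of code (illustrative only, not PlayORM's actual implementation; NoSqlProxy is the marker interface mentioned above):

```java
// Encodes the decision table above.
enum PutAction { UPDATE, AMBIGUOUS_INSERT_OR_UPDATE, GENERATE_ID_AND_INSERT, FAIL }

PutAction decide(boolean usegenerator, Object id, Object entity) {
    boolean managed = entity instanceof NoSqlProxy;
    if (id != null) {
        return managed ? PutAction.UPDATE
                       : PutAction.AMBIGUOUS_INSERT_OR_UPDATE;
    }
    // null ID from here on:
    if (usegenerator && !managed) {
        return PutAction.GENERATE_ID_AND_INSERT;
    }
    return PutAction.FAIL; // all other null-ID combinations must fail
}
```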
I think the current implementation is failsafe - that's generally fine :-)
Regarding the two "seems to be" cases: maybe it would be sufficient to make an object manually "managed" by passing in the "unmanaged" object and returning the (possibly already cached) "managed" object (managed = instanceof NoSqlProxy). This would require a new method in NoSqlEntityManager, e.g. a public manage(...) method.
I think in most use cases, people read objects from the entity manager, modify them, and write them back using put(). The use case of providing an unmanaged object to execute an update should be rare, so a "manage()" method should be fine.
But that method should take care of the relations inside the object, so they get managed as well.
And the put() methods should persist new relation objects, too (currently they do not, I think).
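A hypothetical sketch of what such a manage() method could look like; nothing here exists in PlayORM yet, and the names and behavior are purely illustrative:

```java
// Hypothetical API sketch: manage() does not exist in PlayORM; this
// just illustrates the proposal above. extractId and copyState are
// made-up helpers.
@SuppressWarnings("unchecked")
public <T> T manage(T entity) {
    Object id = extractId(entity);               // read the @NoSqlId field
    if (id == null) {
        throw new IllegalArgumentException("cannot manage an entity without an id");
    }
    T managed = find((Class<T>) entity.getClass(), id); // may hit the cache
    if (managed == null) {
        return entity; // not in the store: treat as a plain insert candidate
    }
    copyState(entity, managed);  // apply the unmanaged object's state
    return managed;              // instanceof NoSqlProxy, safe for put()
}
```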
I am slammed on the business side right now (a good thing), so I may not get to this for a bit, but it sounds like you could really use it? If so, I will merge the pull request you need immediately.
thanks,
Dean
oh, the cascade option of hibernate has always been a highly debated topic, as it has been very bug prone and hard to make it all work right. Some models have infinite span, so you can't exactly cascade everything, and telling it how and where to cut the model is always an issue, especially when you go to change something. That should be a separate ticket and we really need to think hard on that. A case in point would be a user that has a List