atos1990 / orika

Automatically exported from code.google.com/p/orika
0 stars 0 forks source link

Class-cast exception for mapped objects #46

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
We run a very complex mapped object tree, with several levels of nested objects 
in lists, where each list can contain large amounts (thousands) of objects. 
Dozer handles this without a problem, but of course - very slowly. 
Orika's overhead is negligable to non-existant (we map objects that are about 
2-4 mb of binary data usually under 1 ms). so performance wise it's very good.
anyhow - it seems that from time to time an object from a different path in the 
object tree get's mapped to the wrong list:
i.e.
myObject.myList<FirstType>.nestedList<NestedType>
would get objects from
myObject.otherList<SecondType>

the really strange thing is that nested list would contain mostly NestedType an 
one instance of SecondType

reproduction is consistent when a large enough object tree (say 2-3 levels of 
nesting, thousands of objects in the lists) and enough executions of the mapper.
out of 800 mappings of the root object i consistently get 3-4 class cast 
exceptions while iterating on those lists.

This seems to happen only after the code has been compiled into byte code - 
that's my diagnosis based on the fact that it happens very intermittently, and 
far more frequently on "run" configuration rather than on "debug".
i would say that it was a race-condition but i am running on a single thread so 
 i don't see how that would be possible. but that might be all-together 
unrelated.

What version of the product are you using? On what operating system?
we're using the latest 1.2.0 version, but this goes back as at lest 1.1.5 which 
was the first version we tried. i check every release to see if it was resolved.

Original issue reported on code.google.com by yos...@gmail.com on 8 Sep 2012 at 10:26

GoogleCodeExporter commented 9 years ago
The 1.2.0 release provide an improved debug log details, can you please send us 
your logs, so we investigate on this strange behaviour.

I think this can relate to the hashCode, mapped objects , in a mapping 
operation, orika track all mapped objects in cahce the key is based on their 
identityHashCode combined with identityHashCode of the destination type.

Any way if you have a test case that fails, or debug(1.2.0) logs join it to 
this issue I can give an insider look.

Original comment by elaat...@gmail.com on 8 Sep 2012 at 2:00

GoogleCodeExporter commented 9 years ago
maybe there was a collision ! even if this could be very strange. I dont this 
could be a concurrent issue, because each call of mapper facade has it's own 
context(cache)

Original comment by elaat...@gmail.com on 8 Sep 2012 at 2:02

GoogleCodeExporter commented 9 years ago
I will try to reproduce in a test case.

Original comment by yos...@gmail.com on 8 Sep 2012 at 5:33

GoogleCodeExporter commented 9 years ago
Mm, hashing is very likely to be the issue. It would be consistent
with intermittent failures, and i would also expect very repeatable
failures for the same mapping (as hashes would be the same). Its also
consistent with failing for larger object trees.

Original comment by yos...@gmail.com on 8 Sep 2012 at 5:39

GoogleCodeExporter commented 9 years ago
When you say you've been using the latest 1.2.0 version, do you mean that 
you've seen the issue crop up again since the release? The reason I ask is that 
there was an issue (45) fixed just a few days before the release that could be 
related to this (reusing the results of a previously resolved type for another 
type where it doesn't apply).

Original comment by matt.deb...@gmail.com on 9 Sep 2012 at 3:40

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Hi,

@Matt - no, the issue was there in every release since 1.1.5

Ok - i think i figured out the the problem (not the fix tho :)  ).
The problem seems to happen only when there are multiple uses of the same 
(non-primitive) class in different depths in the object tree.
it also requires a high object count to reproduce.

the attached test reproduces the crash very consistently.
output on my machine:

Failed to process graph. failed after 27 transforms, exception = 
java.lang.ClassCastException: orika.TestOrika$Two cannot be cast to 
orika.TestOrika$One
java.lang.ClassCastException: orika.TestOrika$Two cannot be cast to 
orika.TestOrika$One
    at orika.TestOrika.test(TestOrika.java:124)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:76)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:195)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:63)

Thanks.

Original comment by yos...@gmail.com on 10 Sep 2012 at 10:18

Attachments:

GoogleCodeExporter commented 9 years ago
Unfortunately I can't reproduce the failling test, the test passes without any 
modification. 

@Matt can you please try to confirm

Original comment by elaat...@gmail.com on 10 Sep 2012 at 1:41

GoogleCodeExporter commented 9 years ago
The test code catches the exception (i left it so i can print it)
If you remove the try/catch it should fail

Original comment by yos...@gmail.com on 10 Sep 2012 at 1:56

GoogleCodeExporter commented 9 years ago
Even without try/catch it does not fail, (it passes)

Original comment by elaat...@gmail.com on 10 Sep 2012 at 2:26

GoogleCodeExporter commented 9 years ago
very strange, but it may simply be that it takes more iterations before it
fails on some machines.
for me it consistently fails after 30-50 parent instances, but it's
entirely possible that this system/memory/cpu/jvm version dependent.
on our production data (which is much MUCH larger) it fails in maybe 1 in
1000 or 1 in 10000 translations.

Original comment by yos...@gmail.com on 10 Sep 2012 at 5:37

GoogleCodeExporter commented 9 years ago
I will check this on Mac 64bit with more memory so I can try large iterations.

Original comment by elaat...@gmail.com on 10 Sep 2012 at 5:51

GoogleCodeExporter commented 9 years ago
Ok I was able to reproduce the bug,

Original comment by elaat...@gmail.com on 10 Sep 2012 at 5:56

GoogleCodeExporter commented 9 years ago
I have a fix ready to push...it's a MappingContext problem (hash key collision)

Original comment by matt.deb...@gmail.com on 10 Sep 2012 at 7:35

GoogleCodeExporter commented 9 years ago
:)
Thanks!
I was getting to a place where i was starting to consider doing the
mapping manually (dozer is insanely slow so not an option)

Original comment by yos...@gmail.com on 10 Sep 2012 at 7:40

GoogleCodeExporter commented 9 years ago
Ok, a fix has been pushed to the current trunk. 
Please let us know how this works for you.

Original comment by matt.deb...@gmail.com on 10 Sep 2012 at 7:53

GoogleCodeExporter commented 9 years ago
I have one I was waiting for more tests :(, good job Matt, 
Thank you Yossin for the bug report :)

Original comment by elaat...@gmail.com on 10 Sep 2012 at 8:06

GoogleCodeExporter commented 9 years ago
is there a maven snapshot build i can use?

Original comment by yos...@gmail.com on 11 Sep 2012 at 9:04

GoogleCodeExporter commented 9 years ago
here you can find snapshot repository :

https://repository-orika.forge.cloudbees.com/snapshot/

version : 1.2.1-SNAPSHOT, the build passes.

Original comment by elaat...@gmail.com on 11 Sep 2012 at 9:15

GoogleCodeExporter commented 9 years ago

Original comment by elaat...@gmail.com on 11 Sep 2012 at 8:26

GoogleCodeExporter commented 9 years ago
Hi,
Thanks for the repo/snapshot link.
unfortunately the problem still exists in the current repo snapshot build.
(didn't run my test, i am running with my more complex production data
tree)

i've also noticed that sometimes i get an exception, trace below:
ma.glasnost.orika.MappingException: Error encountered while mapping for the
following inputs:
rawSource=id: 5445970528886749920

sourceClass=class com.Data$ModuleInfo
sourceType=ModuleInfo
destinationType=ModuleInfo
at ma.glasnost.orika.impl.MapperFacadeImpl.map(MapperFacadeImpl.java:251)
 at
ma.glasnost.orika.impl.MapperFacadeImpl.mapAsCollection(MapperFacadeImpl.java:57
0)
at
ma.glasnost.orika.impl.MapperFacadeImpl.mapAsList(MapperFacadeImpl.java:361)
 at
ma.glasnost.orika.generated.Orika_Sample_Sample_Mapper1446711308.mapAtoB(Orika_S
ample_Sample_Mapper1446711308.java)
at
ma.glasnost.orika.impl.mapping.strategy.UseCustomMapperStrategy$ForwardMapperRef
erence.map(UseCustomMapperStrategy.java:70)
 at
ma.glasnost.orika.impl.mapping.strategy.UseCustomMapperStrategy.map(UseCustomMap
perStrategy.java:48)
at ma.glasnost.orika.impl.MapperFacadeImpl.map(MapperFacadeImpl.java:157)
 at
ma.glasnost.orika.impl.MapperFacadeImpl.mapAsCollection(MapperFacadeImpl.java:57
0)
at
ma.glasnost.orika.impl.MapperFacadeImpl.mapAsList(MapperFacadeImpl.java:361)
 at
ma.glasnost.orika.generated.Orika_ClientData_ClientData_Mapper391850200.mapAtoB(
Orika_ClientData_ClientData_Mapper391850200.java)
at
ma.glasnost.orika.impl.mapping.strategy.UseCustomMapperStrategy$ForwardMapperRef
erence.map(UseCustomMapperStrategy.java:70)
 at
ma.glasnost.orika.impl.mapping.strategy.UseCustomMapperStrategy.map(UseCustomMap
perStrategy.java:48)
at ma.glasnost.orika.impl.MapperFacadeImpl.map(MapperFacadeImpl.java:157)
 at ma.glasnost.orika.impl.MapperFacadeImpl.map(MapperFacadeImpl.java:125)
at ma.glasnost.orika.impl.MapperFacadeImpl.map(MapperFacadeImpl.java:595)
 at com.transform.OrikaMapperImpl.map(OrikaMapperImpl.java:79)
at
com.analytics.flowing.core.FlowingGraphManagerTest$SampleExtractorProcessor.proc
ess(FlowingGraphManagerTest.java:201)
 at
com.analytics.flowing.core.FlowingGraphManagerTest$SampleExtractorProcessor.proc
ess(FlowingGraphManagerTest.java:178)
at
com.analytics.flowing.core.FlowingGraphManagerTest$QueueProcessor.run(FlowingGra
phManagerTest.java:157)
 at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:8
86)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ClassCastException: com.transport.client.LiveData
cannot be cast to com.transport.client.FileData
 at
ma.glasnost.orika.generated.Orika_ModuleInfo_ModuleInfo_Mapper1490854893.mapAtoB
(Orika_ModuleInfo_ModuleInfo_Mapper1490854893.java)
at
ma.glasnost.orika.impl.mapping.strategy.UseCustomMapperStrategy$ForwardMapperRef
erence.map(UseCustomMapperStrategy.java:70)
 at
ma.glasnost.orika.impl.mapping.strategy.UseCustomMapperStrategy.map(UseCustomMap
perStrategy.java:48)
at ma.glasnost.orika.impl.MapperFacadeImpl.map(MapperFacadeImpl.java:157)
 ... 21 more

Original comment by yos...@gmail.com on 11 Sep 2012 at 10:36

GoogleCodeExporter commented 9 years ago
Is it possible to enable writing of source files
and to send us the java sources which Orika generates for the mapper?
(just set system property OrikaSystemProperties.WRITE_SOURCE_FILES=true,
 then the files should be generated in bin directory (target/test-classes if you're running as a unit test in a maven project)

Original comment by matt.deb...@gmail.com on 12 Sep 2012 at 2:02

GoogleCodeExporter commented 9 years ago
Based on the line number of MapperFacadeImpl:570, it seems that it's expecting 
to iterate through a source Iterable of 'com.transport.client.FileData', but it 
found an instance of 'com.transport.client.LiveData' instead.
Does this sound right based on your object graph? (are FileData and LiveData 
part of the source graph?)

Original comment by matt.deb...@gmail.com on 12 Sep 2012 at 2:20

GoogleCodeExporter commented 9 years ago
Both objects are part of the graph, yes, But the source objects are
strongly typed and generated through a protobuf transform which is all
straight forward generated code.
Dozer performs the mapping without a problem so i don't believe the
problem is at the source.
Furthermore, the hash related issue is still there (don't know if the
exception trace from before is related or not)

The generated java files seem a bit complex, Have you considered
having the code generator create straight forward mapping code (as you
would have if you were writing it manually)?

Original comment by yos...@gmail.com on 12 Sep 2012 at 7:19

GoogleCodeExporter commented 9 years ago
I wasn't trying to suggest that the problem was outside Orika -- if Dozer maps 
it fine, I assume it's our problem; I was just trying to zero in on what the 
problem could be...
But I just checked the snapshot repository that we provided you and decompiled 
the latests snapshot that was there, and it did not pick up the fix for the 
hashing issue, so apparently you haven't been able to test out our fix yet. 
I'll post back once I can confirm that the update is available from the 
snapshot repository -- sorry for the mixup.

Original comment by matt.deb...@gmail.com on 12 Sep 2012 at 2:58

GoogleCodeExporter commented 9 years ago
Ok, I confirmed that the latest snapshot has been updated in the snapshot 
repository. Would you be willing to try it again?

Original comment by matt.deb...@gmail.com on 12 Sep 2012 at 3:11

GoogleCodeExporter commented 9 years ago
:)
10x!

Original comment by yos...@gmail.com on 12 Sep 2012 at 3:12

GoogleCodeExporter commented 9 years ago
Sure :)

Original comment by yos...@gmail.com on 12 Sep 2012 at 3:12

GoogleCodeExporter commented 9 years ago
Ok,
So the crash is gone ( :) )
But.. Now the processor seems to take a very long time (20 seconds) on
samples that are marginally larger: 800k vs 600k are taking a LOT
longer : 20 seconds vs a few milli.
I halted the debugger a few time and it seems to be spending a long
time inside the hashmap code.

Original comment by yos...@gmail.com on 12 Sep 2012 at 3:32

GoogleCodeExporter commented 9 years ago
Some more info:
Profiler shows most of the time on cacheMappedObject and getMappedObject
Its about 90% of my overall test time
Both methods take roughly the same amount of time.(get takes marginally more)

Original comment by yos...@gmail.com on 12 Sep 2012 at 3:36

GoogleCodeExporter commented 9 years ago
Ok, I think we have a solution for that, now that we know where to focus...

Original comment by matt.deb...@gmail.com on 12 Sep 2012 at 3:40

GoogleCodeExporter commented 9 years ago
Ive looked into it, the problem is the hash/equals code in CacheKey.
I replaced it with lombok's hashing function - now the larger objects
take marginally longer then small ones.

Original comment by yos...@gmail.com on 12 Sep 2012 at 3:56

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Hi,
These objects, while immutable, are usually touched just once (i tried
caching, made no difference)
It seems my "fix" didnt work - what does work is using nested hashmaps
(i used a type-then-object structure, tho it may be faster the other
way around.
Anyway - it's much faster.

Original comment by yos...@gmail.com on 12 Sep 2012 at 4:28

GoogleCodeExporter commented 9 years ago
can we somehow determine at mapper generation time whether recursion is
even possible?

Original comment by matt.deb...@gmail.com on 12 Sep 2012 at 4:47

GoogleCodeExporter commented 9 years ago
Hi,

There's no recursion i can see, the problem is simply that the hashcode 
generation takes too long.
i changed the implementation with one that uses nested hashmaps and the 
performance issue disappeared completely.

attached is a fixed file for this.

Original comment by yos...@gmail.com on 12 Sep 2012 at 5:06

Attachments:

GoogleCodeExporter commented 9 years ago
How long this take (in comparison with previous solution) ?

Original comment by elaat...@gmail.com on 12 Sep 2012 at 6:06

GoogleCodeExporter commented 9 years ago
Miliseconds compared to 20 seconds for trees with ~100k objects

Original comment by yos...@gmail.com on 12 Sep 2012 at 6:08

GoogleCodeExporter commented 9 years ago
It's not that there's recursion going on, it's that the only reason this 
context exists is to prevent recursion.
So, if we can determine for sure that recursion is not possible for a given 
mapping situation (like yours), we can skip it altogether.

Original comment by matt.deb...@gmail.com on 12 Sep 2012 at 6:19

GoogleCodeExporter commented 9 years ago
I'd be interested to know how long it takes if you make 'cacheMappedObject' and 
'getMappedObject' into NO-OPs?

Original comment by matt.deb...@gmail.com on 12 Sep 2012 at 6:22

GoogleCodeExporter commented 9 years ago
how?
can i just return null?
aren't they essential for the process?

Original comment by yos...@gmail.com on 12 Sep 2012 at 6:33

GoogleCodeExporter commented 9 years ago
yeah, to test the NO-OP version, just make them return null.
they should only be essential for preventing recursion....but if we know
there's no recursion in a particular case, we shouldn't need it

Original comment by matt.deb...@gmail.com on 12 Sep 2012 at 6:42

GoogleCodeExporter commented 9 years ago
well,
it takes no time at all.
it's an average of 20464 nanoseconds per sample  without the mapper
context(0.02 ms)
and with the cache it takes 687348 nano on average per sample  (0.687348 ms)
so either way it's fast, but without the protection its an order of
magnitude faster.

i would suggest adding some way to qualify reference lookup, while the
faster mechanism with the nested maps is pretty good, you wouldn't really
want to pay for it unless you know your tree is cyclic.
the overhead can be significant in terms of memory as well as cpu time for
larger trees.

anyhow - i am very happy with the 0.02 ms time for the object tree, dozer
does it at between 2 to 4 seconds (so orika is about x100,000 times faster
without validation and about 10,000 with the optimized map version).

Original comment by yos...@gmail.com on 12 Sep 2012 at 7:22

GoogleCodeExporter commented 9 years ago
One simple way to disable this (as Sidi mentioned a little bit earlier) is
to sub-class MappingContext with your NO-OP version.
Then, in your initial call to mapperFacade.map(...), use the signature
where you pass in the mapping context, and pass an instance of your NO-OP
context.

Original comment by matt.deb...@gmail.com on 12 Sep 2012 at 7:38

GoogleCodeExporter commented 9 years ago
Yeh, that'll work :)

I'd recommend adding my mods to the default mapper as they significantly
improve performance.

Original comment by yos...@gmail.com on 12 Sep 2012 at 8:22

GoogleCodeExporter commented 9 years ago
Many thanks Yossin :)
We will take this in consideration, 
Anyway we will try to do a release by end of the weeek

Original comment by elaat...@gmail.com on 12 Sep 2012 at 9:07

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Yossin,
would you mind taking another test with latest from the snapshot repo?
this is our intended fix; it'd be great to have another perf result from your 
use-case against it for the default case (not using the No-Op version)
thanks again for your help in resolving this!

Original comment by matt.deb...@gmail.com on 13 Sep 2012 at 10:48

GoogleCodeExporter commented 9 years ago
No prob, but I'm afraid it will have to wait for sunday (weekend ;) )

Original comment by yos...@gmail.com on 13 Sep 2012 at 10:52

GoogleCodeExporter commented 9 years ago
Fixed release 1.2.1

Original comment by elaat...@gmail.com on 15 Sep 2012 at 12:33