Closed ov7a closed 6 years ago
If I am correct 0.4.0 is the update that removed Apache Commons.
The way @vishwakarma implemented the current algorithm uses an array for collecting the diffs and then produces the result.
If I am not wrong, the LCS algorithm from Apache uses a kind of cursor called "Snake" that accumulates the result just by transversing the two lists without creating extra objects, just adding a counter.
I will take a look in the 0.3.10 and the current version using a profiler. Do you have some dummy data so I can replicate it?
@ov7a Agreed, it can because of the nature of LCS algorithm ( internally used in case of arrays ) due to its runtime & spacetime complexity. I will work on it with your example as benchmark to improve it. @LouizFC , kindly let me know if you need any assistance here on this, meanwhile i will be also working on it and will get in touch with for the alternates. @ov7a Thank you for bringing this issue.
@vishwakarma I did not have the time for profiling it yet, but I checked the Apache implementation (more importantly, this) and it uses two arrays as "buffers" instead of a matrix (our implementation). Maybe thats the cause? I will investigate it further.
I had just a quick look, so this is more a guess than anything else. I will try to take a serious look at this on the weekend.
The problem is more severe than I though.
But first, a brief disclaimer: I rarely use profilers. In this case I used VirtualVM. So if the results are a bit off I beg your pardon.
OutOfMemoryError
-Xms2048m -Xmx2048m
: OutOfMemoryError
-Xms4096m -Xmx4096m
:
The program doesn't even run without VM args. The heap allocation is also linear, without any spikes, but the heap memory used is almost 20x more than the 0.3.10 version.@vishwakarma What are your thoughts. Should we try to reintroduce the Apache Commons dependency or solve it with our own code?
In my opinion, we should reintroduce Apache Collections for now. I was digging into Apache Collections and the way they produce their EditScript
is really fascinating, we could learn a lot from there and apply the same techniques in the rest of our library in the future.
Edit: Fixed Markdown for images.
Edit2: I tested some more and the 10 MB difference from 0.3.10 and 0.4.4 (with Apache) is due to the better string concatenation and the precompiled Regexes that were introduced
I agree, let's add Apache library for now to unblock the users and fix the reported issue. That will buy us some time to look into Apache implementation and fix with our own code. Please add the apache library back, I will review and test it. Thanks
On Mon, Aug 20, 2018, 9:07 AM Luiz Felipe da Cruz Oliveira < notifications@github.com> wrote:
The problem is more severe than I though.
But first, a brief disclaimer: I rarely use profilers. In this case I used VirtualVM. So if the results are a bit off I beg your pardon. 0.3.10
- No VM args: [image: 0.3.10] https://camo.githubusercontent.com/1e72a411fde36c48f0d75471da0c620ee27d0986/68747470733a2f2f692e696d6775722e636f6d2f717573634634792e706e67 We have very linear heap allocation and little spike present on the creation of the final patch object.
0.4.4
- No Vm args: `OutOfMemoryError´.
- -Xms2048m -Xmx2048m: `OutOfMemoryError´.
- -Xms4096m -Xmx4096m: [image: 0.4.4] https://camo.githubusercontent.com/dc1fdb02e87695887267ef2602923e7b3c8ebd94/68747470733a2f2f692e696d6775722e636f6d2f447a74525a64662e706e67 The program doesn't even run without VM args. The heap allocation is also linear, without any spikes, but the heap memory used is almost 20x more than the 0.3.10 version.
- Replaced current LCS implementantion with 0.3.10 implementation: !(Apache 0.4.4)[https://i.imgur.com/pAlK2jR.png] The memory allocations revert back to 0.3.10.
@vishwakarma https://github.com/vishwakarma What are your thoughts. Should we try to reintroduce the Apache Commons dependency or solve it with our own code?
In my opinion, we should reintroduce Apache Collections for now. I was digging into Apache Collections and the way they produce their EditScript is really fascinating, we could learn a lot from there and apply the same techniques in the rest of our library in the future.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/flipkart-incubator/zjsonpatch/issues/77#issuecomment-414188768, or mute the thread https://github.com/notifications/unsubscribe-auth/AB-YqbXl34RjjuWbDGUDTgJBsSxPfZaJks5uSi7ogaJpZM4V95xU .
Fixed by PR #78
It seems that In newer versions (0.4.0+) JsonDiff consumes significantly more memory. Most probably this is related to #60 and getLCS method.
For version
0.4.4
:I'm comparing two quite large jsons, around 700 Kb each, with around 300Mb of memory. It is caused by comparing two arrays with around 40 000 of strings each.
Version 0.3.10 works just fine.