I agree completely. Furthermore, the Qt library results in a significant
slowdown of the code. Using real arrays would be so much faster than QLists of
QStrings. However, the person who translated DMP into C++ was using Qt in his
project, so that's what he used. Using Qt also made the translation easier,
since Qt closely mimics the Java data structures. I've got a long-term goal of
removing the Qt dependency, but there are a lot of higher priority items before
I get to that. If you or someone else wants to take a shot at removing Qt, I'd
be very grateful.
Just removing Qt from diff_map() would approximately double the speed of
differencing. I've already taken care of match_bitap(), which was the other
pain point.
Minor note: In the Java, C++ and C# versions a diff is represented as a linked
list of diffs, whereas in the Python and JavaScript versions a diff is
represented by an array of diffs. This leads to slightly different algorithms
when traversing a diff (such as in the cleanup functions). If Qt were removed
from the C++ version, I suspect that switching from linked lists to arrays
would be more efficient.
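To illustrate how the traversal changes with an array, here is a minimal sketch of merging adjacent diffs in a std::vector. The Op, Diff and mergeAdjacent names are hypothetical and simplified; this is not the library's cleanup code. With a linked list one would erase merged nodes in place, while with an array it is usually cheaper to compact using separate read and write positions:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified diff record; not the library's actual class.
enum class Op { Delete, Insert, Equal };

struct Diff {
    Op op;
    std::string text;
};

// Merge runs of adjacent diffs that share the same operation.
// Erasing from the middle of an array is O(n), so instead of erasing we
// compact in place: `out` is the write position, `in` is the read position.
void mergeAdjacent(std::vector<Diff>& diffs) {
    if (diffs.empty()) return;
    std::size_t out = 0;
    for (std::size_t in = 1; in < diffs.size(); ++in) {
        if (diffs[in].op == diffs[out].op) {
            diffs[out].text += diffs[in].text;   // extend the current run
        } else {
            ++out;                               // start a new run
            if (out != in) diffs[out] = std::move(diffs[in]);
        }
    }
    // Drop the leftover tail in a single operation.
    diffs.erase(diffs.begin() + static_cast<std::ptrdiff_t>(out) + 1, diffs.end());
}
```

The whole pass is linear and touches memory sequentially, which is where an array would be expected to beat a linked list.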
Original comment by neil.fra...@gmail.com
on 9 Sep 2009 at 6:50
Please find attached my version that uses only the standard C++ library.
In addition to removing the Qt dependency, several other modifications were
made:
1) diff_linesToChars/diff_charsToLines use an array of pointers to the
substrings within the source strings instead of an array of substring copies.
2) I noticed that compilers (at least Microsoft's one) generate much better
code for the functions returning containers by value if the functions do not
contain multiple return statements. This was the reason for a dummy loop in
diff_compute.
3) static modifiers were added to the member functions that did not use
diff_match_patch values or other non-static members. A check for the unlimited
time was moved from diff_halfMatch to diff_compute that receives the deadline
argument. Because of that, a check for optimal no halfmatch does not work and
was turned off.
4) A test case for the diff_bisect timeout was quick enough to complete before
the clocks were able to move forward. I added a loop waiting for the first tick
before this test.
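As a rough illustration of item 1, the sketch below encodes lines as indices into a table that only points into the source text. All names here (LineRef, LineRefLess, linesToChars, charsToLines) and the exact layout are assumptions made for the example, not the code from the attachment:

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: a line is described by where it starts in the source text
// and how long it is, so no substring is copied.
struct LineRef {
    const wchar_t* begin;
    std::size_t length;
};

// Orders LineRefs by the characters they point at, so equal lines compare equal.
struct LineRefLess {
    bool operator()(const LineRef& a, const LineRef& b) const {
        return std::lexicographical_compare(a.begin, a.begin + a.length,
                                            b.begin, b.begin + b.length);
    }
};

typedef std::map<LineRef, std::size_t, LineRefLess> LineIndex;

// Encode each line of `text` as one wchar_t holding its index in `lines`.
// `text` must outlive `lines`, because `lines` only points into it.
// The sketch assumes the number of distinct lines fits into a wchar_t.
std::wstring linesToChars(const std::wstring& text,
                          std::vector<LineRef>& lines,
                          LineIndex& index) {
    std::wstring encoded;
    std::size_t start = 0;
    while (start < text.size()) {
        std::size_t end = text.find(L'\n', start);
        end = (end == std::wstring::npos) ? text.size() : end + 1;  // keep '\n'
        LineRef ref = { text.data() + start, end - start };
        LineIndex::iterator it = index.find(ref);
        if (it == index.end()) {
            it = index.insert(std::make_pair(ref, lines.size())).first;
            lines.push_back(ref);
        }
        encoded += static_cast<wchar_t>(it->second);
        start = end;
    }
    return encoded;
}

// Rebuild the text from the encoded indices.
std::wstring charsToLines(const std::wstring& encoded,
                          const std::vector<LineRef>& lines) {
    std::wstring text;
    for (std::size_t i = 0; i < encoded.size(); ++i) {
        const LineRef& ref = lines[static_cast<std::size_t>(encoded[i])];
        text.append(ref.begin, ref.length);
    }
    return text;
}
```

The trade-off is a lifetime constraint: the table is only valid while the original strings are alive and unmodified.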
The code was tested with MSVC++ 2008 and GNU G++ 4.4.5.
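The wait-for-first-tick idea from item 4 can be sketched roughly as follows. This assumes the timing is based on std::clock(); the function name waitForNextTick is made up for the example, and the actual test code may use a different clock:

```cpp
#include <ctime>

// Spin until the value returned by std::clock() changes. A timed test that
// starts immediately afterwards begins right at a tick boundary, so a coarse
// clock has the best chance of advancing while the test runs.
// Assumption: the clock is coarse but does advance, because the busy loop
// itself consumes CPU time.
std::clock_t waitForNextTick() {
    const std::clock_t start = std::clock();
    std::clock_t now = start;
    while (now == start) {
        now = std::clock();
    }
    return now;
}
```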
Original comment by snhere@gmail.com
on 16 Jan 2011 at 9:40
Wow, thank you! I'll start work on submitting this code next week.
Original comment by neil.fra...@gmail.com
on 19 Jan 2011 at 12:39
Here is an update. After a little thought, I converted the whole
diff_match_patch class to a template, with all character type dependencies
moved to a separate traits class. This allows the use of any string type that
provides the std::basic_string interface, whether derived from the standard
string types or custom. For example, I tried the speedtest with strings of
8-bit and 32-bit chars, both without any problems (not checked scrupulously,
but at least both returned the same number of diffs as the 16-bit version).
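For readers unfamiliar with the pattern, here is a minimal sketch of a string-templated class with the character-specific parts isolated in a traits class. The names char_traits_sketch and text_tools_sketch are hypothetical and do not come from the attached header:

```cpp
#include <cstddef>
#include <string>

// Hypothetical traits class: everything that depends on the character type
// lives here. The primary template covers character types whose basic
// characters convert from plain char; a custom character type could provide
// its own specialization instead.
template <class CharT>
struct char_traits_sketch {
    static CharT newline() { return static_cast<CharT>('\n'); }
    static bool is_space(CharT c) {
        return c == static_cast<CharT>(' ')  || c == static_cast<CharT>('\t') ||
               c == static_cast<CharT>('\n') || c == static_cast<CharT>('\r');
    }
};

// The algorithms are written once against the std::basic_string interface
// and ask the traits class for anything character-specific.
template <class StringT,
          class Traits = char_traits_sketch<typename StringT::value_type> >
struct text_tools_sketch {
    // Number of newline characters in `text`.
    static std::size_t countLines(const StringT& text) {
        std::size_t n = 0;
        for (typename StringT::size_type i = 0; i < text.size(); ++i) {
            if (text[i] == Traits::newline()) ++n;
        }
        return n;
    }

    // True if `text` is empty or consists only of whitespace.
    static bool isBlank(const StringT& text) {
        for (typename StringT::size_type i = 0; i < text.size(); ++i) {
            if (!Traits::is_space(text[i])) return false;
        }
        return true;
    }
};
```

With this layout the same code can be instantiated as text_tools_sketch<std::string>, text_tools_sketch<std::wstring> or text_tools_sketch<std::u32string> without touching the algorithms.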
Another small modification is that output diff lists are passed by reference to
the private functions to avoid depending on compiler optimization. MSVC, which
I mostly use in my projects, seems to get lost within complex functions and
does some crazy copy-construction work if the results are returned by value,
which is rather costly with the STL containers.
All recent changes have been implemented, including a fix for issue #40.
Please take a look.
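A small sketch of the out-parameter style described above, using hypothetical names (DiffSketch, computeTrivial, diffTrivial) and a deliberately trivial "diff" that is only a placeholder for the real algorithm:

```cpp
#include <list>
#include <string>

// Hypothetical, simplified types; not the port's actual declarations.
enum OpSketch { DELETE_OP, INSERT_OP, EQUAL_OP };

struct DiffSketch {
    OpSketch op;
    std::wstring text;
};

typedef std::list<DiffSketch> DiffList;

// Private-style worker: writes into a container owned by the caller, so no
// container is copied no matter how many return paths the function has.
// Placeholder logic only: equal texts give one EQUAL entry, anything else is
// reported as "delete all of a, insert all of b".
void computeTrivial(const std::wstring& a, const std::wstring& b, DiffList& out) {
    out.clear();
    if (a == b) {
        if (!a.empty()) out.push_back(DiffSketch{EQUAL_OP, a});
        return;
    }
    if (!a.empty()) out.push_back(DiffSketch{DELETE_OP, a});
    if (!b.empty()) out.push_back(DiffSketch{INSERT_OP, b});
}

// Public-style wrapper: a single local and a single return statement, which
// is the easy case for return value optimization.
DiffList diffTrivial(const std::wstring& a, const std::wstring& b) {
    DiffList result;
    computeTrivial(a, b, result);
    return result;
}
```

Only the thin public wrapper returns by value here; the workers with several exit points never return a container.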
Original comment by snhere@gmail.com
on 23 Jan 2011 at 7:12
Hi snhere,
Nice work on the STL port. With your update, I think you accidentally left out
diff_match_patch.cpp.
Original comment by dersai...@gmail.com
on 3 Jun 2011 at 2:23
Any update on the STL port of this library?
Original comment by vkr...@gmail.com
on 20 Jun 2011 at 10:23
The update didn't miss diff_match_patch.cpp; it turned diff_match_patch into a
header-only library. Everything is in diff_match_patch.h.
Original comment by charles....@gmail.com
on 22 Jun 2011 at 7:31
I was trying to convert the code to a Java generic list comparison. There are
a few areas left (13 errors). Some are the same. I hope you would take a
look. Since you're familiar with the code, maybe it'll take 5 to 10 minutes,
or at least provide some comments on how to deal with some of them.
Original comment by null....@gmail.com
on 6 Aug 2011 at 12:40
Question: is it possible to use string instead of wstring?
Every time I try "diff_match_patch<string> dmp;"
I end up with errors about the type of string I'm using.
Thanks
Original comment by kamhim....@gmail.com
on 29 Nov 2011 at 8:19
Any updates on this?
Original comment by mike.naq...@gmail.com
on 3 Aug 2012 at 7:35
[deleted comment]
Thanks so much for your work Sergey.
I wonder why this code is not presented as the official C++ version.
#10, updates what? The awesome header-only implementation isn't working for
you?
Original comment by stevenlu...@gmail.com
on 26 Jun 2013 at 10:04
I am wondering why you went with wchar_t as opposed to just doing UTF-8 with
std::string throughout.
Original comment by stevenlu...@gmail.com
on 3 Jul 2013 at 2:12
I have put the good work from Sergey Nozhenko on GitHub and added some tweaks
to support std::string in addition to std::wstring. There are now test
harnesses for both types of strings.
Here is the link to the repository:
https://github.com/leutloff/diff-match-patch-cpp-stl
Original comment by christia...@gmail.com
on 18 Jul 2013 at 8:49
I've ported the C++ fixes from r90, r96, r97, r98 and r100 to the standard C++
version: https://github.com/leutloff/diff-match-patch-cpp-stl/pull/4
This mostly fixes known bugs in diff_cleanupSemantic.
Original comment by bgrain...@gmail.com
on 23 May 2015 at 2:25
Original issue reported on code.google.com by
bradleelandis
on 8 Sep 2009 at 8:42