cygri / htmldiff

A command-line script that shows text changes between two HTML files
MIT License

Added side-by-side diff and other enhancements #3

Closed by induane 7 years ago

induane commented 10 years ago

New Features:

cygri commented 10 years ago

That's some excellent work there!

Would you be able to add a meaningful README.md, with enough information to allow someone who doesn't know a thing about Python, setuptools or distutils to run the thing (or at least pointers that tell them where to look)?

induane commented 10 years ago

Hopefully that updated readme file helps a bit!

cygri commented 10 years ago

That is great!

One more concern. The executable command is called htmldiff in the original version, but diff_html in your fork. Is there a particular reason for this? The change may cause disruption for users of the current version who have it integrated in scripts, version control tools, and so on.

induane commented 10 years ago

I ended up doing this because htmldiff was pretty generic. There was at least one other Python project that used it as an entry point, plus a Ruby gem I had, and, weirdly, an old Node.js tool. Sometimes I think naming things is one of the hardest things one does in programming :/

cygri commented 10 years ago

Yeah, htmldiff is certainly an overloaded name. But it's such a good one—it's 100% descriptive.

I'm failing to come up with good alternatives that aren't somehow taken already: htdiff and diffhtml already exist. How about htmlcompare, htmlcmp, cmphtml?

What would be the practical consequences of sticking with htmldiff?

cygri commented 10 years ago

I asked Twitter for other name ideas, here's what came up, FWIW:

I'm still tempted to just stick with htmldiff, but maybe I don't know the problems that this would cause.

Update: More proposals:

induane commented 7 years ago

I'd almost completely lost track of this thread. I actually kind of like htmldelta, but I'm fine with htmldiff in the end :)

cygri commented 7 years ago

Ah, thanks for picking this up again! Looks like the README still refers to diff_html in a few places. Should that be changed to htmldiff?

induane commented 7 years ago

Yep, I probably just missed that. I'll update it when I get into the office today. Cheers!

induane commented 7 years ago

I updated it to use argparse while I was at it, since optparse is pretty well deprecated at this point. I don't think I missed anything, but let me know if I did.
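For readers unfamiliar with the optparse-to-argparse switch, here is a minimal sketch of what an argparse-based entry point for a two-file diff command could look like. The argument names (`old_file`, `new_file`) and the `-o/--output` option are assumptions made for illustration; they are not taken from this PR and the real command's flags may differ.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical two-file diff command."""
    parser = argparse.ArgumentParser(
        prog="htmldiff",
        description="Show text changes between two HTML files.",
    )
    # argparse declares positional arguments explicitly, whereas optparse
    # left them in an untyped list of leftover arguments.
    parser.add_argument("old_file", help="path to the original HTML file")
    parser.add_argument("new_file", help="path to the changed HTML file")
    # Hypothetical option, used here only for illustration.
    parser.add_argument(
        "-o", "--output",
        default=None,
        help="write the diff to this file instead of stdout",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["old.html", "new.html", "-o", "diff.html"])
print(args.old_file, args.new_file, args.output)
```

A side benefit of argparse is that `-h/--help` output and errors for missing positional arguments are generated automatically.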

induane commented 7 years ago

Commit summary since the multi-year hiatus:

Update entrypoint name

Split tooling into lib, cleanups

Write either to stdout or to a file
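The "write either to stdout or to a file" behaviour from the summary above can be sketched roughly as follows. This is not the code from the PR; the function name `write_output` and its `path` parameter are hypothetical.

```python
import sys


def write_output(text, path=None):
    """Write text to the file at path, or to stdout when path is None."""
    if path is None:
        # No output file requested: behave like a classic Unix filter.
        sys.stdout.write(text)
    else:
        # Explicit encoding keeps the result the same across platforms.
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)


# With no path, the result goes to stdout.
write_output("<p>example diff output</p>\n")
```

Defaulting to stdout keeps the command usable in pipelines and scripts, while the optional path saves callers from relying on shell redirection.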

cygri commented 7 years ago

Fantastic work!

The ball is now in my court to do a bit of testing and merge the PR. That's now on my to-do list. Feel free to remind me if you feel it takes too long.

Or are there any lurkers willing to test-drive @induane's changes and give a +1?

cygri commented 7 years ago

So I tested this on the content of these two URLs: https://www.w3.org/TR/2016/WD-shacl-20160814/ https://www.w3.org/TR/2017/WD-shacl-20170202/

These are two versions of the same document, with lots of changes, but also significant parts that remained the same.

The current htmldiff does a very poor job at handling markup and the like, but is doing reasonably well with the content: some sections are red, some are green, some are white.

The updated htmldiff from this PR does a much better job at keeping the markup intact, but it doesn't handle the content properly. Almost all of the document appears as green with strike-through, which I guess indicates that almost all content is seen as both added and deleted. The -a flag doesn't seem to make much of a difference to the result, and -s didn't work at all here but that might be due to the fancy markup.

Am I wrong to expect a result closer to the old behaviour here?
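To make the expected behaviour concrete, here is a small word-level diff sketch built on Python's standard `difflib`. It is an illustration only, under the assumption that unchanged text should pass through unmarked; it is not the algorithm used by either version of htmldiff, and the function name `mark_changes` is hypothetical.

```python
import difflib
import html


def mark_changes(old_text, new_text):
    """Return HTML with <del>/<ins> around words that differ between the texts."""
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged words are emitted without any markup ("white").
            parts.append(html.escape(" ".join(new_words[j1:j2])))
            continue
        if tag in ("delete", "replace"):
            parts.append("<del>%s</del>" % html.escape(" ".join(old_words[i1:i2])))
        if tag in ("insert", "replace"):
            parts.append("<ins>%s</ins>" % html.escape(" ".join(new_words[j1:j2])))
    return " ".join(parts)


print(mark_changes("the quick brown fox", "the slow brown fox jumps"))
# prints: the <del>quick</del> <ins>slow</ins> brown fox <ins>jumps</ins>
```

If the two token sequences share nothing the matcher can align, the whole input comes back as one deleted run followed by one inserted run, which resembles the symptom described above.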

induane commented 7 years ago

@cygri No, as far as I know you're not wrong. I don't remember changing anything about the diff behaviour itself; my changes were meant more as an extension.

Of course the results speak for themselves, so it's incumbent upon me to see wtf I did wrong :)

cygri commented 7 years ago

This is giving me good results on some test documents.

Would you mind doing one last change on this PR? Edit the beginning of README.md to highlight your own significant contributions to this version?

Then I'm ready to hit the big green button.

cygri commented 7 years ago

After three and a half years! 🎉

Thanks for the great work on this, induane.