dmsnell / diff-match-patch

Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
Apache License 2.0
23 stars 5 forks source link

Add NPM package #6

Closed TheSpyder closed 3 months ago

TheSpyder commented 7 months ago

Fixes #3

The basics should make sense but there are some possibly unexpected changes:

I'd like to also add the types from https://github.com/DefinitelyTyped/DefinitelyTyped/tree/master/types/diff-match-patch but they're technically MIT. The types are very simple, and there's probably no legal issue with copying them, but I don't know if you want to deal with potentially mixed licensing 🤔

TheSpyder commented 7 months ago

I wasn't sure what to do with the readme in the NPM package, so I left it out. https://github.com/JackuB/diff-match-patch ships what I think is a copy of the wiki API from google.

dmsnell commented 7 months ago

@TheSpyder I'll be on vacation during the next week, but I'll try and get to this once I'm back. some initial thoughts:

this is impressive I can tell even at a glance. great work.

TheSpyder commented 7 months ago
  • are we sure this export change leaves existing code and dependencies in a good state?

Hard to say. For anyone using the NPM package, it's equivalent, but I did see one person in the issues of that repo asking for a simple change to keep it running in browsers where a module global isn't available. I'll add that.

  • I wonder if we can stage this in multiple pieces. one thing I thought would be nice is to start building the minified and compressed version of the library in Github Actions. it'll be nice to leave things in as they are for existing projects, but if we can remove the need to manually run build steps that would be awesome.

That's definitely a good thought. I wanted to get something published for an internal deadline, but I might link the project to a branch of my fork rather than try to rush this.

  • despite being slower and possibly silly, I'd probably prefer leaving major changes like revamping the tests or removing the compressed/minified version of this out of a PR to start publishing an npm package. I get quickly overwhelmed and find it hard to focus on changes that do many things at once.

Fair point. I'll roll back the branch to remove the test change commit and wait until the NPM package settles to submit it.

I removed the minified version mostly I figured both should be equivalent and I don't have the tools installed to generate it from the uncompressed changes I made. I can bring it back and set the NPM package to publish without it.

this is impressive I can tell even at a glance. great work.

thank you 😁

TheSpyder commented 7 months ago

@dmsnell I've done a few things:

TheSpyder commented 6 months ago

@dmsnell will you have time to look at deploying this to NPM soon? I'm not trying to pressure you, just curious.

dmsnell commented 6 months ago

will you have time to look at deploying this to NPM soon? I'm not trying to pressure you, just curious.

sorry @TheSpyder - I should have anticipated this, but I'm currently held up by some work in the current WordPress release, which is happening right now. please allot me another week and I'll try to prioritize it starting on Monday, if not over the weekend.

TheSpyder commented 6 months ago

Monday is fine - I think by Wednesday we might have to look at an internal deployment. We're currently using a git link to a branch of my repo (from before I rolled back the tests).

TheSpyder commented 6 months ago

The code changes are all coming from https://github.com/JackuB/diff-match-patch which was adjusted to better suit use embedded in a JavaScript application, as NPM developers expect, where the Google code is based on globals and standalone use. I don't have much knowledge about them other than this setup works in our application.

The Diff api change, for example, was actually a rollback - a google patch in 2018 made new Diff tuple objects, but they're non-iterable which breaks ES6 environments. So it was swapped back to a plain array. https://github.com/JackuB/diff-match-patch/issues/14

start by automating the build process of the existing repo

I admit I have no idea what that is 😂 I have only looked seriously at the javascript folder.

dmsnell commented 6 months ago

this all reminds me how important it feels to get a CI job established for building the JS module. if you want to speed things up, you can work on that before I get to it. that is, start by automating the build process of the existing repo, then once that's in place, modify it so that it builds the npm package automatically.

or we could do this in one step, but only when ensuring that nothing breaks backwards compatbility.

on second thought this change of repo is a great opportunity to break from all the constraints from diff-match-patch, like Closure compiler. I forget that nobody has this official repo in their npm projects because it's never published one.

let's see if we can get a CI job setup which produces a minified build, and let's compare that output to what Closure Compiler has been producing. I'm inclined to try something like biomejs which doesn't pull in a bunch of dependencies.

TheSpyder commented 6 months ago

I forget that nobody has this official repo in their npm projects because it's never published one.

That's the key I'm focusing on - anyone using this library on NPM is doing npm install diff-match-patch and perhaps assuming it's the same, as my team did, but it's not. https://www.npmjs.com/package/diff-match-patch

dmsnell commented 4 months ago

@TheSpyder In #9 I eagerly added the Symbol.iterator method so hopefully that makes this PR a bit simpler. You are welcome to incorporate that change by merge or rebase or however you want.

TheSpyder commented 4 months ago

@dmsnell I think I've covered most of the feedback now - it's just the ESM question (whether to make it the default, whether to make it a new PR) that I'm not sure about.

dmsnell commented 4 months ago

@TheSpyder I'll try and get back to this in the next few days after thinking more about ESM. I'm tempted to be a bit aggressive and focus more on the future direction and letting new code accept this. Seems like most projects today can and should be able to handle an ESM package. If it's not clear, we can merge and iterate later. Thanks for your patience with me again. It's been a very busy couple months on my end and I didn't mean to let this go so long.

TheSpyder commented 4 months ago

@dmsnell ok, what I might do in the meantime is update the testing code I pulled out of this PR and make a new PR with it. There's not really a rush on publishing this, although my team has been asking me why we're still deploying production code that depends on my personal github fork of this project 😅

TheSpyder commented 4 months ago

So, bad news. What I haven't mentioned is that my team started using diff-match-patch by forking and updating an abandoned library. This how we ended up with the NPM diff-match-patch dependency.

Unfortunately while integrating this latest change into our code it turns out the library is more or less built on the assumption that the Diff storage is an array. It wasn't just the iterable nature. It unpacks the array, modifies things, makes new arrays and passes them back to dmp functions.

I can't link you to our fork, but the original code is still available. https://github.com/Teamwork/visual-dom-diff/blob/67224d09c4cb21d14cf410c88d11908165a5e7f3/src/util.ts#L230-L283

It's plausible that all this can all still work with a swap to new Diff(), but I'm not entirely confident about that change.

TheSpyder commented 4 months ago

It's plausible that all this can all still work with a swap to new Diff(), but I'm not entirely confident about that change.

It wasn't hard to do and seems to work, so I'm happy to go with the object.

TheSpyder commented 4 months ago

@dmsnell in an earlier comment you mentioned perhaps the ESM change should be a separate PR. I'm happy to do that, if you decide that's the path forward, or extract it to a new PR.

I've created new branches to add types and tests, as separate PRs, once this change is approved and merged. They're working very well on my team's project.

TheSpyder commented 3 months ago

@dmsnell now that we've brought the test changes back in, can we bring the types as well? Or should that still be a separate PR? https://github.com/dmsnell/diff-match-patch/commit/7e93e49d258c354c09f63087e06dde6d2a3428de

TheSpyder commented 3 months ago

One interesting thing I noticed when exploring the tests is that Diff objects behave like arrays in some ways, but are not arrays. There are a lot of test assertions that abuse this and compare tuples with diff objects, e.g. [ DIFF_EQUAL, "" ] === new Diff( DIFF_EQUAL, "" ). If we replace the array shorthand with Diff object creation, then we can replace the recursive equality check function with deepEqual.

It's a result of a change that was made years ago; the diff used to be an array, google changed it to an object. The tests haven't changed since then. For this PR, I want the absolute minimal changes to get it on NPM - perhaps log that as a new issue?

TheSpyder commented 3 months ago

@dmsnell I have a github action that will run the node tests, and it's path constrained so it won't run for other changes. Do you want me to save that for a new PR? 🤔

code: https://github.com/TheSpyder/diff-match-patch/blob/add-ci/.github/workflows/javascript.yml test run: https://github.com/TheSpyder/diff-match-patch/actions/runs/9557535161/job/26344772814

dmsnell commented 3 months ago

Yeah @TheSpyder let's suggest those CI runs in a separate PR. Bravo! Thank you for staying persistent with this.

https://www.npmjs.com/package/@dmsnell/diff-match-patch

Feel free to start using it, and let's continue making it better!

TheSpyder commented 3 months ago

Fantastic! thank you! I forgot to add a readme, might put that in with CI.

It will be easier for my team to use with types, I might make two PRs 🫣