Closed tsmaeder closed 3 weeks ago
@pisv
@tsmaeder I have noticed that three of the existing tests in diff-computer.spec.ts
now fail. (BTW, it looks like the tests for the scm
package are not executed by yarn test
, probably due to missing test
scripts in scm/package.json
.)
33 passing
3 failing
1) dirty-diff-computer
remove all lines:
AssertionError: expected { changes: [ { …(2) } ] } to deeply equal { changes: [ { …(2) } ] }
+ expected - actual
{
"changes": [
{
"currentRange": {
- "end": 1
+ "end": 0
"start": 0
}
"previousRange": {
"end": 10
2) dirty-diff-computer
add lines to empty file:
AssertionError: expected { changes: [ { …(2) } ] } to deeply equal { changes: [ { …(2) } ] }
+ expected - actual
"end": 3
"start": 0
}
"previousRange": {
- "end": 1
+ "end": 0
"start": 0
}
}
]
3) dirty-diff-computer
multiple additions:
AssertionError: expected { changes: [ { …(2) }, …(2) ] } to deeply equal { changes: [ { …(2) }, …(2) ] }
+ expected - actual
}
{
"currentRange": {
"end": 12
- "start": 11
+ "start": 12
}
"previousRange": {
"end": 10
"start": 9
From the API point of view, I'm a bit concerned about the removal of readonly
modifiers in LineRange
. As described in https://github.com/eclipse-theia/theia/pull/13104#discussion_r1534048338, I'd favour a design where the entire DirtyDiff
structure is effectively read-only for clients.
From the API point of view, I'm a bit concerned about the removal of readonly modifiers in LineRange
It really does not matter, as the diffs are just descriptors into a structure: there is no way we can unsafely manipulate anything using that structure. This is typescript, not Java: typing does not add safety since you can cast anything to anything anyway. Do we want to tell clients that they may not manipulate the returned structure? No, because we don't care about that as we don't keep a reference anyway.
Yes, but DirtyDiff
is used to notify clients about dirty diff update events, i.e. the same instance can be shared between multiple clients, and some of the clients can (and do) keep a reference to it... Not a big deal probably, but my concern is that the implementation seems to drive (and skew) the design in this case. For example, in the former design you could actually freeze the entire DirtyDiff
structure if needed, which would no longer be possible in the new design... Just my 2c.
Also, I'm not sure about changing some of the existing tests to match the new implementation; in general, one would expect the other way round. At least, looking at the code it is not clear to me why the tests needed to be fixed. For example, in the third of the changed tests, the computed dirty diff was expected to contain two additions and one removal (as VS Code also computes in this case), but now it is expected to contain two additions and one modification. Am I missing something?
@pisv the first two tests were simply wrong:
computeDirtyDiff(
sequenceOfN(numberOfLines, () => 'TO-BE-REMOVED'),
[''])
should create a "changed" delta with n lines removed and one (empty) line added. Same idea for the second case. One emtpy line is != no lines.
The last is is, if you ask me, an error in jsdiff: for the test case, it shows as the last two changes an "unchanged" of length one with as the last change a "added" of length one. Correclty, there should be an addition of length two, as we're adding two empty lines at the end. If I replace the emtpy lines with a non-empty string, the algorithm correctly outputs an addition of 2 elements at the end. As far as I can tell, the diff component still shows the correct thing when adding empty lines at the end of a file.
@tsmaeder When I comment out the following lines in the former implementation (current master), the diff-computer.spec.ts
starts to fail on the very same three tests the new implementation failed (https://github.com/eclipse-theia/theia/pull/13787#issuecomment-2156611184):
Notably, it is these lines that made the former implementation look more complex than it actually was, as noted in https://github.com/eclipse-theia/theia/pull/13104#discussion_r1534048338. And I guess these lines were there for a reason.
@pisv actually, the last test is correct, too: the diff from jsdiff, looks like this:
[
{
"count": 5,
"value": [
"first line",
"",
"foo changed on master",
"bar changed on master",
""
]
},
{
"count": 1,
"added": true,
"value": [
"NEW TEXT"
]
},
{
"count": 3,
"value": [
"",
"",
""
]
},
{
"count": 1,
"added": true,
"value": [
"last line"
]
},
{
"count": 1,
"value": [
""
]
},
{
"count": 1,
"removed": true,
"value": [
"last line"
]
},
{
"count": 1,
"added": true,
"value": [
""
]
}
]
This (correctly, IMO) translates into two inserts and a change with the indices mentioned above.
The fact that the tests did not fail before does not mean that the code was correct before: it just means that the tests expected the output of the actual implementation. Can you make an argument as to why the special cases were necessary before? The comments seem to be related to handling adds/removes at the beginning and end, which is not what is happening here: the sequence of edits is simply not what we would expect as humans.
Regarding the last test, here is what VS Code computes in this case:
This also is computed by Theia on the current master.
And here is what the new implementation computes:
As you can see, there is a difference. My main point, however, is that if you are really willing to remove this handling of the special cases in the code, there would be effectively no need to simplify the former implementation any further and no need to change the former design by removing the readonly modifiers in LineRange
...
@pisv what Theia does is correct as far as the underlying diff algorithm is concerned. If VS Code does something different, it's probably because it's not using jsdiff
. "What VS Code" does is simply not an argument in this case. I still maintain the simplified behaviour is correct. The "special cases" are probably a workaround for the behaviour of jsdiff. Unfortunately, we won't ever be able to tell, because there is no reference to why this workaround exists and why we have never tried to get jsdiff to "fix" their implementation. That's why I insist on referencing relevant information (like issues) when the code does something which is unexpected and due to external differences. Anywas...I'm merging this one as a pure version update since we want it in the next IDE release.
@tsmaeder Thank you very much for updating jsdiff. This is really appreciated.
I also applaud you for taking your time with the other changes.
On the current master, after jsdiff update, the same tests still fail after removing the lines 41-57 in diff-computer.ts
, which are responsible for handling the special cases.
If this were really a workaround for some jsdiff behavior, it can then be concluded that this behavior is still present in jsdiff; otherwise, the tests would not fail.
But it could also be supposed that it might have been an attempt to replicate some VS Code behavior in Theia.
Looking at the git history, the special cases in the code were there from the very beginning (https://github.com/eclipse-theia/theia/commit/8ab7d1d19e24e21b10ff19668a3b12aa9906d250#diff-1733db5110f7ae166c6649af4f89c6f2a17c25819ad5f74671c182642449f734).
It would be great if @vinokurig as the author of the original code could comment on the intent, but if that is not possible, we can only guess about the actual reason why these special cases are there.
However, objectively we have some very specific code that is covered with tests that are starting to fail when that code is removed.
At the risk of repeating myself, my main point is that if you really think that it would be safe to remove these special cases from the code, the current implementation would become:
computeDirtyDiff(previous: ContentLinesArrayLike, current: ContentLinesArrayLike): DirtyDiff {
const changes: Change[] = [];
const diffResult = this.computeDiff(previous, current);
let currentRevisionLine = -1;
let previousRevisionLine = -1;
for (let i = 0; i < diffResult.length; i++) {
const change = diffResult[i];
const next = diffResult[i + 1];
if (change.added) {
// case: addition
changes.push({ previousRange: LineRange.createEmptyLineRange(previousRevisionLine + 1), currentRange: toLineRange(change) });
currentRevisionLine += change.count!;
} else if (change.removed && next && next.added) {
// case: modification
changes.push({ previousRange: toLineRange(change), currentRange: toLineRange(next) });
currentRevisionLine += next.count!;
previousRevisionLine += change.count!;
i++; // consume next eagerly
} else if (change.removed && !(next && next.added)) {
// case: removal
changes.push({ previousRange: toLineRange(change), currentRange: LineRange.createEmptyLineRange(currentRevisionLine + 1) });
previousRevisionLine += change.count!;
} else {
// case: unchanged region
currentRevisionLine += change.count!;
previousRevisionLine += change.count!;
}
}
return { changes };
}
which (IMO) is no less simple or correct than the implementation you have proposed, but allows for the entire DirtyDiff
structure to remain read-only for clients from the API point of view.
I shall probably refrain from further comments as I don't think I can really add anything more to this discussion. I will trust the committers doing the right thing. Thank you very much for your attention and patience.
What it does
Fixes #13772
contributed on behalf of STMicroelectronics
How to test
electron.d.ts
as described in the issue and verify the UI is not blocked for a significant time.Follow-ups
I think the performance is currently good enough on relevant scenarios that we don't have to move the diff computation into a worker, but if we do, that would be a followup.
Review checklist
Reminder for reviewers