JBlond / php-diff

A comprehensive library for generating differences between two strings in multiple formats (unified, side-by-side HTML, etc.). Based on the difflib implementation in Python.
BSD 3-Clause "New" or "Revised" License

Suitability inquiry: patch generation? #120

Open rulatir opened 1 year ago

rulatir commented 1 year ago

Given file1, file2, file3... on disk and $modifiedContents1, $modifiedContents2, $modifiedContents3... strings (existing ONLY in memory and without ANY file representation), how difficult would it be to use this library to render a standard patch file that would apply the respective modifications to file1, file2, file3... when fed to the standard patch utility?

rulatir commented 1 year ago

I tried to concatenate the output of multiple diffs with appropriate headers prepended, but I am facing a difficult issue. The patch format generation seems to require special handling if and only if the following conditions simultaneously hold:

  1. the last line of the file ends up emitted as part of unmodified context of the last change, and
  2. there is no newline at the end of file

This combination is non-trivial to detect, mainly because of subcondition 1. Some support from the library would be welcome.
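
For illustration only, here is a minimal sketch of the kind of output being asked for, written in Python with the standard difflib module (which this library is based on) rather than with php-diff itself. The names split_keepends, file_patch and build_patch are hypothetical, and the a/ and b/ path prefixes are an assumption chosen so the result suits patch -p1. It reads each original from disk, diffs it against the in-memory string, and adds the end-of-file marker after any emitted line that lacks a newline, whether that line is context or a change.

```python
import difflib
import os
import re

EOF_MARKER = "\\ No newline at end of file\n"


def split_keepends(text):
    # Split on "\n" only and keep it; only the final item can lack a terminator.
    return re.findall(r"[^\n]*\n|[^\n]+", text)


def file_patch(rel_path, old_text, new_text):
    """Unified diff for one file; returns "" when the contents are identical."""
    out = []
    for line in difflib.unified_diff(
        split_keepends(old_text),
        split_keepends(new_text),
        fromfile="a/" + rel_path,  # assumed prefixes, stripped by patch -p1
        tofile="b/" + rel_path,
    ):
        out.append(line)
        if not line.endswith("\n"):
            # This is the last line of a file without a trailing newline:
            # terminate it and add the marker that patch expects.
            out.append("\n" + EOF_MARKER)
    return "".join(out)


def build_patch(modified, root="."):
    """`modified` maps a relative path to its new, in-memory contents."""
    parts = []
    for rel_path, new_text in modified.items():
        # newline="" keeps the on-disk line endings untranslated.
        with open(os.path.join(root, rel_path), encoding="utf-8", newline="") as fh:
            old_text = fh.read()
        parts.append(file_patch(rel_path, old_text, new_text))
    return "".join(parts)
```

Because the marker is attached to whichever emitted line lacks a newline, the "last line is unmodified context" case needs no separate detection in this sketch.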

JBlond commented 1 year ago

I wonder since the text Unified output is the same as diff -u a.txt b.txt. That output can be used to "feed" patch.

JBlond commented 1 year ago

@rulatir It would be nice to have some example content to look at.

jfcherng commented 1 year ago

> I wonder since the text Unified output is the same as diff -u a.txt b.txt. That output can be used to "feed" patch.

Not exactly. You would have to add support for "No newline at end of file" situations, which I did in my lib.

If there is "No newline at end of file" in old.txt, then

--- old.txt     2022-12-04 00:29:52.529176300 +0800
+++ new.txt     2022-12-04 00:29:53.897753600 +0800
@@ -1 +1 @@
-1st
\ No newline at end of file
+1st

If there is "No newline at end of file" in new.txt, then

--- old.txt     2022-12-04 00:28:20.045745500 +0800
+++ new.txt     2022-12-04 00:28:19.206919100 +0800
@@ -1 +1 @@
-1st
+1st
\ No newline at end of file

If both are "No newline at end of file", then

--- old.txt     2022-12-04 00:29:52.529176300 +0800
+++ new.txt     2022-12-04 00:30:52.437851700 +0800
@@ -1 +1 @@
-1st
\ No newline at end of file
+1st added
\ No newline at end of file
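
To make the three cases concrete, here is a small Python sketch of how a consumer such as patch reads the marker: it belongs to the line directly before it and means that line has no terminating newline on the side(s) it came from. The function name hunk_sides is hypothetical, and the code only handles a single hunk body, not headers or multiple hunks.

```python
def hunk_sides(hunk_lines):
    """Rebuild the old and new text described by one hunk body.

    `hunk_lines` are the lines after the @@ header, without trailing newlines.
    """
    old, new = [], []
    last = ""  # prefix of the most recent content line
    for line in hunk_lines:
        prefix, body = line[:1], line[1:]
        if prefix == "\\":
            # Marker line: drop the "\n" we added to the previous line,
            # on every side that line belongs to.
            if last in ("-", " ") and old:
                old[-1] = old[-1][:-1]
            if last in ("+", " ") and new:
                new[-1] = new[-1][:-1]
            continue
        if prefix in ("-", " "):
            old.append(body + "\n")
        if prefix in ("+", " "):
            new.append(body + "\n")
        last = prefix
    return "".join(old), "".join(new)
```

For example, hunk_sides(["-1st", "\\ No newline at end of file", "+1st"]) gives ("1st", "1st\n"), matching the first case above: only old.txt lacks the final newline.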

JBlond commented 1 year ago

I added the function for that in https://github.com/JBlond/php-diff/tree/php-diff-120 but it still has to be added to the renderers. PRs are welcome.

DigiLive commented 1 year ago

@JBlond After a busy period, I have some spare time now to help you out, but I need a heads-up. Please send me an email explaining what's supposed to happen.

JBlond commented 1 year ago

@DigiLive I did, and I hope that it is clear now. If not, ask rulatir for more details.

DigiLive commented 1 year ago

Nope... it's still not clear to me what's expected to happen. Maybe @rulatir or @jfcherng can elaborate. I need more details than already given.

jfcherng commented 1 year ago

The issue is that this lib doesn't produce the correct Unified output when one of the files has no EOL at EOF.

If https://github.com/JBlond/php-diff/issues/120#issuecomment-1336193044 is not clear enough, then I think you have to diff them with diff -u yourself.

Here's a sample.zip; just run diff -u old.txt new.txt and compare the output with this lib's.

Since I am not really interested in this thread and I think the elaboration is pretty clear, I am leaving.

DigiLive commented 1 year ago

I'm currently looking into this issue. However, my knowledge is about PHP and not diffutils. ;)

It seems like GNU diffutils:

  1. Appends the line \ No newline at end of file after a file's last line in the output, if that line doesn't end with EOL characters.
  2. Drops the last line for comparison if it doesn't contain any characters, including EOL.
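
Read as a line-splitting rule, the two points above could be sketched like this in Python. This is an assumption about the behaviour described here, not code taken from diffutils or from this library, and split_lines is a hypothetical name.

```python
def split_lines(text):
    """Return (lines, missing_eol) for one file's contents.

    Lines are split on "\n" and returned without it.
    """
    lines = text.split("\n")
    if lines[-1] == "":
        # Point 2: the empty remainder after the final "\n" is not a line.
        lines.pop()
        return lines, False
    # Point 1: the last line has no EOL, so it needs the marker in the output.
    return lines, True
```

So "a\nb\n" and "a\nb" both compare as the two lines a and b, and only the second one would get the marker after b.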

@JBlond can you confirm my conclusions?

JBlond commented 1 year ago

I'm a bit puzzled by the definition. https://github.com/Distrotech/diffutils/blob/9e70e1ce7aaeff0f9c428d1abc9821589ea054f1/doc/diffutils.texi#L1718-1748

JBlond commented 5 months ago

@DigiLive After reading a gazillion implementations, I can confirm that your conclusions are right.