clj-commons / rewrite-clj

Rewrite Clojure code and edn
https://cljdoc.org/d/rewrite-clj
MIT License
591 stars 55 forks source link

rewrite-clj does not preserve CRLF whitespaces #187

Open ikappaki opened 2 years ago

ikappaki commented 2 years ago

Version 1.1.45

Platform Operating System: any Clojure version: any JDK vendor and version: any

Symptom crewrite-clj does not preserve CRLF whitespace when converted from zipper back to string,

Reproduction On the REPL:

(require '[rewrite-clj.zip :as z])
;; => nil
(-> (z/of-string "(ns nl-test\r\n  (:require [rewrite-clj.zip :as z]))")
      z/root-string)
;; => "(ns nl-test\n  (:require [rewrite-clj.zip :as z]))"

The \r\n line endings have been converted to \n.

Actual behavior The zipper converts CRLF line endings to LF.

Expected behavior As per documentation whitespaces should be preserved:

Rewrite-clj is a library that can read, update and write Clojure, ClojureScript and EDN source code while preserving whitespace and comments.

Diagnosis The CRLF to LF conversion was an intentional workaround as per https://github.com/clj-commons/rewrite-clj/commit/f78856edbb9d26a9481eeaf33d071bc3d8ad8868:

Handle Windows line endings

While we await https://clojure.atlassian.net/browse/TRDR-65, I've
introduced a work-around.

See rewrite-clj.reader.cljc comments for details.

Fixes https://github.com/clj-commons/rewrite-clj/issues/93

Action I'm not familiar with the codebase, but I might try to have a look at it at some point.

lread commented 2 years ago

Thanks for raising such a clear and easy-to-digest issue @ikappaki, very much appreciated!

This is currently as designed, but I think rewrite-clj docs should describe this behaviour clearly if this is not already the case.

We use clojure tools.reader and I think it wants to normalize \r\n to \n but does not always handle this well. I raised an issue about this.

ikappaki commented 2 years ago

Thanks for raising such a clear and easy-to-digest issue @ikappaki, very much appreciated!

This is currently as designed, but I think rewrite-clj docs should describe this behaviour clearly if this is not already the case.

We use clojure tools.reader and I think it wants to normalize \r\n to \n but does not always handle this well. I raised an issue about this.

Hi @lread, and thank you for the prompt reply! May I please argue that there shouldn't be any exceptions in the design doc that suggest that all newlines seq should be treated as LF. I think the goal that wtitespaces should be preserved is quite powerful, since it guarantees compatibility across all architecture, which should be part of the design.

This adversely affects window's users. Its native line ending format is CRLF. If a zipper pipeline is constructed on such a file, it is likely to convert the CRLF to LF even if there are no other changes, and the file stored on disk with incompatible line endings. Every rewrite-clj consumer has to implement workarounds for just that case to maintain compatibility on windows. Instead it is much wiser IMHO to maintain whitespaces in this library.

To exaggerate a bit, imagine the trouble it would have caused on *nix if rewrite-clj was converting any newlines to CRLF instead.

IMHO the tools reader issue is an incidental complexity that should be either prioritised upstream (the issue looks stale?) or a workaround should be devised in rewrite-clj instead.

borkdude commented 2 years ago

The JIRA issue doesn't yet have a patch. Maybe it would help if someone wrote that patch and brought the issue to the attention of the core team either on Slack, ask.clojure, etc.

borkdude commented 2 years ago

I haven't looked inside rewrite-clj where this could be patched temporarily, but some looking into this might also help the process of contemplating if a workaround in rewrite-clj is feasible and desired.

lread commented 2 years ago

My understanding: I think the clojure tools.reader does not intend to preserve \r\n line endings. I think it intends to normalize them to \n. This is not Windows-friendly, but a reality.

Before my tools.reader work-around in rewrite-clj, files with mixed \r\n and \n were causing issues for rewrite-clj and raised as an issue by the author of zprint.

At this point not normalizing to \n might be considered a breaking change for some rewrite-clj users and would likely have to be optional.

For now I'm willing to document the behaviour (if I haven't already done so).

lread commented 2 years ago

So @ikappaki, I don't disagree with your sentiment at all here, but Clojure is not entirely Windows-friendly. We, unfortunately, inherit that Windows unfriendly-ness.

lread commented 2 years ago

Related: #93

ikappaki commented 2 years ago

So @ikappaki, I don't disagree with your sentiment at all here, but Clojure is not entirely Windows-friendly. We, unfortunately, inherit that Windows unfriendly-ness.

Sure @lread we are both I think on the same page here and we are just having a friendly discussion from our personal perspectives :)

I have been using Clojure primarily on MS-Windows for a long time now, and I have not noticed anything at the language level that I would consider brought me to a disadvantage compared to working on other operating systems. The tooling perhaps is an exception where the user has to go via PowerShell and calling conventions and arguments passing do not match between unix and windows, but I think this area is not relevant to this case.

I believe there is a large user base who are primarily developing on windows, that should not be discounted as second class citizens, since this hinders Clojure's wider adaptation.

Thanks :)

lread commented 2 years ago

I'm not in disagreement, and very much appreciate this issue and discussion.

My guess: Clojure Windows support is not first-class because there are fewer Clojure Windows users, and there are fewer Clojure Windows users because Windows support is not first-class.

I'll document this rewrite-clj behaviour but leave this issue open for a bit to potentially capture thoughts/ideas from others.

borkdude commented 2 years ago

Just for my understanding: rewrite-clj has a temporary workaround (normalizing \r\n to \n) for an issue in tools.reader, but tools.reader maintainers haven't responded in the issue, but word has it that they will never fix this?

Why does the workaround in rewrite-clj normalize and does not preserve the Windows newline?

lread commented 2 years ago

The current intent of clojure tools.reader is to normalize newlines to \n. Rewrite-clj worked around a bug in that intent.

I was willing to work around a bug, but was not interested in changing the intended behaviour. I felt that changing the intended behaviour would be riskier.

I do not know if the Clojure core team will address the issue I raised, or if they will change the newline normalization behaviour. There was interest on Slack when I raised the issue, but have not heard back since.

lread commented 2 years ago

Added a question to Ask Clojure.

borkdude commented 1 year ago

Note:

user=> (read-string "\"foo\r\nbar\"")
"foo\r\nbar"
user=> (require '[clojure.tools.reader :as rdr])
nil
user=> (rdr/read-string "\"foo\r\nbar\"")
"foo\r\nbar"

I.e. the clojure reader(s) do not mess with newlines within literal strings, but rewrite-clj currently does.

But I don't see why rewrite-clj would not preserve newlines as they are written: it's a rewriting tool with preservation for whitespace after all.

lread commented 1 year ago

Yeah, it took me a while, but I finally agree.

The fact that the Clojure readers normalize newlines is an internal detail of the readers.

One detail that we might want to remember is that:

(read-string "\"foo
bar\"")

Will always (appropriately) return "foo\nbar" regardless of CR or CRLF between foo and bar.

lread commented 1 year ago

I'm starting to think about this one a bit again.

New Newlines

When inserting a new newline, we have to decide the variant of that newline (CR or CRLF).

Ideally we'd just use whatever variant the source is already using but:

  1. the source might have mixed newline variants (here we could just use the first variant seen?)
  2. there might be no newlines in the source
  3. the user might be working on a subset of the source without newlines
  4. the user might be creating source from scratch

It would be nice if the newline variant selection could be automatic for new newlines, but I'm not sure if it can be.

Mixed Newlines

If a source has mixed newlines we'd preserve that oddity.

I remember mixed newlines causing the author of zprint some grief but that might have been due do issues with rewrite-clj and not anything else. I'd keep this in mind.