Glavin001 / atom-beautify


breaking bug: formatting HTML corrupts source, moves content around #1802

Closed garretwilson closed 7 years ago

garretwilson commented 7 years ago

I'm using Atom 1.19.0 ia32 with atom-beautify 0.30.4 on Windows 10 Pro 64-bit. I'm using the following .jsbeautifyrc:

{
    "indent_with_tabs": true,
    "preserve_newlines": true,
    "max_preserve_newlines": 2,
    "end_with_newline": true,
    "wrap_line_length": 0
}

I have the following content. (Copyright © 2016–2017 GlobalMentor, Inc. Excerpted for bug investigation.)

<!DOCTYPE html><html lang="en-US">
  <head>
    <meta charset="UTF-8" />
    <title>Time</title>
    <meta name="author" content="Garret Wilson" />
  </head>
  <body>
    <aside>This document is an excerpt from the <cite>GlobalMentor Complete Course on Software Development</cite>. Copyright © 2016–2017 GlobalMentor, Inc. All Rights Reserved. <span class="see">For more information, contact <a href="mailto:info@globalmentor.com">GlobalMentor, Inc.</a>.</span></aside>
    <h4>ISO 8601</h4>
    <p>The standard that governs time representation around the world is <cite>ISO 8601: Data elements and interchange formats — Information interchange — Representation of dates and times</cite>. This document is put out by the <dfn>International Organization for Standardization</dfn> (<abbr>ISO</abbr>), and prescribes various format for representing dates and times. One of the most most common is the representation for a date and time of day, with an included offset from UTC, such as <code>1985-04-12T10:15:30+04:00</code>.</p>
    <p>Beyond the specific syntax, however, ISO 8601 is based on a certain conception of dates and times. Dates are based on the Gregorian calendar, and rules are introduced for representing dates before the institution of the Gregorian calendar in 1875. Week numbers and day ordinals are fixed, and definitions are put into place regarding time scales and leap years. Thus many time library implementations rely not only on ISO 8601 for how time should be represented as text, but also for its basic model of time representation.</p>
    <h3>Representing Time</h3>
    <p>The latest versions of Java provide built-in libraries for tracking time, located in the <a

        href="https://docs.oracle.com/javase/8/docs/api/java/time/package-summary.html"><code>java.time</code></a> package. Lower-level time processing classes may be found in the <a

        href="https://docs.oracle.com/javase/8/docs/api/java/time/temporal/package-summary.html"><code>java.time.temporal</code></a> subpackage including the <a

        href="https://docs.oracle.com/javase/8/docs/api/java/time/temporal/Temporal.html"><code>java.time.temporal.Temporal</code></a> interface that underlies most of the Java time-related types, and an even lower-level interface <a

        href="https://docs.oracle.com/javase/8/docs/api/java/time/temporal/TemporalAccessor.html"><code>java.time.temporal.TemporalAccessor</code></a> which allows access to individual fields of time classes. <strong

        class="note">The time classes represent <em>value objects</em>, which you have studied.</strong> You can therefore expect the time classes to be immutable to be comparable with <code>equals(…)</code>, and to include static factory methods typically named <code>of(…)</code>. </p>
    <aside class="important">The Java time classes include extensive support for extracting time units, converting between types, and moving forwards and backwards on the timeline. The API explained here presents but the most basic methods. When solving a time-related problem you must consult the full <a

        href="https://docs.oracle.com/javase/8/docs/api/java/time/package-summary.html"><code>java.time</code></a> API to get a complete idea of the functionality available.</aside>
    <aside class="tip">Most of the Java time classes, both for human time representation and for computer time, include static factory method <code>now()</code> which returns an instance of that class based upon the current system time in the current time zone.</aside>
    <h4>Local and Zoned Time</h4>
    <p>Representing a general date date or time, not tied to any time zone, is accomplished using the <a

        href="https://docs.oracle.com/javase/8/docs/api/java/time/LocalDate.html"><code>java.time.LocalDate</code></a> and <a

        href="https://docs.oracle.com/javase/8/docs/api/java/time/LocalTime.html"><code>java.time.LocalTime</code></a> classes. These types correspond to the date marked on a calendar or the time displayed on a clock, respectively. These classes represent immutable value types, so it is no surprise that they provide many static factory methods including <a

        href="https://docs.oracle.com/javase/8/docs/api/java/time/LocalDate.html#of-int-int-int-"><code>LocalDate.of(int year, int month, int dayOfMonth)</code></a> and <a

        href="https://docs.oracle.com/javase/8/docs/api/java/time/LocalTime.html#of-int-int-int-"><code>LocalTime.of(int hour, int minute, int second)</code></a>. The local date and time classes provide a variety of static factory methods, including some that use the <a

        href="https://docs.oracle.com/javase/8/docs/api/java/time/Month.html"><code>java.time.Month</code></a> enum.</p>
    <dl>
      <dt><a href="https://docs.oracle.com/javase/8/docs/api/java/time/LocalDate.html"><code>java.time.LocalDate</code></a></dt>
      <dd>A general year, month, and day not fixed to any time zone.</dd>
      <dt><a href="https://docs.oracle.com/javase/8/docs/api/java/time/MonthDay.html"><code>java.time.MonthDay</code></a></dt>
      <dd>A month and a day not fixed to any time zone or even associated with a particular year.</dd>
      <dt><a href="https://docs.oracle.com/javase/8/docs/api/java/time/Month.html"><code>java.time.Month</code></a></dt>
      <dd>A single month of the year.</dd>
      <dt><a href="https://docs.oracle.com/javase/8/docs/api/java/time/Year.html"><code>java.time.Year</code></a></dt>
      <dd>A year without regard to any time zone.</dd>
      <dt><a href="https://docs.oracle.com/javase/8/docs/api/java/time/LocalTime.html"><code>java.time.LocalTime</code></a></dt>
      <dd>A general time not associated with any particular date nor fixed to any time zone.</dd>
      <dt><a href="https://docs.oracle.com/javase/8/docs/api/java/time/LocalDateTime.html"><code>java.time.LocalDateTime</code></a></dt>
      <dd>A combined date and time not fixed to any time zone.</dd>
    </dl>
    <figure> <figcaption>Examples of local time classes.</figcaption>
      <pre class="line-numbers"><code class="language-java">//someone's date of birth: May 30, 1970
final LocalDate dateOfBirth = LocalDate.of(1980, Month.MAY, 30);
//someone's birthday: May 30
final MonthDay birthday = MonthDay.of(Month.MAY, 30);

//a typical end of a workday: 5:00 p.m.
final LocalTime endOfWorkday = LocalTime.of(17, 0);

//current date and time
final LocalDateTime = LocalDateTime.now();</code></pre>
    </figure>
    <aside class="info">A <code>LocalDate</code> is used for a date of birth and a birthday, because someone will commemorate  their birthday on the same date regardless of where they are in the world. A person born in New York, for instance, will recognize the local of their birthday while traveling in France without regard to the technicalities of the literal date in New York when the date arrives in France.</aside>
    <p>Local dates and times are relative or <q>floating</q>; they do not provide sufficient information to determine precisely when an event occurred or will occur. Recording the date and time of a birth, for instance, must include the time zone. A journal entry as well would normally indicate the time zone in order to fix the exact moment in an absolute way worldwide: <q>December 11, 2016 10:53 a.m. America/Los Angeles</q>.</p>
    <p>The zoned time classes keep track of a time zone in addition to their other time components, and thus are able to represent absolute times in history. To represent the time zone itself Java provides the class <a

        href="https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html"><code>java.time.ZoneId</code></a>. You can create a <code>ZoneId</code> using its static factory method <a

        href="https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html#of-java.lang.String-"><code>ZoneId.of(String zoneId)</code></a> with the identifier from the IANA Time Zone Database, such as <code

        class="language-java">ZoneId.of("Europe/Paris")</code>. The current time zone of the system can be found using <a

        href="https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html#systemDefault--"><code>ZoneId.systemDefault()</code></a>.</p>
    <p>The main zoned time class is <a href="https://docs.oracle.com/javase/8/docs/api/java/time/ZonedDateTime.html"><code>java.time.ZonedDateTime</code></a>, which is essentially a <code>LocalDateTime</code> combined with a <code>ZoneId</code>. It can be created using all its components using the static factory method <a

        href="https://docs.oracle.com/javase/8/docs/api/java/time/ZonedDateTime.html#of-int-int-int-int-int-int-int-java.time.ZoneId-"><code>ZonedDateTime.of(int year, int month, int dayOfMonth, int hour, int minute, int second, int nanoOfSecond, ZoneId zone)</code></a>. There exist various other static factory methods, including one from an existing <code>LocalDate</code>, and <code>LocalTime</code>, with the addition of a time zone using <a

        href="https://docs.oracle.com/javase/8/docs/api/java/time/ZonedDateTime.html#of-java.time.LocalDate-java.time.LocalTime-java.time.ZoneId-"><code>ZonedDateTime.of(LocalDate date, LocalTime time, ZoneId zone)</code></a>.</p>
    <dl>
      <dt><a href="https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html"><code>java.time.ZoneId</code></a></dt>
      <dd>Identifies a time zone from the IANA Time Zone Database.</dd>
      <dt><a href="https://docs.oracle.com/javase/8/docs/api/java/time/ZonedDateTime.html"><code>java.time.ZonedDateTime</code></a></dt>
      <dd>Represents an absolute date and time in a particular time zone.</dd>
    </dl>
    <footer><small>Copyright © 2016–2017 GlobalMentor, Inc. All Rights Reserved. Content may not be published or reproduced by any means for any purpose without permission. Version 2017-08-09.</small></footer>
  </body>
</html>

This is what atom-beautify produces after formatting:

<!DOCTYPE html>
<html lang="en-US">

<head>
    <meta charset="UTF-8" />
    <title>Time</title>
    <meta name="author" content="Garret Wilson" />
</head>

<body>
    <aside>This document is an excerpt from the <cite>GlobalMentor Complete Course on Software Development</cite>. Copyright © 2016–2017 GlobalMentor, Inc. All Rights Reserved. <span class="see">For more information, contact <a href="mailto:info@globalmentor.com">GlobalMentor, Inc.</a>.</span></aside>
    <h4>ISO 8601</h4>
    <p>The standard that governs time representation around the world is <cite>ISO 8601: Data elements and interchange formats — Information interchange — Representation of dates and times</cite>. This document is put out by the <dfn>International Organization for Standardization</dfn> (<abbr>ISO</abbr>), and prescribes various format for representing dates and times. One of the most most common is the representation for a date and time of day, with an included offset from UTC, such as <code>1985-04-12T10:15:30+04:00</code>.</p>
    <p>Beyond the specific syntax, however, ISO 8601 is based on a certain conception of dates and times. Dates are based on the Gregorian calendar, and rules are introduced for representing dates before the institution of the Gregorian calendar in 1875. Week numbers and day ordinals are fixed, and definitions are put into place regarding time scales and leap years. Thus many time library implementations rely not only on ISO 8601 for how time should be represented as text, but also for its basic model of time representation.</p>
    <h3>Representing Time</h3>
    <p>The latest versions of Java provide built-in libraries for tracking time, located in the <a href="https://docs.oracle.com/javase/8/docs/api/java/time/package-summary.html"><code>java.time</code></a> package. Lower-level time processing classes may be found in the <a href="https://docs.oracle.com/javase/8/docs/api/java/time/temporal/package-summary.html"><code>java.time.temporal</code></a> subpackage including the <a href="https://docs.oracle.com/javase/8/docs/api/java/time/temporal/Temporal.html"><code>java.time.temporal.Temporal</code></a> interface that underlies most of the Java time-related types, and an even lower-level interface <a href="https://docs.oracle.com/javase/8/docs/api/java/time/temporal/TemporalAccessor.html"><code>java.time.temporal.TemporalAccessor</code></a> which allows access to individual fields of time classes. <strong class="note">The time classes represent <em>value objects</em>, which you have studied.</strong> You can therefore expect the time classes to be immutable to be comparable with <code>equals(…)</code>, and to include static factory methods typically named <code>of(…)</code>. </p>
    <aside class="important">The Java time classes include extensive support for extracting time units, converting between types, and moving forwards and backwards on the timeline. The API explained here presents but the most basic methods. When solving a time-related problem you must consult the full <a href="https://docs.oracle.com/javase/8/docs/api/java/time/package-summary.html"><code>java.time</code></a> API to get a complete idea of the functionality available.</aside>
    <aside class="tip">Most of the Java time classes, both for human time representation and for computer time, include static factory method <code>now()</code> which returns an instance of that class based upon the current system time in the current time zone.</aside>
    <h4>Local and Zoned Time</h4>
    <p>Representing a general date date or time, not tied to any time zone, is accomplished using the <a href="https://docs.oracle.com/javase/8/docs/api/java/time/LocalDate.html"><code>java.time.LocalDate</code></a> and <a href="https://docs.oracle.com/javase/8/docs/api/java/time/LocalTime.html"><code>java.time.LocalTime</code></a> classes. These types correspond to the date marked on a calendar or the time displayed on a clock, respectively. These classes represent immutable value types, so it is no surprise that they provide many static factory methods including <a href="https://docs.oracle.com/javase/8/docs/api/java/time/LocalDate.html#of-int-int-int-"><code>LocalDate.of(int year, int month, int dayOfMonth)</code></a> and <a href="https://docs.oracle.com/javase/8/docs/api/java/time/LocalTime.html#of-int-int-int-"><code>LocalTime.of(int hour, int minute, int second)</code></a>. The local date and time classes provide a variety of static factory methods, including some that use the <a href="https://docs.oracle.com/javase/8/docs/api/java/time/Month.html"><code>java.time.Month</code></a> enum.</p>
    <dl>
        <dt><a href="https://docs.oracle.com/javase/8/docs/api/java/time/LocalDate.html"><code>java.time.LocalDate</code></a></dt>
        <dd>A general year, month, and day not fixed to any time zone.</dd>
        <dt><a href="https://docs.oracle.com/javase/8/docs/api/java/time/MonthDay.html"><code>java.time.MonthDay</code></a></dt>
        <dd>A month and a day not fixed to any time zone or even associated with a particular year.</dd>
        <dt><a href="https://docs.oracle.com/javase/8/docs/api/java/time/Month.html"><code>java.time.Month</code></a></dt>
        <dd>A single month of the year.</dd>
        <dt><a href="https://docs.oracle.com/javase/8/docs/api/java/time/Year.html"><code>java.time.Year</code></a></dt>
        <dd>A year without regard to any time zone.</dd>
        <dt><a href="https://docs.oracle.com/javase/8/docs/api/java/time/LocalTime.html"><code>java.time.LocalTime</code></a></dt>
        <dd>A general time not associated with any particular date nor fixed to any time zone.</dd>
        <dt><a href="https://docs.oracle.com/javase/8/docs/api/java/time/LocalDateTime.html"><code>java.time.LocalDateTime</code></a></dt>
        <dd>A combined date and time not fixed to any time zone.</dd>
    </dl>
    <figure>
        <figcaption>Examples of local time classes.</figcaption>
        <pre class="line-numbers"><code class="language-java">//someone's date of birth: May 30, 1970
    <p>Local dates and times are relative or <q>floating</q>; they do not provide sufficient information to determine precisely when an event occurred or will occur. Recording the date and time of a birth, for instance, must include the time zone. A journal entry as well would normally indicate the time zone in order to fix the exact moment in an absolute way worldwide: <q>December 11, 2016 10:53 a.m. America/Los Angeles</q>.</p>
    <p>The zoned time classes keep track of a time zone in addition to their other time components, and thus are able to represent absolute times in history. To represent the time zone itself Java provides the class <a

        href="https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html"><code>java.time.ZoneId</code></a>. You can create a <code>ZoneId</code> using its static factory method <a

        href="https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html#of-java.lang.String-"><code>ZoneId.of(String zoneId)</code></a> with the identifier from the IANA Time Zone Database, such as <code

        class="language-java">ZoneId.of("Europe/Paris")</code>. The current time zone of the system can be found using <a

        href="https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html#systemDefault--"><code>ZoneId.systemDefault()</code></a>.</p>
    <p>The main zoned time class is <a href="https://docs.oracle.com/javase/8/docs/api/java/time/ZonedDateTime.html"><code>java.time.ZonedDateTime</code></a>, which is essentially a <code>LocalDateTime</code> combined with a <code>ZoneId</code>. It can be created using all its components using the static factory method <a

        href="https://docs.oracle.com/javase/8/docs/api/java/time/ZonedDateTime.html#of-int-int-int-int-int-int-int-java.time.ZoneId-"><code>ZonedDateTime.of(int year, int month, int dayOfMonth, int hour, int minute, int second, int nanoOfSecond, ZoneId zone)</code></a>. There exist various other static factory methods, including one from an existing <code>LocalDate</code>, and <code>LocalTime</code>, with the addition of a time zone using <a

        href="https://docs.oracle.com/javase/8/docs/api/java/time/ZonedDateTime.html#of-java.time.LocalDate-java.time.LocalTime-java.time.ZoneId-"><code>ZonedDateTime.of(LocalDate date, LocalTime time, ZoneId zone)</code></a>.</p>
    </figure>
    <aside class="info">A <code>LocalDate</code> is used for a date of birth and a birthday, because someone will commemorate  their birthday on the same date regardless of where they are in the world. A person born in New York, for instance, will recognize the local of their birthday while traveling in France without regard to the technicalities of the literal date in New York when the date arrives in France.</aside>
    <p>Local dates and times are relative or <q>floating</q>; they do not provide sufficient information to determine precisely when an event occurred or will occur. Recording the date and time of a birth, for instance, must include the time zone. A journal entry as well would normally indicate the time zone in order to fix the exact moment in an absolute way worldwide: <q>December 11, 2016 10:53 a.m. America/Los Angeles</q>.</p>
    <p>The zoned time classes keep track of a time zone in addition to their other time components, and thus are able to represent absolute times in history. To represent the time zone itself Java provides the class <a href="https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html"><code>java.time.ZoneId</code></a>. You can create a <code>ZoneId</code> using its static factory method <a href="https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html#of-java.lang.String-"><code>ZoneId.of(String zoneId)</code></a> with the identifier from the IANA Time Zone Database, such as <code class="language-java">ZoneId.of("Europe/Paris")</code>. The current time zone of the system can be found using <a href="https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html#systemDefault--"><code>ZoneId.systemDefault()</code></a>.</p>
    <p>The main zoned time class is <a href="https://docs.oracle.com/javase/8/docs/api/java/time/ZonedDateTime.html"><code>java.time.ZonedDateTime</code></a>, which is essentially a <code>LocalDateTime</code> combined with a <code>ZoneId</code>. It can be created using all its components using the static factory method <a href="https://docs.oracle.com/javase/8/docs/api/java/time/ZonedDateTime.html#of-int-int-int-int-int-int-int-java.time.ZoneId-"><code>ZonedDateTime.of(int year, int month, int dayOfMonth, int hour, int minute, int second, int nanoOfSecond, ZoneId zone)</code></a>. There exist various other static factory methods, including one from an existing <code>LocalDate</code>, and <code>LocalTime</code>, with the addition of a time zone using <a href="https://docs.oracle.com/javase/8/docs/api/java/time/ZonedDateTime.html#of-java.time.LocalDate-java.time.LocalTime-java.time.ZoneId-"><code>ZonedDateTime.of(LocalDate date, LocalTime time, ZoneId zone)</code></a>.</p>
    <dl>
        <dt><a href="https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html"><code>java.time.ZoneId</code></a></dt>
        <dd>Identifies a time zone from the IANA Time Zone Database.</dd>
        <dt><a href="https://docs.oracle.com/javase/8/docs/api/java/time/ZonedDateTime.html"><code>java.time.ZonedDateTime</code></a></dt>
        <dd>Represents an absolute date and time in a particular time zone.</dd>
    </dl>
    <footer><small>Copyright © 2016–2017 GlobalMentor, Inc. All Rights Reserved. Content may not be published or reproduced by any means for any purpose without permission. Version 2017-08-09.</small></footer>
</body>

Notice the lines after <figcaption>Examples of local time classes.</figcaption>. Specifically, after //someone's date of birth: May 30, 1970, the rest of the code sample has been dropped, and the paragraphs that should follow the figure appear twice, once inside the <figure> itself. In fact, depending on the actual surrounding content (e.g. in the original document), sometimes content seems to be moved around arbitrarily!

This bug has already been reported at https://github.com/beautify-web/js-beautify/issues/1225, but note that it is not reproducible on the js-beautify site, so @bitwiseman indicated it was probably not a js-beautify problem.

garretwilson commented 7 years ago

OK, this is getting more complicated and worse. To understand the situation, let me remind you of the history:

The file in question has mixed line endings because of this. Normally Atom will show the LF and CRLF line endings as distinct (and at least js-beautify will normalize them). But in the file at issue here, Atom 1.19.0 is not showing the line endings as different, and the line ending indicator at the bottom shows CRLF even though the line endings are mixed.
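Since the editor's indicator cannot be trusted here, one way to confirm that a file really has mixed line endings is to count them in the raw bytes. The following is only a minimal sketch in Python; the function names `count_line_endings` and `has_mixed_line_endings` are made up for illustration and are not part of Atom, atom-beautify, or js-beautify.

```python
import re


def count_line_endings(data: bytes) -> dict:
    """Count CRLF, lone LF, and lone CR sequences in raw file bytes."""
    return {
        "CRLF": len(re.findall(rb"\r\n", data)),
        # LF not preceded by CR
        "LF": len(re.findall(rb"(?<!\r)\n", data)),
        # CR not followed by LF (old Mac style)
        "CR": len(re.findall(rb"\r(?!\n)", data)),
    }


def has_mixed_line_endings(data: bytes) -> bool:
    """Return True if more than one kind of line ending is present."""
    counts = count_line_endings(data)
    return sum(1 for n in counts.values() if n > 0) > 1
```

To check a real file, read it in binary mode (for example `open(path, "rb").read()`, where `path` is whatever file you are investigating) so that Python does not translate the line endings before they are counted.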

If I use another editor such as EmEditor to change all the line endings to CRLF or LF (it doesn't matter which, as long as they are all the same), this bug no longer appears!! That is, atom-beautify formats the file with no problem.
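For anyone without EmEditor, the same workaround can be approximated with a few lines of Python. This is a sketch under the assumption that rewriting every line ending to a single style is all that is needed; `normalize_line_endings` is a hypothetical helper, not something provided by Atom or atom-beautify.

```python
import re


def normalize_line_endings(data: bytes, eol: bytes = b"\n") -> bytes:
    """Replace every CRLF, lone CR, or lone LF with a single chosen ending."""
    # CRLF is listed first so it is replaced as one unit, not as CR + LF.
    # A function is used as the replacement so the bytes are inserted literally.
    return re.sub(rb"\r\n|\r|\n", lambda _match: eol, data)
```

Read the file in binary mode, pass the bytes through the function, and write the result back in binary mode; pass `eol=b"\r\n"` to get CRLF throughout instead of LF.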

So are we now dealing with a new Atom line-ending bug, on top of the BlueGriffon and js-beautify bugs?? (When will it end?!!) The failure of Atom to distinguish the line endings could be an indication of a problem. Or is the display in the editor a red herring, and is atom-beautify (or js-beautify) skipping line endings incorrectly, assuming they are all the same?

garretwilson commented 7 years ago

I'm attaching the exact file that causes the problem. Now that we know it potentially has something to do with line endings, copying and pasting from the inline example above won't help anything.

js-beautify-1225.zip

garretwilson commented 7 years ago

This time it was a bug deep inside Atom: https://github.com/atom/atom/issues/15225 . Apparently they rewrote the low-level buffer handling and didn't pay attention to the line endings, or something to that effect. They claim to have a fix on the way. I'm going to close this atom-beautify issue.