Ekryd / sortpom

Maven plugin that helps the user sort pom.xml.
https://github.com/Ekryd/sortpom/wiki/
BSD 3-Clause "New" or "Revised" License

Preserve formatting inside project element #397

Closed (zabetak closed this issue 8 months ago)

zabetak commented 8 months ago

Currently there is no way to preserve or control formatting inside the project element of the pom.xml file. Whitespace and line breaks are removed, and namespaces are reordered, as shown below.

Before:

<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://maven.apache.org/POM/4.0.0" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

After:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

It would be nice if it were possible to retain the namespace order when applying the sortpom plugin.

This request partly duplicates #55, #56, and #85. However, the project now uses dom4j instead of JDOM, so the limitations that originally prevented this feature from being implemented are no longer there.

zabetak commented 8 months ago

It seems feasible to override the XMLWriter#writeElement method in PatchedXMLWriter and handle the <project> element specially, possibly by copy-pasting the exact content from the original input.

Ekryd commented 8 months ago

Sounds interesting, I'll have a look.

Ekryd commented 8 months ago

Bad news: the SaxParser, which reads the file, does not keep any information about the original layout, so there is no convenient way to preserve line breaks and whitespace. The second part is ordering:

<project xmlns="http://maven.apache.org/POM/4.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

xmlns="http://maven.apache.org/POM/4.0.0" is the default namespace of the document.

xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd" is an actual attribute of the project element.

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" declares the namespace (the xsi prefix) used by that attribute.

If all three were plain attributes, we could keep track of their order. As it is, they are handled as separate things, and I cannot see in which order they were written in the original XML. It is basically the same limitation as above.
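The distinction above can be observed with the JDK's built-in SAX parser. The demo below is a hypothetical illustration (not sortpom's code; the class name is made up): with namespace-aware parsing and the default SAX features, the two xmlns declarations arrive through startPrefixMapping, while only xsi:schemaLocation appears in the element's Attributes. SAX also has no callback for the whitespace between attributes, so that layout is gone before any writer sees it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceVsAttributeDemo {

    // Same header as in the issue, with the namespace declarations in the "wrong" order.
    static final String SAMPLE =
        "<project xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n"
        + "    xmlns=\"http://maven.apache.org/POM/4.0.0\"\n"
        + "    xsi:schemaLocation=\"http://maven.apache.org/POM/4.0.0"
        + " http://maven.apache.org/maven-v4_0_0.xsd\"></project>";

    /** Returns [namespace declarations, attribute names] reported for the project element. */
    static List<List<String>> inspect(String xml) {
        List<String> declarations = new ArrayList<>();
        List<String> attributeNames = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Namespace-aware: xmlns declarations go to startPrefixMapping and are
            // left out of Attributes ("namespace-prefixes" feature is false by default).
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startPrefixMapping(String prefix, String uri) {
                    declarations.add((prefix.isEmpty() ? "xmlns" : "xmlns:" + prefix) + "=" + uri);
                }

                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    if (localName.equals("project")) {
                        for (int i = 0; i < atts.getLength(); i++) {
                            attributeNames.add(atts.getQName(i));
                        }
                    }
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return List.of(declarations, attributeNames);
    }

    public static void main(String[] args) {
        List<List<String>> result = inspect(SAMPLE);
        System.out.println("Namespace declarations: " + result.get(0));
        System.out.println("Attributes: " + result.get(1));
    }
}
```

The attribute list contains only xsi:schemaLocation; the relative position of the declarations and the attribute in the source text is not part of what the parser reports.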

Ekryd commented 8 months ago

There is a way to add indentation before the project element's attribute: indentSchemaLocation. It always formats the project element like this:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

I thought it would be better, but other tools usually ruin that order anyway, e.g. whenever a version needs to be bumped due to deployments or dependency updates.
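The layout produced by indentSchemaLocation can be mimicked with plain string handling. This is a hypothetical helper written only to reproduce the output shown above, not the plugin's implementation: the namespace declarations stay on the first line and xsi:schemaLocation moves to a second line, aligned under the first declaration (nine spaces, the length of "<project "). The declarations are written in whatever order the caller passes them, since the original order is not available.

```java
import java.util.List;

public class IndentSchemaLocationSketch {

    /**
     * Hypothetical helper, not sortpom code: namespace declarations on the first line,
     * xsi:schemaLocation on its own line aligned under the first declaration.
     */
    static String projectStartTag(List<String> namespaceDeclarations, String schemaLocation) {
        String opening = "<project ";
        String alignment = " ".repeat(opening.length()); // 9 spaces, as in the example
        return opening + String.join(" ", namespaceDeclarations) + "\n"
            + alignment + schemaLocation + ">";
    }

    public static void main(String[] args) {
        System.out.println(projectStartTag(
            List.of("xmlns=\"http://maven.apache.org/POM/4.0.0\"",
                    "xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""),
            "xsi:schemaLocation=\"http://maven.apache.org/POM/4.0.0 "
                + "http://maven.apache.org/maven-v4_0_0.xsd\""));
    }
}
```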

zabetak commented 8 months ago

Isn't it possible to get the start tag of the project element from the raw input (before XML parsing) and hardcode it later on, when we are about to write the XML file?

This is somewhat similar to the https://github.com/Ekryd/sortpom/wiki/IgnoringSections feature, but specialized for the <project ...> tag, where "ignoringSections" cannot be applied.

Ekryd commented 8 months ago

Yes, it is possible. But...

Ignoring sections is a hack to solve some concrete cases in Maven setups, e.g. content that needed custom comments because some tool parses the pom file to generate something code-like. Ignored sections also have to cover complete XML elements, so that the rest of the file can still be parsed. Ignoring only the start of the project element would not be possible, because the rest of the file could not be parsed then. So that functionality cannot be used directly.
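Why a half element cannot be cut out can be shown with the JDK's SAX parser. This is a hypothetical check, not how sortpom implements ignored sections: the full text parses, but once the project start tag is removed, the leftover closing tag makes the document malformed.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class CutStartTagDemo {

    /** Returns true if the JDK's SAX parser accepts the text as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // DefaultHandler rethrows fatal errors, so malformed input ends up below.
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String startTag = "<project xmlns=\"http://maven.apache.org/POM/4.0.0\">";
        String rest = "<modelVersion>4.0.0</modelVersion></project>";
        System.out.println(isWellFormed(startTag + rest)); // true: complete element
        System.out.println(isWellFormed(rest));            // false: dangling </project>
    }
}
```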

Another way is to sort the whole file, go back to the original file and do text replacements. This can be done, but I'm not too keen on doing it. I would be solving an XML formatting problem, not a Maven formatting problem. It is out of scope for the plugin's main responsibility. The code in the plugin only deals with XML tags, not text formatting. Any new code that handles text directly would increase the overall complexity, making the plugin harder to maintain. It would also likely make it harder to reuse the plugin for other kinds of XML parsing, if there are such use cases out there.
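For reference, the "sort, then go back to the original and do text replacements" idea could look roughly like the post-processing step below. It is a minimal sketch under loud assumptions, not something the plugin provides: the class and method names are hypothetical, and the regex assumes the first "<project" in both texts is the real start tag (no such text in an earlier comment) and that no attribute value contains a '>' character. Splicing with substring, rather than Matcher.replaceFirst, keeps any '$' or '\' in the original tag from being read as replacement syntax.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RestoreProjectStartTag {

    // First <project ...> start tag; [^>]* also spans line breaks between attributes.
    private static final Pattern PROJECT_START_TAG = Pattern.compile("<project\\b[^>]*>");

    /** Puts the original project start tag back into the sorted text, verbatim. */
    static String restore(String original, String sorted) {
        Matcher originalTag = PROJECT_START_TAG.matcher(original);
        Matcher sortedTag = PROJECT_START_TAG.matcher(sorted);
        if (!originalTag.find() || !sortedTag.find()) {
            return sorted; // nothing to restore
        }
        return sorted.substring(0, sortedTag.start())
            + originalTag.group()
            + sorted.substring(sortedTag.end());
    }

    public static void main(String[] args) {
        String original = "<project xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n"
            + "    xmlns=\"http://maven.apache.org/POM/4.0.0\">\n"
            + "  <version>1.0</version>\n  <artifactId>demo</artifactId>\n</project>\n";
        String sorted = "<project xmlns=\"http://maven.apache.org/POM/4.0.0\""
            + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">\n"
            + "  <artifactId>demo</artifactId>\n  <version>1.0</version>\n</project>\n";
        // Sorted children, original header (order and line breaks preserved).
        System.out.print(restore(original, sorted));
    }
}
```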

Ekryd commented 8 months ago

I have discovered that there is a fairly easy way to indent the namespaces as well. Then you could get a layout like this:

<project
         xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

This indents both the namespaces and the attribute of the project element, but keeping the original order is still not possible. Would this be an acceptable solution?
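The proposed layout, with "<project" alone on the first line and every declaration and attribute on its own indented line, can be sketched the same way. Again, this is a hypothetical helper that only reproduces the example output, not the plugin's code; the items come out in the order they are passed in, which is exactly the limitation mentioned above.

```java
import java.util.List;

public class IndentAllProjectAttributesSketch {

    /**
     * Hypothetical helper, not sortpom code: "<project" on the first line, then each
     * namespace declaration and attribute on its own line, indented by nine spaces.
     */
    static String projectStartTag(List<String> declarationsAndAttributes) {
        String indent = " ".repeat("<project ".length());
        StringBuilder tag = new StringBuilder("<project");
        for (String item : declarationsAndAttributes) {
            tag.append('\n').append(indent).append(item);
        }
        return tag.append('>').toString();
    }

    public static void main(String[] args) {
        System.out.println(projectStartTag(List.of(
            "xmlns=\"http://maven.apache.org/POM/4.0.0\"",
            "xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"",
            "xsi:schemaLocation=\"http://maven.apache.org/POM/4.0.0 "
                + "http://maven.apache.org/maven-v4_0_0.xsd\"")));
    }
}
```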

zabetak commented 8 months ago

First of all, thanks a lot for all the research and effort spent on this ticket. Much appreciated!

For my particular use case, I don't care much about indentation inside the project element (it could be useful for the previous requests). The request is to leave things as they are inside the project element.

In fact, for my use case it seems that I could live with the following workaround:

  1. Use the plugin to sort the pom according to the defined rules
  2. Restore the formatting changes in project element by editing the pom.xml file (manually or automatically via other plugins/scripts).

By configuring the verify goal properly, subsequent calls to verify leave the file intact, so I think that's good enough for me at the moment.

I opened the ticket thinking that a change to keep the formatting would be easy and not too intrusive after the dom4j upgrade, but it seems that is not the case, so I am fine with closing this ticket.

Ekryd commented 8 months ago

No worries, I appreciate that you took the time to formulate the issue.

A quick search for content replacement turned up https://github.com/floverfelt/find-and-replace-maven-plugin; maybe that could provide an automatic solution? Although I would suggest opening an issue there to support reading the replacement text from a file 😄 (the echo-maven-plugin functionality could serve as inspiration for that).

Best of luck!