Imports sorting adds newlines in groups

beeb commented 1 month ago

Component

Forge

Have you ensured that all of these are up to date?

[X] Foundry
[X] Foundryup

What version of Foundry are you on?

master

What command(s) is the bug in?

forge fmt

Operating System

Linux

Describe the bug

Imports sorting sometimes adds empty lines in the middle of a group. Repro:

import { ERC20 } from "@openzeppelin/contracts/token/ERC20/ERC20.sol";
import { ERC20Permit } from "@openzeppelin/contracts/token/ERC20/extensions/ERC20Permit.sol";
import { ERC20Burnable } from "@openzeppelin/contracts/token/ERC20/extensions/ERC20Burnable.sol";
import { IERC20 } from "@openzeppelin/contracts/token/ERC20/IERC20.sol";
import { IERC20Permit } from "@openzeppelin/contracts/token/ERC20/extensions/IERC20Permit.sol";
import { AccessControl } from "@openzeppelin/contracts/access/AccessControl.sol"`;

Output:

import { AccessControl } from "@openzeppelin/contracts/access/AccessControl.sol";
import { ERC20 } from "@openzeppelin/contracts/token/ERC20/ERC20.sol";
import { IERC20 } from "@openzeppelin/contracts/token/ERC20/IERC20.sol";
import { ERC20Burnable } from "@openzeppelin/contracts/token/ERC20/extensions/ERC20Burnable.sol";
import { ERC20Permit } from "@openzeppelin/contracts/token/ERC20/extensions/ERC20Permit.sol";

import { IERC20Permit } from "@openzeppelin/contracts/token/ERC20/extensions/IERC20Permit.sol";

As you can see, there is an empty line before the last element.

I dug a bit into the formatter code and I noticed that the Loc::File offsets are not updated when re-ordering the SourceUnitParts in: https://github.com/foundry-rs/foundry/blob/467aff3056842e8d45bc58a353be17349f0e8651/crates/fmt/src/formatter.rs#L1696-L1705

Since the rest of the formatter makes heavy use of those locations, I assumed they are important.

I attempted a dirty fix by re-writing the Loc start and end offsets according to the new ordering and that seems to fix the bug described above, but my dirty fix does not take into account the spacing between the various SourceUnitParts (due to semicolon and line break, or a potential comment on the same line).

// Before sorting
let group_start_loc = import_directives[0].loc();
// After sorting
let mut offset = group_start_loc.start();
for source_unit_part in import_directives.iter_mut() {
    let loc = source_unit_part.loc();
    let len = loc.end() - loc.start();
    match source_unit_part {
        SourceUnitPart::ImportDirective(import) => match import {
            Import::Plain(_, loc) => {
                *loc = loc.with_start(offset).with_end(offset + len);
            }
            Import::GlobalSymbol(_, _, loc) => {
                *loc = loc.with_start(offset).with_end(offset + len);
            }
            Import::Rename(_, _, loc) => {
                *loc = loc.with_start(offset).with_end(offset + len);
            }
        },
        _ => {
            unreachable!("import group contains non-import statement")
        }
    }
    offset += len;
}

Maybe someone smarter can create a better fix? Also, I haven't even attempted to update the location of the renames in the Import::Rename variant, but that should also probably be done?

beeb commented 1 month ago

Pinging @meetmangukiya who authored #5442 and @mattsse who contributed too

beeb commented 1 month ago

Another consequence of this is that any comment at the end of an import statement line (not on a separate line) is moved to a random position after formatting. (e.g. in my case on the pragma line a few lines above).

Original:

// SPDX-License-Identifier: MIT
pragma solidity 0.8.20;

import { ERC20 } from "@openzeppelin/contracts/token/ERC20/ERC20.sol";
import { ERC20Permit } from "@openzeppelin/contracts/token/ERC20/extensions/ERC20Permit.sol";
import { ERC20Burnable } from "@openzeppelin/contracts/token/ERC20/extensions/ERC20Burnable.sol"; // test
import { IERC20 } from "@openzeppelin/contracts/token/ERC20/IERC20.sol";
import { IERC20Permit } from "@openzeppelin/contracts/token/ERC20/extensions/IERC20Permit.sol";
import { AccessControl } from "@openzeppelin/contracts/access/AccessControl.sol";

Formatted:

// SPDX-License-Identifier: MIT
pragma solidity 0.8.20; // test

import { AccessControl } from "@openzeppelin/contracts/access/AccessControl.sol";
import { ERC20 } from "@openzeppelin/contracts/token/ERC20/ERC20.sol";
import { IERC20 } from "@openzeppelin/contracts/token/ERC20/IERC20.sol";
import { ERC20Burnable } from "@openzeppelin/contracts/token/ERC20/extensions/ERC20Burnable.sol";
import { ERC20Permit } from "@openzeppelin/contracts/token/ERC20/extensions/ERC20Permit.sol";

import { IERC20Permit } from "@openzeppelin/contracts/token/ERC20/extensions/IERC20Permit.sol";

ArkFunds commented 1 month ago

Component

Ark Raw Token Have you ensured that all of these are up to date?

Your trading Network
Token encryption

What version of Ark Foundry are you on?

master What command(s) is the bug in?

Ark payouts Operating System

Ark Payouts Describe the bug

Imports sorting sometimes adds empty lines in the middle of a group. Repro:

import { ERC20 } from @./contracts/token/ERC20/ERC20.sol";import { ERC20Permit } from @./contracts/token/ERC20/extensions/ERC20Permit.sol";import { ERC20Burnable } from @./contracts/token/ERC20/extensions/ERC20Burnable.sol";import { IERC20 } from @./contracts/token/ERC20/IERC20.sol";import { IERC20Permit } from @./contracts/token/ERC20/extensions/IERC20Permit.sol";import { AccessControl } from @./contracts/access/AccessControl.sol"`;

Output:

import { AccessControl } from @./contracts/access/AccessControl.sol";import { ERC20 } from @./contracts/token/ERC20/ERC20.sol";import { IERC20 } from @./contracts/token/ERC20/IERC20.sol";import { ERC20Burnable } from @./contracts/token/ERC20/extensions/ERC20Burnable.sol";import { ERC20Permit } from @./contracts/token/ERC20/extensions/ERC20Permit.sol"; import { IERC20Permit } from @./contracts/token/ERC20/extensions/IERC20Permit.sol";

beeb commented 1 week ago

I took a bit of time to look further into this, and I think it's really not the best idea to do the sorting at the CST level. The formatter relies on offsets into the original source to function, and reordering the nodes, even if we adjust the offsets (which is a pain to do), would still leave the CST divergent from the source.

What about performing the sorting at the string level before it gets parsed by solang_parser? It would require some refactoring of the interface to pass it an owned string, or maybe a Cow, in the hope of avoiding the clone if possible.

Also, since the formatter preferences are not currently passed to the parse function, this would also require some refactoring.

I am thinking about making a very simple parser with winnow or chumsky that would identify import groups, create a very simple AST for them, sort the imports and then crudely format them back into a string (without worrying about the formatting rules) to be replaced into the original source before it gets passed to the solang parser.

To avoid reallocating maybe we could even try to get a sorted output that is smaller or equal to the length of the original imports group, padding it to its original length with whitespace if necessary, so that the rest of the string doesn't need to move. Alternatively a string rope (with crop) could be used but that feels like a bit much.

mattsse commented 1 week ago

What about performing the sorting at the string level before it gets parsed by solang_parser?

yeah perhaps we'd need two passes here, first sort, then parse again, maybe

beeb commented 1 week ago

Brilliant if you agree @mattsse ! I could work on this over the next few days, maybe it will take a few days to a couple of weeks. I was thinking to use chumsky because it has built-in Spans which would be useful to know the length of the imports group.

I refactored the inputs to the parser to be a impl Into<Cow<'_, str>> already and that was easy enough, so we can skip cloning the source if sorting is disabled. I noticed there is already some cloning happening anyway, for diagnostics and errors and such, but if we can avoid having one more copy that'd be great.

mattsse commented 1 week ago

please don't waste your time writing yet another solidity parser :)

cc @DaniPopes

I assume we could fix this particular issue with a "hacky" preprocessing step that does sorting first or smarter loc updates

beeb commented 1 week ago

please don't waste your time writing yet another solidity parser :)

cc @DaniPopes

I assume we could fix this particular issue with a "hacky" preprocessing step that does sorting first or smarter loc updates

Right, I was only saying this because for reordering imports it feels wasteful to parse the full language, and also it's much easier to work with an AST.

foundry-rs / foundry