astral-sh / ruff

An extremely fast Python linter and code formatter, written in Rust.
https://docs.astral.sh/ruff
MIT License

Ruff added extra string to code when formatting #11801

Open MRossa157 opened 4 months ago

MRossa157 commented 4 months ago

This bug occurs on Ruff 0.4.7 and also on Ruff 0.4.8.

When I format this code:

import numpy as np
import torch
from torch.utils.data import Dataset
from tqdm import tqdm
from multiprocessing import Pool, cpu_count

class MovieLensTrainDataset(Dataset):
    """MovieLens PyTorch Dataset for Training"""

    def __init__(self, ratings, all_movieIds, is_training=True):
        self.is_training = is_training
        self.ratings = ratings
        self.all_movieIds = all_movieIds
        self.users, self.items, self.labels = self.get_dataset()

    def __len__(self):
        return len(self.users)

    def __getitem__(self, idx):
        return self.users[idx], self.items[idx], self.labels[idx]

    def generate_samples(self, user_item):
        u, i = user_item
        num_negatives = 4
        local_users, local_items, local_labels = [u], [i], [1]
        while len(local_users) <= num_negatives:
            negative_item = np.random.choice(self.all_movieIds)
            if (u, negative_item) not in self.user_item_set:
                local_users.append(u)
                local_items.append(negative_item)
                local_labels.append(0)
        return local_users, local_items, local_labels

    def get_dataset(self):
        users, items, labels = [], [], []
        self.user_item_set = set(zip(self.ratings["userId"], self.ratings["movieId"]))

        with Pool(processes=cpu_count()) as pool:
            results = list(tqdm(pool.imap(self.generate_samples, self.user_item_set),
                                total=len(self.user_item_set),
                                desc=f"Generating samples for {'training' if self.is_training else 'validating'}"))

        for user_list, item_list, label_list in results:
            users.extend(user_list)
            items.extend(item_list)
            labels.extend(label_list)

        return torch.tensor(users, dtype=torch.long), torch.tensor(items, dtype=torch.long), torch.tensor(labels, dtype=torch.float)

Ruff makes this:

from multiprocessing import Pool, cpu_count

import numpy as np
import torch
from torch.utils.data import Dataset
from tqdm import tqdm

class MovieLensTrainDataset(Dataset):
    """MovieLens PyTorch Dataset for Training"""

    def __init__(self, ratings, all_movieIds, is_training=True):
        self.is_training = is_training
        self.ratings = ratings
        self.all_movieIds = all_movieIds
        self.users, self.items, self.labels = self.get_dataset()

    def __len__(self):
        return len(self.users)

    def __getitem__(self, idx):
        return self.users[idx], self.items[idx], self.labels[idx]

    def generate_samples(self, user_item):
        u, i = user_item
        num_negatives = 4
        local_users, local_items, local_labels = [u], [i], [1]
        while len(local_users) <= num_negatives:
            negative_item = np.random.choice(self.all_movieIds)
            if (u, negative_item) not in self.user_item_set:
                local_users.append(u)
                local_items.append(negative_item)
                local_labels.append(0)
        return local_users, local_items, local_labels

    def get_dataset(self):
        users, items, labels = [], [], []
        self.user_item_set = set(zip(self.ratings["userId"], self.ratings["movieId"]))

        with Pool(processes=cpu_count()) as pool:
            results = list(
                tqdm(
                    pool.imap(self.generate_samples, self.user_item_set),
                    total=len(self.user_item_set),
                    desc=f"Generating samples for {'training' if self.is_training else 'validating'}",
                )
            )

        for user_list, item_list, label_list in results:
            users.extend(user_list)
            items.extend(item_list)
            labels.extend(label_list)

        return (
            torch.tensor(users, dtype=torch.long),
            torch.tensor(items, dtype=torch.long),
            torch.tensor(labels, dtype=torch.float),
        )

        return (
            torch.tensor(users, dtype=torch.long),
            torch.tensor(items, dtype=torch.long),
            torch.tensor(labels, dtype=torch.float),
        )

It adds a new (extra) return:

return (
            torch.tensor(users, dtype=torch.long),
            torch.tensor(items, dtype=torch.long),
            torch.tensor(labels, dtype=torch.float),
        )

My VS Code settings (settings.json):

{
    "workbench.colorTheme": "Default Dark+",
    "workbench.preferredDarkColorTheme": "Default Dark+",
    "tabnine.experimentalAutoImports": true,
    "workbench.iconTheme": "vscode-great-icons",
    "RainbowBrackets.depreciation-notice": false,
    "explorer.confirmDelete": false,
    "python.defaultInterpreterPath": "C:\\Users\\maxim\\AppData\\Local\\Programs\\Python\\Python312\\python.exe",
    "explorer.confirmDragAndDrop": false,
    "jupyter.askForKernelRestart": false,
    "editor.unicodeHighlight.ambiguousCharacters": false,
    "notebook.formatOnSave.enabled": true,
    "notebook.codeActionsOnSave": {
        "notebook.source.organizeImports": "explicit",
    },
    "[python]": {
        "editor.defaultFormatter": "charliermarsh.ruff",
        "editor.formatOnSave": true,
        "editor.codeActionsOnSave": {
            "source.organizeImports": "explicit"
        },
    },
    "isort.args":["--profile", "black"],
    "editor.detectIndentation": false,

    "telemetry.telemetryLevel": "off",
    "files.trimTrailingWhitespace": true,
    "files.trimFinalNewlines": true,
    "files.autoSave": "afterDelay",
    "files.autoSaveDelay": 5000,
    "python.analysis.autoFormatStrings": true,
    "black-formatter.args": [
        "line-length=150"
    ],
    "security.workspace.trust.untrustedFiles": "open",
    "explorer.confirmPasteNative": false,
    "notebook.confirmDeleteRunningCell": false,
    "git.suggestSmartCommit": false,
    "git.confirmSync": false
}

MichaReiser commented 4 months ago

Uff, that's not good. Do you roughly remember what you were adding when this happened? I suspect that this is an issue with range formatting because this isn't happening when I paste your example into the playground.

MRossa157 commented 4 months ago

> Uff, that's not good. Do you roughly remember what you were adding when this happened? I suspect that this is an issue with range formatting because this isn't happening when I paste your example into the playground.

I didn't add anything to Ruff; rather, I replaced the Black formatter with Ruff's. In addition to the usual Ruff setup, I also added Jupyter Notebook support (it's in the settings). And that's pretty much it. I don't remember any other additions.

Here is the list of extensions that I have installed in VS Code:

- autoDocstring
- Better Comments
- Black Formatter
- Dev Containers
- Docker
- IntelliCode
- isort
- Jupyter (and other Jupyter extensions)
- Makefile Tools
- Python (including Pylance, Python Debugger, etc.)
- Python Indent
- Python Path
- Rainbow Brackets
- Ruff[!!!]
- VS Code great icons

MichaReiser commented 4 months ago

Thanks for sharing the additional data. Do you remember the changes you made to that file? I assume you made some edits, hit save, and VS Code formatted the code.

@charliermarsh I think there have been instances where Ruff and the isort extension don't get along. Any chance that might be related?

charliermarsh commented 4 months ago

It could. If you have both extensions installed, code is often repeated at the bottom of the file. It's a bug in VS Code itself: https://github.com/microsoft/vscode/issues/174295
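The symptom described here (a block repeated at the bottom of the file) can be checked for mechanically. Below is a minimal sketch, not part of Ruff or VS Code: the helper name `find_duplicated_tail` and its `min_lines` parameter are hypothetical, and the check only compares non-blank, whitespace-stripped lines, so code that legitimately ends with a repeated block would be flagged as well.

```python
def find_duplicated_tail(source: str, min_lines: int = 2) -> int:
    """Return the length (in non-blank lines) of the longest block at the
    end of `source` that is immediately preceded by an identical copy.

    Returns 0 when no such block of at least `min_lines` lines exists.
    """
    # Ignore blank lines and indentation so spacing between the two
    # copies does not hide the duplication.
    lines = [line.strip() for line in source.splitlines() if line.strip()]
    smallest = max(min_lines, 1)
    # Try the largest candidate block first, then shrink.
    for size in range(len(lines) // 2, smallest - 1, -1):
        if lines[-size:] == lines[-2 * size:-size]:
            return size
    return 0


# Hypothetical sample imitating the report: the trailing return is doubled.
sample = """
def f(x):
    y = x + 1
    return (
        y,
    )

    return (
        y,
    )
"""

print(find_duplicated_tail(sample))  # 3
```

Running such a check on a file right after saving is one way to confirm whether the duplication comes from the on-save pipeline rather than from a manual edit.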

MRossa157 commented 4 months ago

> Thanks for sharing the additional data. Do you remember the changes you made to that file? I assume you made some edits, hit save and VS code formatted the code.
>
> @charliermarsh I think there have been instances where Ruff and the isort extension don't get along. Any chance that might be related?

Yes, it is. I copied that code, pasted it, and pressed Ctrl+S (save file); since I have format on save enabled, the file was formatted automatically. I thought I had described that explicitly, sorry.

charliermarsh commented 4 months ago

Yeah, I think you either need to uninstall isort or disable Ruff's import formatting. This issue arises when you have multiple extensions installed that want to handle import formatting. In that case, VS Code ends up running them over one another, leading to duplicated content like this.
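For the second option (keeping the isort extension and disabling Ruff's import sorting), a settings.json fragment might look like the sketch below. It assumes the Ruff VS Code extension's `ruff.organizeImports` setting, which controls whether Ruff registers itself for `source.organizeImports` actions; setting names can change between extension versions, so check the extension's documentation before relying on it. The `[python]` block is copied from the settings posted above.

```jsonc
{
    "[python]": {
        "editor.defaultFormatter": "charliermarsh.ruff",
        "editor.formatOnSave": true,
        "editor.codeActionsOnSave": {
            "source.organizeImports": "explicit"
        }
    },
    // Assumption: leaves isort as the only extension that answers
    // source.organizeImports, so only one tool rewrites the imports on save.
    "ruff.organizeImports": false
}
```

The first option needs no settings change: with the isort extension uninstalled, Ruff is the only extension left to handle `source.organizeImports`.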