bids-standard / legacy-validator

Validator for the Brain Imaging Data Structure
https://bids-standard.github.io/legacy-validator/
MIT License
186 stars 111 forks source link

Rethinking issues #2043

Closed effigies closed 3 months ago

effigies commented 4 months ago

While triaging a failure to report a missing recommended key, I discovered that only one key is shown per file, presumably to avoid excess noise. I'd like to rethink the Issue structure. I am starting to think of it as a table:

Issue Sub-issue Severity Location Message Affects
EMPTY_FILE None Error <path> ... <path>
SIDECAR_KEY_RECOMMENDED SidecarKey Warning <path>.json ... <datafile1> <datafile2> ...

Here (Issue, Sub-issue, Location) are lookup keys to a specific problem in a file. The severity and location are the main things needed to determine whether and where to fix the problem, with message and affects being useful for understanding.

To mock up a structure in Python:

class Issue:
    code: str
    sub: str | None
    severity: Literal["Error", "Warning"]
    location: Path
    message: str
    affects: list[Path]

class Issues:
    table: dict[tuple[str, str | None, Path], Issue]

    def add(self, code, sub, severity, location, msg, affected_file):
        key = (code, sub, location)
        issue = self.table.get(key) or Issue(code, sub, severity, location, msg, [])
        issue.affects.append(affected_file)

Thoughts? Anything you think we should add, or something we currently have that this wouldn't handle?

effigies commented 4 months ago

Updated issue structure:

class Issue:
    code: str
    sub: str | None  # Mostly applies to missing/invalid JSON keys or TSV columns
    location: Path | None  # Likely be absent for missing sidecar keys
    severity: Literal["Error", "Warning"]
    message: str | None
    suggestion: str | None
    affects: list[Path]
    rule: str | None  # Schema path, if applicable (maybe)