OpenRefine / OpenRefine

OpenRefine is a free, open source power tool for working with messy data and improving it
https://openrefine.org/
BSD 3-Clause "New" or "Revised" License

Better generalizability for the reorder-columns operation #5576

Open wetneb opened 1 year ago

wetneb commented 1 year ago

We offer an operation, reorder-columns, which can rearrange columns in an arbitrary order and remove any number of them, all in a single step.

[image: screenshot of the reorder-columns dialog]

The operation which corresponds to this screenshot is represented internally as:

  {
    "op": "core/column-reorder",
    "columnNames": [
      "Identifiant du lieu",
      "Année du tournage",
      "Type de tournage",
      "Titre",
      "director",
      "producer",
      "Réalisateur",
      "Producteur",
      "Code postal",
      "Coordonnée en X",
      "Coordonnée en Y",
      "geo_shape",
      "geo_point_2d"
    ],
    "description": "Reorder columns"
  }

In other words, the operation parameters simply record the final order of the columns that remain after the reorder. As explained in #4055, this operation does not generalize well to other datasets when it is used in the "Apply" dialog of the Undo/Redo tab: any column that was not in the original dataset but is present in the new one will be deleted by this operation.
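To make the problem concrete, here is a minimal sketch in Python (not OpenRefine's actual Java implementation) of what "keep exactly the listed columns, in the listed order" means when the same recorded operation is replayed on a wider dataset. The function `apply_reorder` and the column `Arrondissement` are hypothetical, introduced only for illustration. Skipping listed names that are absent from the dataset is also an assumption of this sketch.

```python
def apply_reorder(columns, column_names):
    """Assumed semantics of the recorded operation: keep only the listed
    columns, in the listed order. Listed names missing from the dataset
    are skipped (an assumption of this sketch)."""
    present = set(columns)
    return [name for name in column_names if name in present]


# A shortened version of the recorded "columnNames" parameter.
recorded = ["Identifiant du lieu", "Année du tournage", "Titre"]

# The dataset the operation was recorded on.
original = ["Titre", "Année du tournage", "Identifiant du lieu"]

# A new dataset with one extra column (hypothetical name).
new_dataset = original + ["Arrondissement"]

reordered_original = apply_reorder(original, recorded)
reordered_new = apply_reorder(new_dataset, recorded)

# Columns of the new dataset that the replayed operation silently removes.
dropped = [name for name in new_dataset if name not in reordered_new]
print(dropped)
```

On the original dataset the result is the intended order. On the new dataset, the extra column is dropped even though the user never asked for it to be removed.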

Proposed solution

We should find other ways to specify this operation so that the actual intent of the user is captured better. It is difficult to come up with a precise specification, but as a litmus test, I would expect the following criteria to be satisfied:

Alternatives considered

One could break the operation down into multiple steps, repeatedly applying the operations that remove or move a single column until the desired state is reached. Working out such a decomposition is likely useful for understanding this issue better, but I would prefer that this dialog not generate lengthy lists of steps in the project history.
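As a rough illustration of that decomposition, the sketch below (Python, not OpenRefine code) derives a sequence of single-column removals and moves from the original and final column orders. The helper names `decompose_reorder` and `replay` are hypothetical. The operation ids `core/column-removal` and `core/column-move`, along with their `columnName` and `index` parameters, are assumed to match the existing single-column operations and should be checked against the codebase. The sketch also assumes the final order is a duplicate-free subset of the original columns.

```python
def decompose_reorder(original, final):
    """Express a reorder as single-column removals followed by moves.
    Assumes `final` is a duplicate-free subset of `original`."""
    ops = []
    current = list(original)
    # Step 1: remove every column that is absent from the final order.
    for name in original:
        if name not in final:
            ops.append({"op": "core/column-removal", "columnName": name})
            current.remove(name)
    # Step 2: move columns into place from left to right.
    for index, name in enumerate(final):
        if current[index] != name:
            current.remove(name)
            current.insert(index, name)
            ops.append({"op": "core/column-move",
                        "columnName": name, "index": index})
    return ops


def replay(columns, ops):
    """Replay the generated steps on a list of column names."""
    current = list(columns)
    for op in ops:
        if op["columnName"] not in current:
            continue  # skip steps that target a column absent from this dataset
        current.remove(op["columnName"])
        if op["op"] == "core/column-move":
            current.insert(op["index"], op["columnName"])
    return current


steps = decompose_reorder(["a", "b", "c", "d"], ["c", "a"])
result = replay(["a", "b", "c", "d", "extra"], steps)
print(len(steps), result)
```

Replayed on a dataset with an extra column, these steps leave that column in place, which is the generalization benefit. The cost is visible too: each removal and most moves become a separate history entry, which is the lengthy-list concern raised above.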

Additional context

Follow-up to #4055 and #5563.

thadguidry commented 1 year ago

I tend to agree with the proposal. So are you for or against splitting the dialog into two menu items? I am against splitting it, because the dialog flows well: you reorder a few columns and delete (or ignore) the others. Behind the scenes, the current operation itself could be split into multiple operations. I agree on avoiding lengthy lists of steps, but they are often lengthy anyway when dealing with wide datasets.

wetneb commented 1 year ago

Yes, I would keep the dialog as is.