LibraryCarpentry / week-four-library-carpentry--DEPRECATED

Week Four lesson
http://librarycarpentry.github.io/city-november-2015/
Reproducibility #13

Open davanstrien opened 8 years ago

davanstrien commented 8 years ago

I might have missed this during the session. Is there a way of exporting a history of all the transformations you have made to a document so other people can see/reproduce what you have done?

ostephens commented 8 years ago

@davanstrien I mentioned it in passing, but didn't show it on screen. The basics are covered in https://github.com/LibraryCarpentry/week-four-library-carpentry/blob/master/lesson-materials/Basic-OpenRefine-functions-II.md but to give a bit more detail here...

All the edits you make to data in OpenRefine are recorded and stored in a format called 'JSON' (this stands for JavaScript Object Notation, which doesn't really matter, but it is a standard format for sharing structured data and is used quite often in programming). You can access this list of edits by finding the 'Undo/Redo' pane (on the lefthand side of the Refine interface) and clicking 'Extract'. This will show you a screen like:

[Screenshot of the 'Extract' screen, taken 2015-11-30 at 22:34:53]

If you want to extract the transformation history, you can select any/all of the steps on the lefthand side, and then copy the 'JSON' from the righthand side. If I want to preserve the history, I paste this into a text editor and save it as a plain text file.

If you look at the JSON you can see that it has a structure:

[
  {
    "op": "core/text-transform",
    "description": "Text transform on cells in column Title using expression grel:value.toTitlecase()",
    "engineConfig": {
      "facets": [
        {
          "invert": false,
          "expression": "value",
          "selectError": false,
          "omitError": false,
          "selectBlank": false,
          "name": "Publisher",
          "omitBlank": false,
          "columnName": "Publisher",
          "type": "list",
          "selection": [
            {
              "v": {
                "v": "Society of Pharmaceutical Technocrats",
                "l": "Society of Pharmaceutical Technocrats"
              }
            },
            {
              "v": {
                "v": "Akshantala Enterprises",
                "l": "Akshantala Enterprises"
              }
            }
          ]
        }
      ],
      "mode": "record-based"
    },
    "columnName": "Title",
    "expression": "grel:value.toTitlecase()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10
  }
]

This is the JSON for a single transformation. If there were multiple transformations in the history, there would be multiple { ... } sections with the same structure inside the outer [ ], one per transformation. The 'description' bit tells you in a reasonably readable format what the transformation is doing. Much of the rest (the 'engineConfig' part) records what facets were applied/values selected etc. when the transformation was applied, and towards the bottom you can see what the actual GREL was ('expression') and which column it was applied to ('columnName').
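If you have saved a history like this and want a quick overview of it without reading the raw JSON, a short script can pull out the readable parts. The following is only a minimal sketch in Python: the function name `summarise_history` is made up for illustration, the embedded history is a trimmed copy of the example above (facet details left out), and it assumes each step is an object that may or may not have 'columnName' and 'expression' fields.

```python
import json

# A trimmed copy of the single-step history shown above.
# The facet selections are omitted here to keep the example short.
HISTORY_JSON = """
[
  {
    "op": "core/text-transform",
    "description": "Text transform on cells in column Title using expression grel:value.toTitlecase()",
    "engineConfig": {"facets": [], "mode": "record-based"},
    "columnName": "Title",
    "expression": "grel:value.toTitlecase()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10
  }
]
"""


def summarise_history(history_text):
    """Return one (op, column, expression, description) tuple per step.

    Hypothetical helper: .get() is used because it is an assumption that
    other kinds of step may lack a column or an expression.
    """
    steps = json.loads(history_text)
    summary = []
    for step in steps:
        summary.append((
            step.get("op"),
            step.get("columnName"),
            step.get("expression"),
            step.get("description"),
        ))
    return summary


if __name__ == "__main__":
    # Print a numbered, human-readable list of the steps.
    for number, (op, column, expression, description) in enumerate(
        summarise_history(HISTORY_JSON), start=1
    ):
        print(f"{number}. [{op}] {description}")
```

To use this on your own history, you would replace `HISTORY_JSON` with the contents of the text file you saved from the 'Extract' screen.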

This can be kept for reproducibility and also applied in other projects by clicking the 'Apply' button in the Undo/Redo panel and pasting in the JSON.
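Because the history gets copied and pasted by hand, it is easy to lose a bracket along the way. Before pasting a saved history into 'Apply', you could check that it is still well-formed. This is a rough sketch only: `check_history` is a hypothetical helper, and the checks it makes (a JSON list of objects, each with an "op" field) are assumptions based on the example above rather than any official OpenRefine schema.

```python
import json


def check_history(history_text):
    """Return the number of steps, or raise ValueError if the text looks wrong.

    Assumed shape (taken from the example above, not an official schema):
    a JSON list in which every item is an object with an "op" field.
    """
    try:
        steps = json.loads(history_text)
    except json.JSONDecodeError as error:
        raise ValueError(f"not valid JSON: {error}") from error

    if not isinstance(steps, list):
        raise ValueError("expected a JSON list of steps")

    for position, step in enumerate(steps, start=1):
        if not isinstance(step, dict) or "op" not in step:
            raise ValueError(f"step {position} has no 'op' field")

    return len(steps)


if __name__ == "__main__":
    # A one-step history, cut down to the single field that is checked.
    print(check_history('[{"op": "core/text-transform"}]'))
```

In practice you would read the text from the file you saved earlier and pass it to `check_history` before pasting it into OpenRefine.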