IQSS / dataverse

Open source research data repository software
http://dataverse.org

Feature Request: Add API endpoint for comparing Dataset Versions #10888

Open ekraffmiller opened 1 month ago

ekraffmiller commented 1 month ago

Overview of the Feature Request
Need an API endpoint that will compare two dataset versions and return a list of the differences between them. This is needed to support the SPA Dataset Page.

What kind of user is the feature intended for? (Example user roles: API User, Curator, Depositor, Guest, Superuser, Sysadmin)
API User

What inspired the request?
https://github.com/IQSS/dataverse-client-javascript/issues/197
https://github.com/IQSS/dataverse-frontend/issues/511

What existing behavior do you want changed?
None

Any brand new behavior you want to add to Dataverse?
A new Dataverse API endpoint.

Any open or closed issues related to this feature request?

Are you thinking about creating a pull request for this feature?
Help is always welcome. Is this feature something you or your organization plan to implement?

stevenwinship commented 1 month ago

There are multiple options for how the response could be formatted:

Option 1. A JSON list with two objects, each containing only the modified fields. Example: [{"id": versionAid, "subject": "version A subject", "subtitle": ""}, {"id": versionBid, "subject": "New subject", "subtitle": "new subtitle"}]

Option 2. A JSON object with the before and after values of each field. Example: {"subject": {"versionAid": "version A subject", "versionBid": "New subject"}, "subtitle": {"versionAid": "", "versionBid": "new subtitle"}}
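
To make the difference between the two shapes concrete, here is a minimal TypeScript sketch that pivots an Option 1 response into the Option 2 shape. The type and function names (Option1, Option2, option1ToOption2) are hypothetical and only illustrate the data shapes; they are not part of Dataverse or any client library.

```typescript
// Option 1: one object per version. "id" names the version; every other key
// is a field whose value differs between the two versions.
type Option1 = Array<Record<string, string>>;

// Option 2: one entry per field, mapping each version id to that field's value.
type Option2 = Record<string, Record<string, string>>;

// Pivot Option 1 into Option 2 so each field carries both versions' values.
function option1ToOption2(versions: Option1): Option2 {
  const result: Option2 = {};
  for (const version of versions) {
    const { id, ...fields } = version;
    for (const [fieldName, value] of Object.entries(fields)) {
      // Reuse the per-field bucket if an earlier version already created it.
      const perVersion: Record<string, string> = result[fieldName] ?? {};
      perVersion[id] = value;
      result[fieldName] = perVersion;
    }
  }
  return result;
}

// Placeholder version ids taken from the examples above.
const option1: Option1 = [
  { id: "versionAid", subject: "version A subject", subtitle: "" },
  { id: "versionBid", subject: "New subject", subtitle: "new subtitle" },
];

console.log(JSON.stringify(option1ToOption2(option1)));
```

Since the pivot is mechanical, either option carries the same information; Option 2 simply puts the before and after values side by side, which is what a comparison table reads from.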

I'm sure there could be more options. @ekraffmiller could you let me know what format would make the most sense for the SPA code?

qqmyers commented 1 month ago

FWIW: I think the outputs from the DatasetVersionDifference class are closer to option 2. I also think that format is closer to how we display the differences in the version table on the dataset page.

stevenwinship commented 1 month ago

@ekraffmiller Here is the JSON-formatted output that I believe will work well in a table in the UI. Please let me know if this works or if changes are needed.

{
    "status": "OK",
    "data": {
        "Metadata": {
            "Author": {
                "0": "Finch, Fiona; (Birds Inc.)",
                "1": "Finch, Fiona; (Birds Inc.); Poe, Edgar Allen; (Baltimore Poets); Mulligan, Hercules; (Sons of Liberty)"
            },
            "Subject": {
                "0": "Medicine, Health and Life Sciences",
                "1": "Medicine, Health and Life Sciences; Astronomy and Astrophysics; Other"
            },
            "Producer": {
                "0": "",
                "1": "Allen, Irwin; (MGM); Spielberg, Stephen; (ILM)"
            },
            "Design Type": {
                "0": "",
                "1": "Parallel Group Design; Nested Case Control Design"
            }
        },
        "Files": {
            "added": [
                {
                    "description": "",
                    "label": "dataverseproject.png",
                    "restricted": false,
                    "version": 1,
                    "datasetVersionId": 4,
                    "dataFile": {
                        "id": 11,
                        "persistentId": "",
                        "filename": "dataverseproject.png",
                        "contentType": "image/png",
                        "friendlyType": "PNG Image",
                        "filesize": 12918,
                        "description": "",
                        "storageIdentifier": "local://19296b38e55-71601b050f3d",
                        "rootDataFileId": -1,
                        "md5": "e55e66ff785045154875c4b6841eb527",
                        "checksum": {
                            "type": "MD5",
                            "value": "e55e66ff785045154875c4b6841eb527"
                        },
                        "tabularData": false,
                        "creationDate": "2024-10-16",
                        "fileAccessRequest": true
                    }
                }
            ],
            "removed": [
                {
                    "description": "",
                    "label": "dataverseproject_logo.jpg",
                    "restricted": false,
                    "version": 1,
                    "datasetVersionId": 3,
                    "dataFile": {
                        "id": 10,
                        "persistentId": "",
                        "filename": "dataverseproject_logo.jpg",
                        "contentType": "image/jpeg",
                        "friendlyType": "JPEG Image",
                        "filesize": 4462,
                        "description": "",
                        "storageIdentifier": "local://19296b371ed-ea4ec196219e",
                        "rootDataFileId": -1,
                        "md5": "c1edbefa86a55c5037873370ae7fd7b6",
                        "checksum": {
                            "type": "MD5",
                            "value": "c1edbefa86a55c5037873370ae7fd7b6"
                        },
                        "tabularData": false,
                        "creationDate": "2024-10-16",
                        "publicationDate": "2024-10-16",
                        "fileAccessRequest": true
                    }
                }
            ],
            "modified": [
                {
                    "fileMetadata": {
                        "description": "",
                        "label": "dataverse-icon-1200.png",
                        "restricted": false,
                        "version": 1,
                        "datasetVersionId": 3,
                        "dataFile": {
                            "id": 9,
                            "persistentId": "",
                            "filename": "dataverse-icon-1200.png",
                            "contentType": "image/png",
                            "friendlyType": "PNG Image",
                            "filesize": 27650,
                            "description": "",
                            "storageIdentifier": "local://19296b370c7-b90cd887fd36",
                            "rootDataFileId": -1,
                            "md5": "a23eb44803d9127bc6e055f77b869816",
                            "checksum": {
                                "type": "MD5",
                                "value": "a23eb44803d9127bc6e055f77b869816"
                            },
                            "tabularData": false,
                            "creationDate": "2024-10-16",
                            "publicationDate": "2024-10-16",
                            "fileAccessRequest": true
                        }
                    },
                    "isRestricted": {
                        "0": "false",
                        "1": "true"
                    }
                }
            ]
        },
        "TermsOfAccess": {
            "Data Access Place": {
                "0": "",
                "1": "Somewhere"
            }
        }
    }
}
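
As a rough illustration of how this draft maps onto a UI table, here is a minimal TypeScript sketch that flattens the "Metadata" and "TermsOfAccess" sections into rows. It assumes that key "0" holds the value from the older version and "1" the value from the newer one, which is inferred from the example rather than stated in it. The names FirstDraftDiff, TableRow, and toTableRows are hypothetical, and the "Files" section is left out for brevity.

```typescript
// Assumption inferred from the draft: "0" = older version, "1" = newer version.
type ValuePair = { "0": string; "1": string };

// Only the sections flattened below are typed; "Files" is omitted.
interface FirstDraftDiff {
  Metadata?: Record<string, ValuePair>;
  TermsOfAccess?: Record<string, ValuePair>;
}

interface TableRow {
  section: string;
  field: string;
  oldValue: string;
  newValue: string;
}

// Flatten the keyed "0"/"1" pairs into rows a comparison table can render.
function toTableRows(diff: FirstDraftDiff): TableRow[] {
  const rows: TableRow[] = [];
  const sections: Array<[string, Record<string, ValuePair> | undefined]> = [
    ["Metadata", diff.Metadata],
    ["TermsOfAccess", diff.TermsOfAccess],
  ];
  for (const [section, pairs] of sections) {
    if (!pairs) continue; // section absent from this response
    for (const [field, pair] of Object.entries(pairs)) {
      rows.push({ section, field, oldValue: pair["0"], newValue: pair["1"] });
    }
  }
  return rows;
}

// A trimmed subset of the response above.
const sample: FirstDraftDiff = {
  Metadata: {
    Producer: { "0": "", "1": "Allen, Irwin; (MGM); Spielberg, Stephen; (ILM)" },
  },
  TermsOfAccess: {
    "Data Access Place": { "0": "", "1": "Somewhere" },
  },
};

for (const row of toTableRows(sample)) {
  console.log(`${row.section} | ${row.field}: "${row.oldValue}" -> "${row.newValue}"`);
}
```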

ekraffmiller commented 1 month ago

Thanks @stevenwinship, I will review the SPA requirements today.

ekraffmiller commented 3 weeks ago

Hi @stevenwinship, sorry for the late reply. For the Compare Version Details popup, we will need the changes grouped by metadata block. It would also be more flexible in the UI to have the changed values in an array (for "multiple" type fields).

Here is an example:

{
  "oldVersion": {
    "versionNumber": "1.0",
    "createdDate": "2023-01-15T08:00:00Z"
  },
  "newVersion": {
    "versionNumber": "1.1",
    "createdDate": "2024-01-20T08:00:00Z"
  },
  "metadataChanges": [
    {
      "blockName": "citation",
      "changed": [
        {
          "fieldName": "title",
          "oldValue": ["Initial Dataset Title"],
          "newValue": ["Updated Dataset Title"]
        },
        {
          "fieldName": "author",
          "oldValue": ["John Doe"],
          "newValue": ["John Doe", "Jane Smith"]
        }
      ]
    },
    {
      "blockName": "socialscience",
      "changed": [
        {
          "fieldName": "studyDesignType",
          "oldValue": ["design type 1", "design type 2"],
          "newValue": ["design type 1a", "design type 1b", "design type 1c"]
        }
      ]
    }
  ],
  "fileChanges": [
    {
      "fileName": "data.csv",
      "changes": [
        {
          "fieldName": "filePath",
          "oldValue": "/oldpath/data_v1.csv",
          "newValue": "/newpath/data_v2.csv"
        }
      ]
    },
    {
      "fileName": "readme.txt",
      "changes": [
        {
          "fieldName": "description",
          "oldValue": "Basic dataset info",
          "newValue": "Updated dataset info with more details"
        }
      ]
    }
  ]
}
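
One reason arrays are more flexible than joined strings for "multiple" type fields is that the UI can compute per-value differences directly. Here is a minimal TypeScript sketch using hypothetical names (FieldChange, BlockChanges, valueDelta) that mirror the example above. It compares values as sets, so it ignores ordering and duplicates; a UI that cares about order would need a different comparison.

```typescript
// Mirrors one entry of "changed" in the proposed format.
interface FieldChange {
  fieldName: string;
  oldValue: string[];
  newValue: string[];
}

// Mirrors one entry of "metadataChanges".
interface BlockChanges {
  blockName: string;
  changed: FieldChange[];
}

// Per-value delta for a field: values only in the new version ("added")
// and values only in the old version ("removed").
function valueDelta(change: FieldChange): { added: string[]; removed: string[] } {
  const oldSet = new Set(change.oldValue);
  const newSet = new Set(change.newValue);
  return {
    added: change.newValue.filter((v) => !oldSet.has(v)),
    removed: change.oldValue.filter((v) => !newSet.has(v)),
  };
}

const citation: BlockChanges = {
  blockName: "citation",
  changed: [
    { fieldName: "title", oldValue: ["Initial Dataset Title"], newValue: ["Updated Dataset Title"] },
    { fieldName: "author", oldValue: ["John Doe"], newValue: ["John Doe", "Jane Smith"] },
  ],
};

for (const change of citation.changed) {
  const { added, removed } = valueDelta(change);
  console.log(`${citation.blockName}.${change.fieldName}: +${JSON.stringify(added)} -${JSON.stringify(removed)}`);
}
```

For the author field this reports only "Jane Smith" as added, which is the kind of highlight that is hard to produce from a single joined string.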

ekraffmiller commented 3 weeks ago

I'm sorry, I realized some file information was missing from the JSON example I sent you, so here is an updated example. I have added fields to the file elements and included a "filesReplaced" array. Other changes:

jsonexample.json

{
  "oldVersion": {
    "versionNumber": "1.0",
    "lastUpdatedDate": "2023-01-15T08:00:00Z"
  },
  "newVersion": {
    "versionNumber": "1.1",
    "lastUpdatedDate": "2024-01-20T08:00:00Z"
  },
  "metadataChanges": [
    {
      "blockName": "citation",
      "changed": [
        {
          "fieldName": "title",
          "oldValue": ["Initial Dataset Title"],
          "newValue": ["Updated Dataset Title"]
        },
        {
          "fieldName": "author",
          "oldValue": ["John Doe"],
          "newValue": ["John Doe", "Jane Smith"]
        }
      ]
    },
    {
      "blockName": "socialscience",
      "changed": [
        {
          "fieldName": "studyDesignType",
          "oldValue": ["design type 1", "design type 2"],
          "newValue": ["design type 1a", "design type 1b", "design type 1c"]
        }
      ]
    }
  ],
  "filesAdded": [
    {
      "fileName": "teacher_survey.tab",
      "md5": "1234567890",
      "type": "Tab-Delimited",
      "fileId": 3,
      "tags": ["Documentation"],
      "description": "my file description",
      "isRestricted": false
    },
    {
      "fileName": "biomedical.json",
      "md5": "1234567890",
      "type": "JSON",
      "fileId": 4,
      "tags": ["Documentation", "Data"],
      "description": "my json file description",
      "isRestricted": true
    }
  ],
  "filesReplaced": [
    {
      "oldFile": {
        "fileName": "teacher_survey.tab",
        "md5": "1234567890",
        "type": "Tab-Delimited",
        "fileId": 3,
        "tags": ["Documentation", "Data"],
        "description": "my json file description",
        "isRestricted": false
      },
      "newFile": {
        "fileName": "biomedical.json",
        "md5": "1234567890",
        "type": "JSON",
        "fileId": 4,
        "tags": ["Documentation", "Data"],
        "description": "my json file description",
        "isRestricted": true
      }
    },
    {
      "oldFile": {
        "fileName": "test1.json",
        "md5": "1234567890",
        "type": "JSON",
        "fileId": 3,
        "isRestricted": false
      },
      "newFile": {
        "fileName": "test2.json",
        "md5": "1234567890",
        "type": "JSON",
        "fileId": 4,
        "isRestricted": true
      }
    }
  ],
  "filesChanged": [
    {
      "fileName": "data.csv",
      "md5": "1234567890",
      "fileId": 1,
      "changes": [
        {
          "fieldName": "filePath",
          "oldValue": "/oldpath/data_v1.csv",
          "newValue": "/newpath/data_v2.csv"
        }
      ]
    },
    {
      "fileName": "readme.txt",
      "md5": "1234567890",
      "fileId": 2,
      "changes": [
        {
          "fieldName": "description",
          "oldValue": "Basic dataset info",
          "newValue": "Updated dataset info with more details"
        }
      ]
    }
  ],
  "TermsOfAccess": {
    "changed": [
      {
        "fieldName": "dataAccessPlace",
        "oldValue": "",
        "newValue": "Somewhere"
      }
    ]
  }
}
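
For illustration, here is a minimal TypeScript sketch of types for the file sections of this example, plus two hypothetical helpers a popup might use: one counting entries per category and one listing replaced files whose restriction status differs between the old and new file. All names are assumptions. The optional fields (type, tags, description) reflect that some entries in the example omit them.

```typescript
interface FileSummary {
  fileName: string;
  md5: string;
  fileId: number;
  isRestricted: boolean;
  type?: string;
  tags?: string[];
  description?: string;
}

interface FileFieldChange {
  fieldName: string;
  oldValue: string;
  newValue: string;
}

interface ChangedFile {
  fileName: string;
  md5: string;
  fileId: number;
  changes: FileFieldChange[];
}

interface VersionComparison {
  filesAdded?: FileSummary[];
  filesReplaced?: { oldFile: FileSummary; newFile: FileSummary }[];
  filesChanged?: ChangedFile[];
}

// Count entries per file category, treating a missing array as empty.
function countFileChanges(c: VersionComparison): { added: number; replaced: number; changed: number } {
  return {
    added: c.filesAdded?.length ?? 0,
    replaced: c.filesReplaced?.length ?? 0,
    changed: c.filesChanged?.length ?? 0,
  };
}

// List replacements where the old and new file differ in restriction status.
function restrictionFlips(c: VersionComparison): string[] {
  return (c.filesReplaced ?? [])
    .filter((r) => r.oldFile.isRestricted !== r.newFile.isRestricted)
    .map((r) => `${r.oldFile.fileName} -> ${r.newFile.fileName}`);
}

// A trimmed subset of the example above.
const example: VersionComparison = {
  filesAdded: [
    { fileName: "teacher_survey.tab", md5: "1234567890", fileId: 3, isRestricted: false },
    { fileName: "biomedical.json", md5: "1234567890", fileId: 4, isRestricted: true },
  ],
  filesReplaced: [
    {
      oldFile: { fileName: "test1.json", md5: "1234567890", fileId: 3, isRestricted: false },
      newFile: { fileName: "test2.json", md5: "1234567890", fileId: 4, isRestricted: true },
    },
  ],
  filesChanged: [
    {
      fileName: "readme.txt",
      md5: "1234567890",
      fileId: 2,
      changes: [
        { fieldName: "description", oldValue: "Basic dataset info", newValue: "Updated dataset info with more details" },
      ],
    },
  ],
};

console.log(countFileChanges(example));
console.log(restrictionFlips(example));
```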

stevenwinship commented 3 weeks ago

Here is an example of the latest JSON format:

{
    "status": "OK",
    "data": {
        "oldVersion": {
            "versionNumber": "1.0",
            "lastUpdatedDate": "2024-10-24T15:17:11Z"
        },
        "newVersion": {
            "versionNumber": "DRAFT",
            "lastUpdatedDate": "2024-10-24T15:17:16Z"
        },
        "metadataChanges": [
            {
                "blockName": "Citation Metadata",
                "changed": [
                    {
                        "fieldName": "Author",
                        "oldValue": "Finch, Fiona; (Birds Inc.)",
                        "newValue": "Finch, Fiona; (Birds Inc.); Poe, Edgar Allen; (Baltimore Poets); Mulligan, Hercules; (Sons of Liberty)"
                    },
                    {
                        "fieldName": "Subject",
                        "oldValue": "Medicine, Health and Life Sciences",
                        "newValue": "Medicine, Health and Life Sciences; Astronomy and Astrophysics; Other"
                    },
                    {
                        "fieldName": "Producer",
                        "oldValue": "",
                        "newValue": "Allen, Irwin; (MGM); Spielberg, Stephen; (ILM)"
                    }
                ]
            },
            {
                "blockName": "Life Sciences Metadata",
                "changed": [
                    {
                        "fieldName": "Design Type",
                        "oldValue": "",
                        "newValue": "Parallel Group Design; Nested Case Control Design"
                    }
                ]
            }
        ],
        "filesAdded": [
            {
                "fileName": "test.tab",
                "filePath": "data/subdir1",
                "MD5": "77c7f03a7d7772907b43f0b322cef723",
                "type": "text/tab-separated-values",
                "fileId": 42,
                "description": "my description",
                "isRestricted": false,
                "categories": [
                    "Data"
                ],
                "tags": [
                    "Survey"
                ]
            }
        ],
        "filesRemoved": [
            {
                "fileName": "dataverseproject_logo.jpg",
                "filePath": "data/subdir1",
                "MD5": "c1edbefa86a55c5037873370ae7fd7b6",
                "type": "image/jpeg",
                "fileId": 40,
                "description": "my description",
                "isRestricted": false,
                "categories": [
                    "Data"
                ]
            }
        ],
        "filesReplaced": [
            {
                "oldFile": {
                    "fileName": "favicon-16x16.png",
                    "filePath": "data/subdir1",
                    "MD5": "d3c852e7ecb92fd105ba4018116a9be8",
                    "type": "image/png",
                    "fileId": 41,
                    "description": "my description",
                    "isRestricted": false,
                    "categories": [
                        "Data"
                    ]
                },
                "newFile": {
                    "fileName": "favicon-32x32.png",
                    "filePath": "data/subdir1",
                    "MD5": "c931f7add8b6a1f9a691046b77c231fa",
                    "type": "image/png",
                    "fileId": 43,
                    "description": "my description",
                    "isRestricted": false,
                    "categories": [
                        "Data"
                    ]
                }
            }
        ],
        "fileChanges": [
            {
                "fileName": "dataverse-icon-1200.png",
                "MD5": "a23eb44803d9127bc6e055f77b869816",
                "fileId": 39,
                "changed": [
                    {
                        "fieldName": "isRestricted",
                        "oldValue": "false",
                        "newValue": "true"
                    }
                ]
            }
        ],
        "TermsOfAccess": {
            "changed": [
                {
                    "fieldName": "Data Access Place",
                    "oldValue": "",
                    "newValue": "Somewhere"
                }
            ]
        }
    }
}
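
In this format, metadata values are still single "; "-joined strings rather than the arrays requested for "multiple" type fields. If a client needed per-value arrays, a minimal TypeScript sketch of a client-side adapter might look like the following. The helper names are hypothetical, and splitting on "; " is an assumption that only holds for simple multiple-valued fields such as Subject: compound fields such as Author and Producer also use "; " between subfields (e.g. "Finch, Fiona; (Birds Inc.)"), so splitting them mixes names with affiliations.

```typescript
interface JoinedFieldChange {
  fieldName: string;
  oldValue: string;
  newValue: string;
}

interface JoinedBlockChanges {
  blockName: string;
  changed: JoinedFieldChange[];
}

// Split a "; "-joined display string into individual values. An empty
// string means the field had no value in that version.
// Caveat: compound fields (e.g. Author) also use "; " between subfields,
// so this is only meaningful for simple multiple-valued fields.
function splitJoined(value: string): string[] {
  return value === "" ? [] : value.split("; ");
}

// Values present in the new version but not in the old one.
function addedValues(change: JoinedFieldChange): string[] {
  const before = new Set(splitJoined(change.oldValue));
  return splitJoined(change.newValue).filter((v) => !before.has(v));
}

const citationBlock: JoinedBlockChanges = {
  blockName: "Citation Metadata",
  changed: [
    {
      fieldName: "Subject",
      oldValue: "Medicine, Health and Life Sciences",
      newValue: "Medicine, Health and Life Sciences; Astronomy and Astrophysics; Other",
    },
  ],
};

// Logs each field name with the values added in the newer version.
for (const change of citationBlock.changed) {
  console.log(change.fieldName, addedValues(change));
}
```

Because of the subfield ambiguity, returning arrays from the endpoint itself, as requested earlier in the thread, would avoid having to reconstruct them on the client.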