HicServices / RDMP

Research Data Management Platform (RDMP) is an open source application for the loading, linking, anonymisation and extraction of datasets stored in relational databases.
https://github.com/HicServices/RDMP#research-data-management-platform
GNU General Public License v3.0

Visualize 'Differences' in Selected Columns (vs Catalogue level definitions) [Zsolt] #1266

Open tznind opened 2 years ago

tznind commented 2 years ago

[Yesterday 16:27] Zsolt Szarka (Staff): Sync Extraction with Catalogue

Hi team_RDMP, when a cloned extraction is out of date with the catalogue, is there a way to see what the difference is? What are the possible differences behind the 'Different' message (see screenshot)? The right-click->"update with catalogue settings" option is very handy, but I would like to know if it is just the order of columns, or a column is now extraction PK, or something else. Thank you

Is your feature request related to a problem? Please describe.

When adding a dataset to an Extraction, RDMP stores the current state of its columns (what transformation, if any, to apply; what the column order is in the extraction; etc.). This state is persisted and retained regardless of later changes to the original Catalogue level column info.

When cloning an old extraction (e.g. one from 6 months or 2 years ago), it is likely that changes will have been made to the dataset extraction metadata in the meantime. This is shown as a warning on extraction, and in the Selected Columns Editing UI as the text 'Different' (in red).

The user can choose between repeating the extraction using the exact same column definitions or updating to the latest definitions.

However, there is no way to view what the differences are, only that some exist.

image

Describe the solution you'd like

Add a way to view the differences.
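To make the request concrete, here is a minimal sketch of an attribute-by-attribute comparison between the stored extraction column and the current Catalogue level definition. It is written in Python for brevity and is not RDMP code: the `ColumnDefinition` class and its field names (`order`, `is_pk`, `select_sql`, `alias`) are hypothetical stand-ins for whatever RDMP actually stores.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ColumnDefinition:
    # Hypothetical fields, named after the kinds of difference discussed in this issue.
    name: str
    order: int
    is_pk: bool
    select_sql: str
    alias: str


def diff_column(extraction: ColumnDefinition, catalogue: ColumnDefinition) -> dict:
    """Return {attribute: (value in extraction, current catalogue value)}
    for every attribute whose two values differ."""
    return {
        f.name: (getattr(extraction, f.name), getattr(catalogue, f.name))
        for f in fields(ColumnDefinition)
        if getattr(extraction, f.name) != getattr(catalogue, f.name)
    }


# Snapshot taken when the dataset was added to the extraction (made-up values).
snapshot = ColumnDefinition("thiscolumn", 1, True, "Work.dbo.anonThis(thiscolumn)", "thiscolumn")
# Current Catalogue level definition (made-up values).
current = ColumnDefinition("thiscolumn", 5, False, "thiscolumn", "this_column")

differences = diff_column(snapshot, current)
for attribute, (old, new) in differences.items():
    print(f"{attribute}: extraction={old!r} catalogue={new!r}")
```

An empty result would mean the column is in sync; anything else is the detail currently hidden behind the 'Different' text.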

zzszarka commented 2 years ago

Could this be a more detailed synchronisation process, where RDMP would list the differences and prompt the user to choose what to do with them?

Dialogue:

These are the differences between this column-extraction and the current state of the catalogue item:

| attribute | value in extraction | current global value | Sync? |
| -- | -- | -- | -- |
| Order | 1 | 5 | [ ] |
| Is PK | TRUE | FALSE | [ ] |
| Extraction SQL | Work.dbo.anonThis(thiscolumn) | thiscolumn | [ ] |
| alias | thiscolumn | this_column | [ ] |

| Execute |

tznind commented 2 years ago

Presumably you don't want a popup for every single column in the ExtractionConfiguration so this would have to scale vertically (number of columns with problems) as well as horizontally (each field that is Different).
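A rough sketch of that two-dimensional layout, in Python rather than RDMP's own code: one row per column that has at least one difference, one grid column per attribute that differs somewhere. The function names, attribute names and sample values are all made up for illustration.

```python
def build_diff_grid(extraction_cols, catalogue_cols, attributes):
    """Both inputs map column name -> {attribute: value}.
    Returns (rows, used): rows maps column name -> {attribute: (old, new)},
    used lists the attributes that differ for at least one column."""
    rows = {}
    for name, snap in extraction_cols.items():
        cur = catalogue_cols.get(name)
        if cur is None:
            continue  # column missing from the catalogue: not handled in this sketch
        changed = {a: (snap[a], cur[a]) for a in attributes if snap[a] != cur[a]}
        if changed:
            rows[name] = changed
    used = [a for a in attributes if any(a in changed for changed in rows.values())]
    return rows, used


def render(rows, used):
    """Render the grid as plain text, leaving cells blank where nothing differs."""
    lines = ["column | " + " | ".join(used)]
    for name, changed in rows.items():
        cells = [f"{changed[a][0]} -> {changed[a][1]}" if a in changed else "" for a in used]
        lines.append(name + " | " + " | ".join(cells))
    return "\n".join(lines)


# Made-up data: "chi" lost its PK flag, "dob" and "sex" swapped order, "hb" is unchanged.
extraction = {
    "chi": {"order": 1, "is_pk": True},
    "dob": {"order": 2, "is_pk": False},
    "sex": {"order": 3, "is_pk": False},
    "hb": {"order": 4, "is_pk": False},
}
catalogue = {
    "chi": {"order": 1, "is_pk": False},
    "dob": {"order": 3, "is_pk": False},
    "sex": {"order": 2, "is_pk": False},
    "hb": {"order": 4, "is_pk": False},
}

rows, used = build_diff_grid(extraction, catalogue, ["order", "is_pk"])
print(render(rows, used))
```

Columns with no differences (here `hb`) never appear, so the grid only grows with the number of problems, not with the size of the ExtractionConfiguration.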

zzszarka commented 2 years ago

I was thinking of single columns, but it really depends on how many columns have differences, and whether they are the same type of difference (e.g. Order). If it is a difference in order, I assume it would usually affect almost all columns, and that would mean repeating the same popup one by one. So I think your suggestion is much better. What would it look like for 30+ columns?
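For the 30+ column case, one option might be to summarise by type of difference before showing any per-column detail. The sketch below is only an illustration in Python with made-up data; the `summarise` helper and the attribute names are hypothetical, not part of RDMP.

```python
from collections import Counter


def summarise(rows):
    """rows maps column name -> {attribute: (extraction value, catalogue value)}.
    Returns a Counter of how many columns differ on each attribute."""
    return Counter(attribute for changed in rows.values() for attribute in changed)


# Made-up example: all 30 columns shifted order, and one also changed its PK flag.
rows = {f"col{i}": {"order": (i, i + 1)} for i in range(30)}
rows["col0"]["is_pk"] = (True, False)

counts = summarise(rows)
for attribute, n in counts.most_common():
    print(f"{attribute}: differs in {n} of {len(rows)} columns")
```

A summary like this would let a single 'sync Order for all columns' choice cover the common case, leaving only the rarer differences to be reviewed individually.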