data-dot-all / dataall

A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
https://data-dot-all.github.io/dataall/
Apache License 2.0

Feature flags for topics and confidentiality and custom text list for confidentiality #1032

Closed TejasRGitHub closed 8 months ago

TejasRGitHub commented 9 months ago

**Is your idea related to a problem? Please describe.**
When creating a dataset, the user has to specify confidentiality and topics.

The list of confidentiality classifications may differ from company to company. Both fields are also required, even though they might not be used for searching, etc.

**Describe the solution you'd like**
To meet this requirement, allow a custom list of confidentiality values and add feature flags for the topics and confidentiality fields so they can be disabled if required.


TejasRGitHub commented 9 months ago

EDIT

Changes required for this feature enhancement

Frontend Changes and Config additions -

1. Add feature flags configs for topics and confidentiality to enable/disable on UI

        "datasets": {
            "active": true,
            "features": {
                ...,
                "preview_data": false,
                "glue_crawler": false,
                "confidentiality_dropdown": true,
                "custom_confidentiality_mapping": {
                    "Public": "Unclassified",
                    "Custom Confidentiality": "Official",
                    "Custom Confidential": "Secret",
                    "Another Confidentiality": "Official"
                },
                "topics_dropdown": false
            }
        },

2. Add a custom list of confidentiality values and keep the defaults in constants (`frontend/src/modules/constants.js`)
3. Make `DatasetCreateForm.js`, `DatasetEditForm.js`, `DatasetImportForm.js` and `DataGovernance.js` render the topics and confidentiality fields conditionally, based on the feature flags
4. Change the `Catalog.js` view to display topics and confidentiality based on the config

_Backend Changes_ 

Confidentiality and topics are defined as enums in the backend ~~(e.g. `dataall/modules/datasets_base/db/enums.py` and `dataall/modules/datasets/api/dataset/enums.py`)~~. They are used to validate inputs at the GraphQL level when a dataset is created. The same enums are also used for filtering and in conditions in various dataset-related functions.

1. Modify the enum classes to include the custom config's list, or create a new class and use it as the custom config's enum. My current idea is to have the `ConfidentialityClassification` enum take the custom values as class variables if they are present in config.json, and otherwise default to the three standard confidentiality levels. The same approach could be applied to topics if custom topics are present.
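
As an illustration of the idea above, here is a minimal sketch of building the confidentiality enum from the config at load time. It is not the code from the repository: the helper `build_confidentiality_enum` and the way the `features` dict is passed in are assumptions made for this example; only the key `custom_confidentiality_mapping` and the three default levels come from this issue.

```python
from enum import Enum

# The three standard levels named in this issue.
DEFAULT_LEVELS = ("Unclassified", "Official", "Secret")


def build_confidentiality_enum(features):
    """Hypothetical helper: return an Enum of the allowed confidentiality values.

    `features` is assumed to be the "features" dict of the datasets module
    in config.json, already parsed into a Python dict.
    """
    mapping = features.get("custom_confidentiality_mapping") or {}
    # Use the custom names when a mapping is configured, else the defaults.
    values = list(mapping) if mapping else list(DEFAULT_LEVELS)
    # Member names must be valid identifiers, so spaces become underscores;
    # the member value keeps the label exactly as written in the config.
    members = {value.replace(" ", "_"): value for value in values}
    return Enum("ConfidentialityClassification", members)


custom = build_confidentiality_enum(
    {"custom_confidentiality_mapping": {"Public": "Unclassified", "Custom Confidential": "Secret"}}
)
default = build_confidentiality_enum({})
```

Input validation can then rely on the usual lookup by value, e.g. `custom("Public")` succeeds while `custom("Secret")` raises `ValueError` because `Secret` is not one of the configured custom labels.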

**Question** - The default confidentiality levels (i.e. Unclassified, Secret, Official) are used in the code for filtering and in some conditions, so the new custom list has to work with them. One option is to have a map for the custom config that specifies which custom confidentiality value corresponds to Unclassified, Secret, etc., and then translate the custom value wherever conditions use the standard levels. See the `_check_preview_permissions_if_needed` function in `dataall/modules/datasets/services/dataset_profiling_service.py` for an example.
The other option is to rethink how confidentiality levels are interpreted and come up with more generic logic that is not tightly bound to the standard confidentiality levels. @dlpzx, @noah-paige, @zsaltys could you please let me know your thoughts on this?

EDIT - Going forward with mapping the custom confidentiality values to the existing standard confidentiality levels.
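
A minimal sketch of the mapping approach, assuming the `custom_confidentiality_mapping` dict from config.json has already been loaded. The function names `to_standard_level` and `preview_needs_permission_check` are hypothetical, and the rule that only `Unclassified` data can be previewed without a permission check is an assumption for illustration; it is not taken from `_check_preview_permissions_if_needed`.

```python
STANDARD_LEVELS = {"Unclassified", "Official", "Secret"}


def to_standard_level(confidentiality, custom_mapping=None):
    """Translate a (possibly custom) confidentiality value to a standard level.

    `custom_mapping` maps custom labels to standard levels, as in the
    "custom_confidentiality_mapping" config. When it is empty or None,
    the value is expected to already be a standard level.
    """
    if custom_mapping:
        if confidentiality not in custom_mapping:
            raise ValueError(f"Unknown confidentiality value: {confidentiality}")
        standard = custom_mapping[confidentiality]
    else:
        standard = confidentiality
    if standard not in STANDARD_LEVELS:
        raise ValueError(f"Not a standard confidentiality level: {standard}")
    return standard


def preview_needs_permission_check(confidentiality, custom_mapping=None):
    """Assumed rule for illustration: anything above Unclassified is checked."""
    return to_standard_level(confidentiality, custom_mapping) != "Unclassified"


mapping = {
    "Public": "Unclassified",
    "Custom Confidentiality": "Official",
    "Custom Confidential": "Secret",
    "Another Confidentiality": "Official",
}
```

With this shape, existing conditions keep comparing against the standard levels and only the translation step knows about the custom labels.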

**Other questions to answer and clarify**

1. How does this change affect the indexing in OpenSearch? -> **After checking the code, I found that the dataset indexes get updated automatically. If the user updates datasets with new confidentiality levels, the index is updated and the change should be reflected in the catalog.**
2. Does hiding search selectables like topics and classification (confidentiality levels) create issues with Catalog search? -> **This doesn't cause issues.**
3. Check whether topics and confidentiality are used in other modules such as Glossaries, Quicksight Dashboards, etc. @dlpzx, @noah-paige -> **I did not find any in my testing.**

zsaltys commented 9 months ago

I'd define the flags as `confidentiality_dropdown` and `topics_dropdown`. That aligns with the other feature, `glue_crawler`. IMO these are "features" and we enable/disable them. `preview_data` should be renamed to `data_preview`.

I would rename `custom_confidentiality_list` to `custom_confidentiality_values` and make it a dict like `{"Custom_Secret": {"hide_schema": true, "hide_preview": true}}`, or even simpler: `{"Custom_Secret": "secret", "Custom_Secret2": "public"}`.

For the catalog, make sure to hide topics or classification from search if those features are disabled. Then it shouldn't affect search.

TejasRGitHub commented 9 months ago

Hi @zsaltys, thanks for the suggestions. I have edited the design / code change description on this issue.

dlpzx commented 9 months ago

We are working offline with @TejasRGitHub on the implementation of this issue

noah-paige commented 8 months ago

Completed as part of #1049