bcgov / MFIN-Data-Catalogue

The Finance Data Catalogue enables users to discover data holdings at the BC Ministry of Finance and offers information and functionality that benefit consumers of data for business purposes. The product is built using Drupal and adheres to the Government of BC's Core Administrative and Descriptive Metadata Standard.

Add ability to store metadata for data collections #502

Open mjmcclung opened 5 months ago

mjmcclung commented 5 months ago



User story

There may be times when we don't want to share table-level information, but it might still be appropriate to share collection-level information to let prospective users know that something exists, e.g. pay transparency reporting. It may be useful to know that pay transparency reporting information exists, even though the specific tables or details are unlikely to be shared with, or visible to, most visitors to the catalogue. The visibility of a data collection MR may therefore be broader than that of its data asset MRs.

There are also situations where certain metadata information (e.g. related documents) relates to a data collection as a whole and it would be more appropriate and efficient to link this information at the data collection level, rather than redundantly repeat it at the data asset level. Examples that come to mind include data models and information management plans that cover an entire collection.

There is also value in knowing details at the data collection level (e.g. security classification, IM classification) to drive broader data management assessments, activities and planning.

This feature supports the following requirements:

Managing Government Information Policy:

Data Management Policy:

Additional context

The catalogue can currently indicate a 'Series', but a series functions more like a label: there is no way to add metadata to it.

Proposed solution

Either add an additional Record type called 'Data Collection', or, preferably, add a qualifier on Data record types to indicate it is a collection (e.g. Does this metadata record represent a data collection?).

If the qualifier is true, it could unlock an additional build section of the MR that allows editors to specify which MRs are part of the collection ("Collection assets"). This could work similarly to how assets are added for lineage, but without creating the actual lineage link, since lineage matters at the asset level rather than the collection level.

Specifying assets as part of a collection needs to create a linkage so that when users are on an asset MR page they can see what collection (if any) the asset belongs to and click on it to find out more information about the collection. Likewise, the collection MR should specify what assets are part of a collection and allow the user to explore them if needed (per visibility settings).
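As a rough illustration of the two-way linkage described above, here is a minimal Python sketch. It is not the catalogue's Drupal implementation; the `MetadataRecord` and `Catalogue` classes, the `is_collection` flag, and the `visible_to` audience groups are all hypothetical stand-ins. The membership list is stored once on the collection MR, the asset-to-collection direction is derived by a reverse lookup, and listing a collection's assets is filtered by the viewer's visibility, so a collection can be seen more widely than its members.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    # Hypothetical, simplified stand-in for a catalogue MR (not the Drupal entity).
    mr_id: str
    title: str
    is_collection: bool = False  # the proposed "Does this MR represent a data collection?" qualifier
    visible_to: set = field(default_factory=set)  # hypothetical audience groups allowed to see this MR
    collection_assets: list = field(default_factory=list)  # IDs of member asset MRs ("Collection assets")


class Catalogue:
    def __init__(self) -> None:
        self.records: dict = {}

    def add(self, mr: MetadataRecord) -> None:
        self.records[mr.mr_id] = mr

    def add_to_collection(self, collection_id: str, asset_id: str) -> None:
        # Membership is stored once, on the collection MR; no lineage link is created.
        coll = self.records[collection_id]
        if not coll.is_collection:
            raise ValueError(f"{collection_id} is not flagged as a collection")
        if asset_id not in coll.collection_assets:
            coll.collection_assets.append(asset_id)

    def collections_for(self, asset_id: str) -> list:
        # Reverse lookup used on an asset MR page: which collections list this asset?
        return [
            mr.mr_id
            for mr in self.records.values()
            if mr.is_collection and asset_id in mr.collection_assets
        ]

    def visible_assets(self, collection_id: str, viewer_groups: set) -> list:
        # List only the member assets this viewer may see.
        coll = self.records[collection_id]
        return [a for a in coll.collection_assets if self.records[a].visible_to & viewer_groups]
```

Deriving the reverse direction, rather than storing it separately on each asset, keeps a single source of truth for membership. A real implementation would more likely rely on an entity reference field and a query.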

Collections could also be used as a search facet, similar to how Series is used now.
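A collection facet could behave much like the existing Series filter. The following minimal Python sketch assumes a hypothetical mapping from collection title to member MR IDs. It counts how many search hits fall into each collection and then narrows the results to one collection. The search backend the catalogue actually uses would supply this through its own faceting features.

```python
from collections import Counter


def collection_facet(memberships: dict, result_ids: list) -> Counter:
    # memberships: hypothetical mapping of collection title -> list of member MR IDs.
    hits = set(result_ids)
    counts = Counter()
    for collection, members in memberships.items():
        n = len(hits.intersection(members))
        if n:  # only show facet values that match at least one result
            counts[collection] = n
    return counts


def filter_by_collection(memberships: dict, result_ids: list, collection: str) -> list:
    # Keep the original result order, restricted to members of the chosen collection.
    members = set(memberships.get(collection, []))
    return [r for r in result_ids if r in members]
```

An MR can belong to more than one collection, so the facet counts may add up to more than the number of results.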

Estimated level of effort

Definition of done (DoD)

Testing

Automated functional tests

Automated site tests

This feature requires manual testing

  1. first test step …
  2. second test step …
  3. etc …

ChristaBull commented 5 months ago

A solution like this could help solve a problem I've started to notice with our metadata records. It's not uncommon for us to have multiple collections of data on the same topic with very similar names, which could confuse consumers.

For example, with Property Transfer Tax (PTT) we currently have multiple versions of the data in the Finance Data Store. One table concept in a few of those transactions looks like:

If we have a data collection option we could clearly identify all of the tables that make up specific models (e.g. the analytical product) so the relationship is clearer to clients working with them.

Note: a workaround for my team might be to create another metadata record and link to the different parts in its description, if linking to them is possible. This isn't a system-based connection, but it would have a similar visual effect.

NicoledeGreef commented 5 months ago

Thanks for the input. For enhancement/ideas, please characterize the business problem that needs solving, rather than how that problem should be solved. The "How" should be left to the development team.

Problem Statement: While the current Finance Data Catalogue (June 2024) allows metadata authors to describe Data, Form, and Report assets as metadata records (MRs), there is also a need for a means of grouping asset MRs under a banner entity. That entity could have related documents defined on it, which would be understood to apply to the entire group of asset MRs linked to it.

How is this different from "Series"? As initially implemented, "Series" is a taxonomy-based attribute that can be applied to one or more MRs, and its value can be used as a filter by topic. It lacks dimension because it is essentially a tag, and there is no way to extend the tag's relationship to other things: asset MRs can be tagged and commonality established, but that is insufficient. The taxonomy values are mostly program names, and they don't account for different groupings of data assets that may need to be described slightly differently rather than just tagged with the program name.

How is this different from "Assets used"? "Assets used" helps a metadata author define the lineage of their asset (the items on which it depends). The "Assets used" value is based on search results across all MR titles, so any asset that can be searched for and applied to an MR must already exist as an MR in the Catalogue.

The business seeks a solution that will allow a metadata author to describe a banner entity and define relevant attributes such as description, related documents, and relationships to asset MRs.
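To make the banner-entity idea concrete, here is a minimal Python sketch under stated assumptions. The `BannerEntity` class and the `effective_related_documents` helper are hypothetical and do not describe how the catalogue stores anything. A banner carries its own description and related documents plus a set of linked asset MR IDs, and an asset's effective related documents are its own documents followed by those of every banner it is linked to, without duplicates. This reflects the earlier point that a data model or information management plan could be attached once instead of being repeated on each asset.

```python
from dataclasses import dataclass, field


@dataclass
class BannerEntity:
    # Hypothetical grouping entity with its own attributes, unlike a Series tag.
    title: str
    description: str
    related_documents: list = field(default_factory=list)
    asset_mr_ids: set = field(default_factory=set)  # asset MRs linked to this banner


def effective_related_documents(asset_id: str, own_docs: list, banners: list) -> list:
    # Start with the asset MR's own documents, then add documents from every
    # banner the asset is linked to, skipping any already listed.
    docs = list(own_docs)
    for banner in banners:
        if asset_id in banner.asset_mr_ids:
            for doc in banner.related_documents:
                if doc not in docs:
                    docs.append(doc)
    return docs
```

Resolving the documents at display time, instead of copying them onto each asset MR, means an update to the banner's documents automatically applies to every linked asset.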

Note: The above problem statement intentionally steers away from the term "collection" as it is used in other contexts when it comes to data, e.g. "data collection" often means the act of collecting data for a purpose.