bcgov / MFIN-Data-Catalogue

The Finance Data Catalogue enables users to discover data holdings at the BC Ministry of Finance and offers information and functionality that benefits consumers of data for business purposes. The product is built using Drupal and adheres to the Government of BC's Core Administrative and Descriptive Metadata Standard.

build information schedule #128

Closed NicoledeGreef closed 1 year ago

NicoledeGreef commented 1 year ago

OP timer

https://openplus.monday.com/boards/4092908516/pulses/5007970088


Discussed in https://github.com/bcgov/MFIN-Data-Catalogue/discussions/113

Originally posted by **CraigClark** August 9, 2023

Going by the [ORCS document](https://www2.gov.bc.ca/assets/gov/british-columbians-our-governments/services-policies-for-government/information-management-technology/records-management/orcs/income-taxation.pdf) we would like to make as much of this a taxonomy as possible. This is for findability, as we can use terms as facets on the search page.

There is an issue with numeric values that prevents ORCS from being implemented strictly with terms, for instance, FY+6y (Fiscal Year plus 6 years). Are people likely to filter for data sets based on their active/semi-active period code? It would be easiest if this was just a value on the data set. Making this a taxonomy term would be a bad user experience because there are so many possible numeric options. You would need a term for each one.

We could create the following vocabularies:

**Records life cycle**

Terms
- A | Active
- SA | Semi-active
- FD | Final Disposition

**Final disposition categories**

Terms
- DE | Destruction
- FR | Full Retention
- SR | Selective Retention
- OD | Other Disposition
- NA | Not Applicable

**Special flags**

Terms
- FOI | Freedom of Information/Protection of Privacy
- PIB | Personal Information Bank
- VR | Vital Records

All of the above could be facets on the search page.

*Active and semi-active period codes* would be a field. You would see the value when you look at a record set; however, active period codes would not be facets.

@NicoledeGreef, can we proceed this way?

NicoledeGreef commented 1 year ago

I agree with the approach you've described.

CraigClark commented 1 year ago

@NicoledeGreef I'm working on this right now. Is a data set always ARCS or ORCS, or can it be ARCS and ORCS?

NicoledeGreef commented 1 year ago

I have consulted with our Info Mgmt SME and...

There may be overlap between applying ARCS and ORCS to a dataset though this is expected to be infrequent. Can we plan for the possibility that both may apply?

CraigClark commented 1 year ago

@NicoledeGreef I would like your thoughts on this approach. It's a bit tricky to build, but we can do it. Can you share internally for input and get back to me please?

NicoledeGreef commented 1 year ago

I met with SMEs from Info Mgmt and Data Curation. We reviewed the Figma link in the previous comment.

Here is some feedback. We agreed that this is an advanced topic for many users and therefore cannot be expected to be included when each metadata record is drafted by a business user (so we must not make this a mandatory element in order to save the record).

For the foreseeable future, we agreed to omit "Active Period" and "Semi-Active Period" from the user inputs, reducing some complexity.

It is not imperative that the values we do track are searchable (facets) within the application; Product Owner persona and Info Mgmt super user SMEs could have access to a report and that would be sufficient. It is presumed that most business users won't be aware of Info Schedule values without having a conversation with an Info Management specialist.

Ideally we would make use of pick lists based on taxonomy values in order to reduce user input inconsistencies.

Most of the items in the Finance Data Catalogue are anticipated to be ORCS but

Top level choice should be:

Information Schedule Type one of: ARCS | ORCS | Special | Unscheduled


Using ARCS as an example:

Information Schedule Name would be: Administrative Records Classification System

Schedule Number would be: 100001, which is unique to ARCS (ORCS and Special each have many schedule numbers; Unscheduled has none)

Based on the Information Schedule choice a user makes, a list of Information Schedule Primary Title topics would be available; if we use ARCS as an example Information Schedule Name, a subset of Information Schedule Primary Number/Title values would be as listed here:

- 6000 - Information Technology, General
- 6450 - Information System Development & Changes
- 6820 - Information Systems Operations
- 6840 - Change Management
- 6880 - Telecommunication Network Management
- 6890 - Radio Communication

If we use 6840 - Change Management as an example Information Schedule Primary, the Information Schedule Secondary Title values would be as listed here:

- -00 Policy and procedures
- -10 General
- -20 IT change management records
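
The dependent pick lists described above can be sketched roughly as follows. This is only an illustration in Python, not the Drupal implementation: `PICK_LISTS`, `primary_options` and `secondary_options` are hypothetical names, and the data is limited to the ARCS values quoted in this comment.

```python
# Hypothetical in-memory stand-in for the taxonomy; not the Drupal data model.
# Titles are copied from the comment above. An empty list means the
# secondaries were not listed in this thread, not that none exist.
ARCS_PRIMARIES = {
    "6000 - Information Technology, General": [],
    "6450 - Information System Development & Changes": [],
    "6820 - Information Systems Operations": [],
    "6840 - Change Management": [
        "-00 Policy and procedures",
        "-10 General",
        "-20 IT change management records",
    ],
    "6880 - Telecommunication Network Management": [],
    "6890 - Radio Communication": [],
}

# Top-level choice: Information Schedule Type.
# ORCS and Special are left empty in this sketch; Unscheduled has no children.
PICK_LISTS = {
    "ARCS": ARCS_PRIMARIES,
    "ORCS": {},
    "Special": {},
    "Unscheduled": {},
}


def primary_options(schedule_type: str) -> list[str]:
    """Return the Primary Number/Title choices for a schedule type."""
    return list(PICK_LISTS[schedule_type])


def secondary_options(schedule_type: str, primary: str) -> list[str]:
    """Return the Secondary Title choices for a chosen primary."""
    return list(PICK_LISTS[schedule_type][primary])
```

Each selection narrows the next list, so a user who picks ARCS and then 6840 - Change Management only ever sees the three secondaries for that primary.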

ARCS don't change very often if at all. ARCS are listed here.

An example of an ARCS schedule code value once a user has made selections: 100001-440-20, whose codes translate to: "Administration-440 - Reporting & Statistical Analysis-20 - Reports and statistics (not covered elsewhere)" but for display purposes the Secondary Title label is sufficient, Reporting & Statistical Analysis


An ORCS Schedule exists only after a business area has had a conversation with the Records Management SMEs within Government, and an ORCS will be published to the web in the ORCS Library.

There may be more ORCS that apply to the work being logged in the Finance Data Catalogue than any other type (ARCS, Special or Unscheduled).

You can search the ORCS Library; a search for "orcs" returned all records. The facets-like left hand bar in the ORCS Library lets you refine by Ministry and this brings it down to sixteen results. There may be ORCS from other Ministries that apply to the work being done within Ministry of Finance; we should be mindful of that but as a start Information Schedule Name could be derived from one of those 16 hits; for example:

- Banking and Cash Management ORCS
- Business Risk Management Programs ORCS
- Community Initiatives & Olympic Bid ORCS
- Consumer Taxation ORCS
- Crown Agency Services ORCS
- Federal-provincial Relations & Research ORCS
- Gaming ORCS
- Income Taxation ORCS
- Mineral, Oil & Gas Revenue ORCS
- Officer of the Comptroller General: 2007 Edition ORCS
- Office of the Comptroller General ORCS
- Property Taxation ORCS
- Revenue & Student Loan Contract Management ORCS
- Revenue Services British Columbia ORCS
- Risk Management ORCS
- Taxation Revenue Appeals ORCS

If a user chooses ORCS, there should be a pick list where they can choose the Schedule Number. We will have to unearth these schedule numbers from the PDFs.

"Property Taxation ORCS" is Schedule Number 160184, for example (see Property Taxation ORCS)

There are Primary and Secondary Schedule Title numbers for an ORCS. These are found within the PDF documents in the ORCS Library (look in Section 1).

ORCS and ARCS unique identifier is based on: Schedule Number-Primary Title Number- Secondary Title Number

An example of an ORCS schedule code value once a user has made selections: 160184-45000-03, whose codes translate to: "160184 - Property Taxation ORCS-45000 - PROPERTY TAXATION - GENERAL -03- Property taxation data warehouse data" but for display purposes the Secondary Title label is sufficient, Property taxation data warehouse data
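
As a rough illustration of the identifier format above (Schedule Number-Primary Title Number-Secondary Title Number), the following Python sketch splits and reassembles a code. `ScheduleCode` and `parse_schedule_code` are hypothetical names, and the sketch only covers the three-part ARCS/ORCS form. Parts are kept as strings so that a secondary such as `03` keeps its leading zero.

```python
from typing import NamedTuple


class ScheduleCode(NamedTuple):
    """Three-part ARCS/ORCS identifier; every part is kept as a string."""
    schedule_number: str
    primary: str
    secondary: str

    def __str__(self) -> str:
        # Rebuild the hyphen-separated form, e.g. "160184-45000-03".
        return "-".join(self)


def parse_schedule_code(code: str) -> ScheduleCode:
    """Split 'schedule-primary-secondary' into its parts.

    Assumes every part is numeric, as in the examples in this thread.
    """
    parts = code.split("-")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        raise ValueError(f"not a three-part numeric schedule code: {code!r}")
    return ScheduleCode(*parts)
```

For example, `parse_schedule_code("160184-45000-03").primary` gives `"45000"`, and converting the result back with `str()` returns the original code unchanged.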


Special is a category outside of ARCS and ORCS (see this page)

The list is relatively brief for Special:

- Commission of Inquiry Records (schedule 112907)
- Computer System Electronic Backup Records (schedule 112910) has been superseded by ARCS secondary 6820-05
- Executive Records (schedule 102906)
- General Records (schedule 112909)
- Government House Records (schedule 112911)
- Records of the British Columbia Commission of Inquiry into Missing and Murdered Indigenous Women and Girls (schedule 170439)
- Lieutenant-Governor Records (schedule 112912)
- Record Copies of Published Maps (schedule 112908)
- Records of Defunct Programs (schedule 158691)
- Redundant Source Information (schedule 206175)
- Special Media Records (schedule 102905)
- Transitory Information (schedule 102901)
- Year 2000 (Y2K) Project Documentation and Test Data (schedule 112916)

Most likely to be applied are in bold text.


Unscheduled: a Schedule has not yet been created for the subject metadata item. This means that records management work is outstanding and, until that work is complete, the data records cannot be disposed of. There may be a few areas in RMO that are currently Unscheduled.

No further information needs to be supplied.


For the ARCS | ORCS | Special cases, the following are good to track:

Records life cycle

- A | Active
- SA | Semi-active
- FD | Final Disposition

Final disposition categories

- DE | Destruction
- FR | Full Retention
- SR | Selective Retention

Special flags

- FOI | Freedom of Information/Protection of Privacy
- PIB | Personal Information Bank
- VR | Vital Records

CraigClark commented 1 year ago

@NicoledeGreef If I'm understanding this correctly, it's a lot cleaner than what I came up with. I think we can do it with a taxonomy and an entity reference view.

Is it OK if we separate the life cycle? That's more difficult to put into a taxonomy, unless a life cycle is always tied to the info schedule value. For example, if 160184-45000-03 always has the same life cycle and final disposition, then it's a field we set on the term and the user never has to think about it. It looks like that's the case in the PDF, but I'm not sure.

NicoledeGreef commented 1 year ago

Question for follow-up: Are the Special Flags always associated with a Secondary value or is it dataset specific?

mjmcclung commented 1 year ago

> @NicoledeGreef If I'm understanding this correctly, it's a lot cleaner than what I came up with. I think we can do it with a taxonomy and an entity reference view.
>
> Is it OK if we separate the life cycle? That's more difficult to put into a taxonomy, unless a life cycle is always tied to the info schedule value. For example, if 160184-45000-03 always has the same life cycle and final disposition, then it's a field we set on the term and the user never has to think about it. It looks like that's the case in the PDF, but I'm not sure.

By separating the life cycle, do you mean separating the A, SA and FD phases? Yes please, as I will want to query/report on these individually (e.g. show me all the datasets where the FD = FR).

Each primary/secondary will have its own A/SA/FD values and this won't change over time. For example, ARCS 6450-80 has SO 2y SR. The only wrinkle is the OPR/Non-OPR qualifier (OPR - Office of Primary Responsibility). If a business area is OPR they have one set of lifecycle values and if they are Non-OPR they have another. But... I think we have OPR at the dataset level, so this is part of why choosing an Information Schedule is a bit of a conversation, at least for now until we build up our knowledge in this space.
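
To make the reporting need concrete (e.g. "all the datasets where the FD = FR"), here is a minimal Python sketch that stores the three phases separately. Everything in it is an assumption for illustration: `Retention`, `parse_retention`, the dataset names and the second set of values are made up, and the OPR/Non-OPR split is left out. Only `SO 2y SR` comes from the comment above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Retention:
    """Active, semi-active and final disposition held as separate values."""
    active: str
    semi_active: str
    final_disposition: str


def parse_retention(text: str) -> Retention:
    """Split a retention string such as 'SO 2y SR' into its three phases."""
    parts = text.split()
    if len(parts) != 3:
        raise ValueError(f"expected three values (A, SA, FD), got: {text!r}")
    return Retention(*parts)


# Hypothetical datasets. "SO 2y SR" is the ARCS 6450-80 example quoted above;
# the second entry uses made-up values purely for illustration.
DATASETS = {
    "Dataset A": parse_retention("SO 2y SR"),
    "Dataset B": parse_retention("FY+6y 1y FR"),
}


def datasets_with_final_disposition(datasets: dict, code: str) -> list:
    """Return the names of datasets whose final disposition matches code."""
    return sorted(
        name
        for name, retention in datasets.items()
        if retention.final_disposition == code
    )
```

Because each phase is its own attribute, a report can filter on final disposition without parsing the combined string again.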

mjmcclung commented 1 year ago

> Question for follow-up: Are the Special Flags always associated with a Secondary value or is it dataset specific?

Yes, special flags are for specific secondaries. There are some examples in the Property Taxation ORCS (e.g. 45800-05: Property transfer tax data and images has a flag of PIB).

Interestingly, I also found a special flag in that schedule that was not on our list: PUR: The Taxation (Rural Area) Act (s. 22) requires that a copy of the tax roll be made available for public review.

CraigClark commented 1 year ago

@lkmorlan I have taken this as far as I can. Here is my update

Some of the work on this is done, see the 128-data-set-life-cycle-information-schedule branch

The idea here is to tie everything to do with info schedule to a taxonomy. That way, users only need to select the correct information schedule and everything else populates.

Most of this is set up. The taxonomy for the information schedule exists, information_schedule. It has all the fields required to populate everything in the info schedule. Some of these fields are entity references to other taxonomy terms:

The duration, disposition and special flags taxonomies all use a field_abbr_full_name field. This is because abbreviations are used in the official info schedule specifications, but it's nice to provide a human-readable name.
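
A minimal Python sketch of the abbreviation-plus-full-name idea, assuming a simple pair per term. `AbbrTerm` and `by_abbr` are hypothetical names and do not reflect how `field_abbr_full_name` is actually stored in Drupal; the term values are the final disposition categories listed earlier in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbbrTerm:
    """A term that carries both the official abbreviation and a readable name."""
    abbr: str
    full_name: str

    def label(self) -> str:
        # Same "ABBR | Full name" form used in the lists in this issue.
        return f"{self.abbr} | {self.full_name}"


# Final disposition categories as proposed in the original discussion.
FINAL_DISPOSITION = [
    AbbrTerm("DE", "Destruction"),
    AbbrTerm("FR", "Full Retention"),
    AbbrTerm("SR", "Selective Retention"),
    AbbrTerm("OD", "Other Disposition"),
    AbbrTerm("NA", "Not Applicable"),
]


def by_abbr(terms: list, abbr: str) -> AbbrTerm:
    """Find a term by the abbreviation used in the schedule specification."""
    for term in terms:
        if term.abbr == abbr:
            return term
    raise KeyError(abbr)
```

Keeping both values on the term lets reports match on the official abbreviation while the display shows the readable name.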

Issues

SHS issue

Simple hierarchical select is used in the form display, see /admin/structure/types/manage/data_set/form-display/data_set_description.

The issue is that on the build page, there is an extra field showing the TID. This should not be visible to the user.

[Screenshot: bc-info-schedule-exposed-tid]

Information schedule values

Friendly info schedule value

See #1 on the screenshot below

I'm using a feature called Flexible Hierarchy in Client-side Hierarchical Select to display the first and last item in a taxonomy. This works, though if there is a better way, go for it.

NOTE: I added the schedule type, client may not want this

Info schedule code

See #2 on the screenshot below

This has to be a field so it can be used in reports, views, etc. We can't do it with twig.

The info schedule code needs work. Right now I'm using the field_token_value module to show the code, but it has some issues.

When an admin creates a term in information_schedule, one field is field_schedule_number. This corresponds to the numeric code used in the info schedule spec. Depending on which schedule is used, the root term may or may not use a numeric code.

ARCS

ARCS will only ever have 3 levels, including the root term.

ARCS uses a root numeric code, so the info schedule code should render, for example, as 100001-440-20, where:

ORCS

ORCS will have 4 levels, including the root term.

ORCS does not use a numeric code for the root term (ORCS). The code for ORCS should display, for example, as 160184-45000-03, where:

Special

Special has two levels, including the root term.

Special does not use a numeric code for the root term (Special).

The code for Special should display, for example, as 112907, where:

NOTES

  • field_token_value may not be the right approach. A custom token might be better.
  • We can leverage a help guide, for example if we tell people to leave field_schedule_number empty for ORCS and special, we can generate the info schedule code by following the chain of numbers
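
The "follow the chain of numbers" idea in the notes above could look roughly like this. It is a Python sketch under stated assumptions, not the `field_token_value` setup or a Drupal custom token: `Term` and `info_schedule_code` are hypothetical names, and `schedule_number` stands in for field_schedule_number, left empty on the ORCS and Special root terms.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Term:
    """Hypothetical stand-in for a term in the information_schedule taxonomy."""
    name: str
    schedule_number: str = ""          # empty on root terms with no numeric code
    parent: Optional["Term"] = None


def info_schedule_code(term: Term) -> str:
    """Walk from the selected term up to the root, collecting non-empty numbers."""
    numbers = []
    node: Optional[Term] = term
    while node is not None:
        if node.schedule_number:
            numbers.append(node.schedule_number)
        node = node.parent
    # Numbers were gathered leaf-first, so reverse them before joining.
    return "-".join(reversed(numbers))


# ARCS: three levels, and the root carries a number.
arcs = Term("ARCS", "100001")
arcs_secondary = Term("Secondary", "20", Term("Primary", "440", arcs))

# ORCS: four levels, and the root has no number.
orcs = Term("ORCS")
property_tax = Term("Property Taxation ORCS", "160184", orcs)
orcs_secondary = Term("Secondary", "03", Term("Primary", "45000", property_tax))

# Special: two levels, and the root has no number.
special_schedule = Term("Commission of Inquiry Records", "112907", Term("Special"))
```

With these terms the function yields `100001-440-20`, `160184-45000-03` and `112907`, matching the three examples given for ARCS, ORCS and Special. Skipping empty numbers is what lets one rule serve all three schedule types.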

Display

The friendly name and the numeric code should be visually related, under the same label, Information schedule, but they need to be separate fields. See #1 on the screenshot

In the details section, see #3 in the screenshot, the field_token_value module is used. This is fine, except values don't update unless the node is saved. That's not a huge deal as the info schedule won't change much. It would be nice, though, if an update to the term were reflected here. It's not something to spend a lot of time on, but if we have something quick that is more dynamic, it would be a good idea.

For active, semi-active and final disposition, we should have Not applicable if there is no value. This serves the purpose of letting the user know a value was not missed in error. We can accomplish this by setting 'NA' as the default value and requiring the field.

Special Flags are rarely used. Here, the field and label should not be visible unless there is a value.
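
The two display rules above (show Not applicable for a missing life cycle value, hide Special Flags entirely when empty) can be sketched as follows. This is an illustrative Python function with hypothetical names, not the Drupal field configuration or twig template.

```python
NOT_APPLICABLE = "Not applicable"


def render_details(active=None, semi_active=None, final_disposition=None,
                   special_flags=()):
    """Return (label, value) rows for the details section.

    Life cycle rows are always shown, falling back to "Not applicable" so the
    user can tell the value was not missed in error. The Special flags row is
    only added when at least one flag is set.
    """
    rows = [
        ("Active", active or NOT_APPLICABLE),
        ("Semi-active", semi_active or NOT_APPLICABLE),
        ("Final disposition", final_disposition or NOT_APPLICABLE),
    ]
    if special_flags:
        rows.append(("Special flags", ", ".join(special_flags)))
    return rows
```

Calling `render_details(active="SO")` produces three rows with the two missing values filled in, while passing `special_flags=["PIB"]` adds a fourth row.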

Screenshot

[Screenshot: bc-info-schedule]

DOD

CraigClark commented 1 year ago

Further changes to what I have done.

Final disposition and special flags don't need to be fields here. They can be done in twig.

CraigClark commented 1 year ago

This works. Assigning to myself to complete the documentation.

CraigClark commented 1 year ago

Assigning to Liam.

There is an issue with generating the code.

When I do the following, it fails:

  1. Create a test term under Property taxation general with number 12345
  2. Save
  3. Create dataset Test ORCS 3
  4. Apply the test term in the Information schedule field
  5. Select Publish
  6. Save
  7. See that the number is not there

@lkmorlan via Slack: "I see the problem. It loads the parents using the term ID, which doesn’t exist until it is saved."

CraigClark commented 1 year ago

@NicoledeGreef this is ready for your review.

Please follow the documentation here https://mfin-data-catalogue.apps.silver.devops.gov.bc.ca/documentation/information-schedule

That way, you can test the docs and the feature at the same time.

mjmcclung commented 1 year ago

@NicoledeGreef - I am interested in seeing how this works in action. It might be worth setting up a time to chat/demo. We might want to drop some pieces for now (retention, flags, etc.) and focus on the core information first. More nuances arise as complexity is added, and even just having high-level schedule info is an improvement over the current state. There has been recent feedback about rolling out new IM things in bite-sized pieces to make them less overwhelming given our maturity in this space.