bcgov / ckanext-bcgov

BC Data Catalogue source code, main ckan extension
http://catalogue.data.gov.bc.ca
GNU Affero General Public License v3.0

Add SOURCE_DATABASE to core schema #395

Closed jeff-at-h3 closed 6 years ago

jeff-at-h3 commented 6 years ago

Need to be able to identify the source of the data, e.g., BCGW, OSDB

Need to re-add the UI

Needs more technical analysis. Next steps: 1-2 hours to investigate whether the changes are purely programmatic or whether there are other implications.

dkelsey commented 6 years ago

There is a field that shows up in the Additional Information section of OFI Resources:

The field is resource_storage_location under resource

resource location

jeff-at-h3 commented 6 years ago

source

dkelsey commented 6 years ago

@jeff-at-h3 SOURCE_DATABASE is intended to describe if the object is stored in the BCGW or in the NRS Operational DB.

I quickly poked around and found that resource_storage_location is coming from resource.extras. I don't think a new column is necessary.

garrettH3S commented 6 years ago

@dkelsey Just to clarify: within the form, the resource_storage_location field's options need to be updated (see above), and no modification to the schema is necessary.

garrettH3S commented 6 years ago

@dkelsey Does this look correct to you? screen shot 2018-01-30 at 10 38 30 am

cnewallbcgov commented 6 years ago

Confirmed Resource Storage Location implemented in CAD environment. Soliciting feedback from NRS stakeholders concerning list items. List items will have acronyms spelled out and optionally followed by popular acronym in brackets, e.g., BC Geographic Warehouse (BCGW). Will follow up with stakeholder feedback.

cnewallbcgov commented 6 years ago

For this release, the domain values for the pick list need to be changed to the following, in order of appearance:

"Unspecified" should be the default value.

Please advise when ready to QA.

cnewallbcgov commented 6 years ago

Also, impact assessment of removing unneeded locations has not been done.

"SDE" and "SDO" would be changed to "Ministry or other database" unless there is clear indication that the BCGW is the location.

"GeoDB" would be changed to "File system".

"X-Y", "Converge" and "External" would change to "Unspecified" unless there are indicators in the record content to help make a different choice.

I am unsure if there are dependencies on the legacy values planned to be removed and unsure if removing them will cause application issues if legacy values are not recast. Advice please.
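The recasting rules above could be sketched roughly as follows. This is only an illustration: the function name, the `in_bcgw` flag (standing in for "clear indication that the BCGW is the location"), and the exact BCGW label are assumptions, not anything taken from ckanext-bcgov.

```python
# Hypothetical sketch of the legacy-value recasting described above.
# The exact label used for the BCGW entry is an assumption.
BCGW_LABEL = "BC Geographic Warehouse"

LEGACY_TO_NEW = {
    "SDE": "Ministry or other database",
    "SDO": "Ministry or other database",
    "GeoDB": "File system",
    "X-Y": "Unspecified",
    "Converge": "Unspecified",
    "External": "Unspecified",
}


def recast_location(value, in_bcgw=False):
    """Return the replacement for a legacy storage-location value.

    `in_bcgw` is an assumed flag; how a record is judged to live in the
    BCGW is not specified in this thread.
    """
    if value in ("SDE", "SDO") and in_bcgw:
        return BCGW_LABEL
    # Values that are not legacy values pass through unchanged.
    return LEGACY_TO_NEW.get(value, value)
```

"X-Y", "Converge" and "External" would still need a manual review wherever the record content suggests a better choice than "Unspecified".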

garrettH3S commented 6 years ago

@cnewallbcgov It's easy enough to change the form's select values and map the values over within the code and database. In testing the API, I found that no validation is done when setting resource_storage_location; thus, any external system adding resources to a dataset via the API will not get an error, even when it sets a resource_storage_location value that is no longer in the list. [Screenshot: 2018-02-19, 11:33:13 am]

Steps to migrate these data types:

1. Update the system vocabulary.
2. Update the hardcoded storage location regions within ckanext-bcgov.
3. Migrate the database values (run DB scripts).
4. Add an API validation layer to validate resource_storage_location input (the API call is /dataset/new_resource/{id}).

dkelsey commented 6 years ago

This is what is in CAT: image

There are a couple of corrections to make, some things I think should be removed and I have questions:

Corrections:

| location | correction |
| --- | --- |
| BCGW Data Store | BC Geographic Warehouse |
| Converge | What is this? |
| EDC Data Store | This is problematic. Some files will be stored in the FileStore; others will be in the DataStore. The catalogue does not store any data; it only manages metadata. |
| External | ? |
| GeoDB | How is this a location? |
| SDE | How is this a location? Isn't this SPATIAL_DATA_TYPE? |
| SDO | ibid |
| X-Y | I don't know what this is? |
| pub.data.gov.bc.ca | How about DataBC hosted? External could represent anyone else |

What about AGO data?

@cnewallbcgov let's clarify the list.

cnewallbcgov commented 6 years ago

I specified the selection values in an earlier comment. The CAT environment should reflect these values else this is a bug.

ArcGIS Online should be accommodated - a new value "Esri ArcGIS Online" placed on the list ahead of FTP site.

Thank you

jeff-at-h3 commented 6 years ago

Hi Colin,

Jared will look into this issue today and figure out the gaps. Sorry for any confusion on this one.

Jeff

jrods commented 6 years ago

@cnewallbcgov @dkelsey there seems to have been a misstep in the issue process with this ticket, as it was in New Issues and not in Review/QA. It should not have been migrated to TESTING.

@garrettH3S had some concerns regarding the API that need your approval, @cnewallbcgov, because additional work is needed on the API to validate the input of the resource_storage_location field. For example, any user of the API can input any text they wish, even text that does not match the list in the web UI.

cnewallbcgov commented 6 years ago

Re. Garrett's concerns: is this validation issue introduced by these specific changes to the pick list, or has it always been a risk that exists in the Production environment today?

If it is not a new issue, then please make the changes, as we have previously accepted the risk. A separate issue ticket should be created to describe the opportunity to improve data validation via the API. That work would not be in scope for 1.7.0.

Please confirm. Thanks

jrods commented 6 years ago

@cnewallbcgov for this field, no validation currently exists in any environment. I will create a new issue for the validation.

As for the list values, I would just like to confirm with you what should be present as per your and @dkelsey's comments, in order and as displayed:

default is Unspecified

dkelsey commented 6 years ago

I'd prefer Catalogue Data Store be changed to Resource Store. Only CSVs are stored in the DataStore. Users are not going to know the difference between when certain things are in the File Store and others are in both the File Store and the Data Store.

Further, a problem already exists where people think the catalogue stores data. It does not. The catalogue only stores metadata. The Data Store and File Store are things that add the capability to store small files. Adding 'Catalogue' contributes to user cognitive load and confusion.

For BC Geographic Warehouse: for now, editors add the Object Name to the metadata record. The associated resources are not configured directly by the user; a widget is run on their behalf. The widget will add a resource that is a "custom download" and set source_database at that time. Said another way: no one will ever manually set source database to BC Geographic Warehouse. I think it should be removed from the list. It would be set only by the widget that creates the "custom download".

jeff-at-h3 commented 6 years ago

@dkelsey , is this waiting approval by @cnewallbcgov before we start any work on it?

dkelsey commented 6 years ago

@jeff-at-h3 yes.

cnewallbcgov commented 6 years ago

Here is the approved list in order of appearance:

default is Unspecified

Thanks

jrods commented 6 years ago

@cnewallbcgov I need to change 'BC Geographic Warehouse (BCGW)' to 'BC Geographic Warehouse BCGW' because CKAN only allows alphanumeric characters and these symbols: - _ .
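As a rough illustration of that restriction, a check along these lines would reject the bracketed form. This is not CKAN's own validator: the pattern is an assumption based only on the rule as stated in this comment (letters, digits, spaces, and `-`, `_`, `.`), and both function names are hypothetical.

```python
import re

# Assumed rule: letters, digits, spaces, and the symbols - _ .
# CKAN's real validator may differ (for example, for non-ASCII letters).
ALLOWED_NAME = re.compile(r"[A-Za-z0-9 \-_.]+")


def is_allowed_name(name):
    """Return True if the whole name uses only the assumed allowed characters."""
    return ALLOWED_NAME.fullmatch(name) is not None


def strip_disallowed(name):
    """Drop disallowed characters and collapse any doubled spaces left behind."""
    cleaned = re.sub(r"[^A-Za-z0-9 \-_.]", "", name)
    return re.sub(r" {2,}", " ", cleaned).strip()
```

Under these assumptions, `strip_disallowed("BC Geographic Warehouse (BCGW)")` yields the bracket-free form proposed above.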

cnewallbcgov commented 6 years ago

In that case let's omit "BCGW". Thanks

dkelsey commented 6 years ago

Editors can add resources either by uploading them or by specifying a link. image

This example is a bit pedantic; however, do we want to add remote URL to the list? @cnewallbcgov

What is the intention of unspecified?

image

cnewallbcgov commented 6 years ago

Re. adding "remote URL". How about modifying "FTP Site" to be "Web or FTP Site" ?

Re. "Unspecified" - default value, better than null/blank.

dkelsey commented 6 years ago

"Web or FTP Site" is good enough for me. So when you stated "default is unspecified", you meant "the default value is the string 'unspecified'" and not "the default value is unspecified, or blank, or the empty string"?

cnewallbcgov commented 6 years ago

I was thinking the value would be "Unspecified" instead of blank.

jrods commented 6 years ago

@cnewallbcgov Just to clarify what needs to be changed,

Is this correct?

cnewallbcgov commented 6 years ago

Not quite. No need to add "Remote URL".

jrods commented 6 years ago

@dkelsey I see the issue you mentioned about blank values for SOURCE_DATABASE. It looks like the ignore_missing validator was in use; I'll try applying the not_empty validator instead.

jrods commented 6 years ago

Updated the list; the vocab list will need to be updated in CAD.

As for the validators: delivered and verified in CAD. The change prevents resource_storage_location from being empty; however, through the API any value can still be stored, even one that is not in the edc_vocab list normally displayed in the web UI.
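The remaining gap could be closed with a membership check on top of the non-empty check, roughly as sketched below. Everything here is a stand-in: `Invalid`, `validate_storage_location` and `EXAMPLE_VOCAB` are hypothetical names, the vocabulary holds only a few illustrative values rather than the approved list, and none of it is CKAN's or ckanext-bcgov's actual code.

```python
class Invalid(Exception):
    """Stand-in for a framework validation error."""


# Illustrative values only; the approved pick list is not reproduced here.
EXAMPLE_VOCAB = ("Unspecified", "File system", "Web or FTP Site")


def validate_storage_location(value, vocab=EXAMPLE_VOCAB):
    """Reject empty values and values that are not in the vocabulary."""
    if value is None or not str(value).strip():
        raise Invalid("resource_storage_location must not be empty")
    if value not in vocab:
        raise Invalid(
            "resource_storage_location %r is not in the vocabulary" % (value,)
        )
    return value
```

Applying the same check to both the web form and the API path would keep the two from drifting apart.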

dkelsey commented 6 years ago

I created a ticket describing the Validation issue: #481

I've run the script that updates the vocab.

dkelsey commented 6 years ago

I've updated the vocabulary in PROD. BCGW Data Store and EDC Data Store were removed.