NASA-PDS / registry-mgr

Standalone Registry Manager application responsible for managing the PDS Registry (https://github.com/NASA-PDS/registry) schemas and indexes.
https://nasa-pds.github.io/registry

Missing Science_Facets fields definitions in registry schema #14

Closed jordanpadams closed 3 years ago

jordanpadams commented 3 years ago

Describe the bug

Identified by @rchenatjpl:

In the bundle.xml, I commented out (marked by <!--RC) three parts that caused registry-manager load-data to fail. For the first two, I got:

    % registry-manager load-data -file /tmp/harvOut
    Elasticsearch URL: http://localhost:9200
    Index: registry
    Updating schema with fields from /tmp/harvOut/fields.txt
    [ERROR] Could not find datatype for field 'pds/Science_Facets/pds/facet1'

I don't know if that's something the user should fix or if it's a software bug.

The third touchy part is for is_facility and is_telescope.

Is this a bug, or is the user supposed to modify registry-manager/elastic/registry.json (or some other file)?

    % registry-manager load-data -file /tmp/harvOut
    Elasticsearch URL: http://localhost:9200
    Index: registry
    Updating schema with fields from /tmp/harvOut/fields.txt
    [ERROR] Could not find datatype for field 'ref_lid_facility'
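Both failures have the same shape: a harvested field name has no datatype entry when the schema is updated. A minimal Python sketch of that check follows. It is not the registry-manager implementation; the function name `find_unmapped_fields` and the `known_types` map are hypothetical, and only the two failing field names come from the error messages above.

```python
def find_unmapped_fields(field_names, known_types):
    """Return field names that have no datatype entry, in first-seen order."""
    missing = []
    for name in field_names:
        name = name.strip()
        if name and name not in known_types and name not in missing:
            missing.append(name)
    return missing


# Hypothetical datatype map; a real one would come from the loaded data dictionary.
known_types = {
    "lid": "keyword",
    "title": "text",
}

# Two fields with known types plus the two fields named in the error messages.
harvested = [
    "lid",
    "title",
    "pds/Science_Facets/pds/facet1",
    "ref_lid_facility",
]

for name in find_unmapped_fields(harvested, known_types):
    print(f"no datatype for field '{name}'")
```

Running a check like this over the harvested field list would report every unmapped field at once, rather than one failure per load-data run.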

jordanpadams commented 3 years ago

From @rchenatjpl:

I don't fully understand what this means, but maybe it's causing a disconnect between the .xsd and the .json. https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1E00.xsd says:

The Science_Facets class contains the science-related search facets. It is optional and may be repeated if an product has facets related to, for example, two different disciplines (as defined by the discipline_name facet). Note that Science_Facets was modeled with Discipline_Facets as a component and Discipline_Facets was modeled with Group_Facet1 and Group_Facet2 as components. This dependency hierarchy was flattened and only Science_Facets exists in the schema.

... while searching for "facet1" in https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_JSON_1E00.JSON points to something within Group_Facet1, not Science_Facets.

From @tdddblog:

We are trying to understand how to determine from the “PDS4_PDS_JSON_1E00.JSON” file that the "0001_NASA_PDS_1.pds.Science_Facets" class has a “pds.facet1” attribute.

This is a description from "0001_NASA_PDS_1.pds.Science_Facets" class: "The Science_Facets class contains the science-related search facets. It is optional and may be repeated if an product has facets related to, for example, two different disciplines (as defined by the discipline_name facet). Note that Science_Facets was modeled with Discipline_Facets as a component and Discipline_Facets was modeled with Group_Facet1 and Group_Facet2 as components. This dependency hierarchy was flattened and only Science_Facets exists in the schema."

I don’t think this “flattening” is reflected anywhere in the “PDS4_PDS_JSON_1E00.JSON” file.

There is another class which is not flattened, "0001_NASA_PDS_1.pds.Primary_Result_Summary". It contains "0001_NASA_PDS_1.pds.Science_Facets" as a component.

How can we tell that "0001_NASA_PDS_1.pds.Science_Facets" should be flattened, but "0001_NASA_PDS_1.pds.Primary_Result_Summary" should not?
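To make the question concrete, here is a toy Python sketch of what "flattening" means for field names. The class hierarchy follows the description quoted above; where each attribute sits inside that hierarchy is an assumption made for illustration, and `MODEL`, `flatten_attributes`, and `field_names` are hypothetical names, not part of the PDS4 JSON file or any PDS tool.

```python
# Toy model: each class lists its own attributes and its component classes.
# The hierarchy follows the quoted class description; the placement of each
# attribute is an assumption for illustration only.
MODEL = {
    "Science_Facets": {
        "attributes": [],
        "components": ["Discipline_Facets"],
    },
    "Discipline_Facets": {
        "attributes": ["discipline_name"],
        "components": ["Group_Facet1", "Group_Facet2"],
    },
    "Group_Facet1": {"attributes": ["facet1", "subfacet1"], "components": []},
    "Group_Facet2": {"attributes": ["facet2", "subfacet2"], "components": []},
}


def flatten_attributes(model, class_name):
    """Collect attributes of a class and, recursively, of its components."""
    cls = model[class_name]
    attrs = list(cls["attributes"])
    for component in cls["components"]:
        attrs.extend(flatten_attributes(model, component))
    return attrs


def field_names(model, class_name, ns="pds"):
    """Build harvest-style field names as if the class had been flattened."""
    return [f"{ns}/{class_name}/{ns}/{attr}"
            for attr in flatten_attributes(model, class_name)]


for name in field_names(MODEL, "Science_Facets"):
    print(name)
```

The sketch only works because the caller already knows that Science_Facets is the class to flatten. That knowledge is exactly what the JSON file does not carry.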

From @jshughes:

You are correct, the “flattening” is not recorded in the IM.

The proposer of the class initially submitted a flat model for the PDS4 XML label. However, she also added dependency requirements, resulting in a hierarchical model. To handle the anomaly, I added special code to IMTool/LDDTool that flattens the model for the XML serialization. It has been a problem area ever since, and I would not let it happen again.

We could probably submit a change request to add indicators that this one structure is being flattened. However, I am not sure that this would help you that much.

There might be one alternative that we could look into, but it would definitely require a change request. The class could be flattened in the IM and the dependency requirements implemented as Schematron rules. Again, I am not sure if this would actually help you.

From @tdddblog:

For now we can create a custom CSV file and load it. I think only 5 fields need special handling:

pds/Science_Facets/pds/discipline_name
pds/Science_Facets/pds/facet1
pds/Science_Facets/pds/subfacet1
pds/Science_Facets/pds/facet2
pds/Science_Facets/pds/subfacet2
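A rough Python sketch of generating such a custom CSV for the five fields is shown below. The thread does not specify the file layout, so the two-column form (field name, datatype) is an assumption, as is the choice of the Elasticsearch `keyword` type for every field; `write_field_types` is a hypothetical helper, not a registry-manager function.

```python
import csv
import io

# The five Science_Facets fields listed above.
FIELDS = [
    "pds/Science_Facets/pds/discipline_name",
    "pds/Science_Facets/pds/facet1",
    "pds/Science_Facets/pds/subfacet1",
    "pds/Science_Facets/pds/facet2",
    "pds/Science_Facets/pds/subfacet2",
]


def write_field_types(out, fields, datatype="keyword"):
    """Write one 'field,datatype' row per field (assumed layout)."""
    writer = csv.writer(out, lineterminator="\n")
    for field in fields:
        writer.writerow([field, datatype])


# Write to an in-memory buffer here; a real run would open a file instead.
buf = io.StringIO()
write_field_types(buf, FIELDS)
print(buf.getvalue(), end="")
```

If the loader expects a different column order or different type names, only `write_field_types` needs to change.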