adiwg / mdJson-schemas

JSON schemas, examples, and templates for ADIwg metadata standards
http://www.adiwg.org/projects/
GNU Lesser General Public License v3.0

Encoding identifiers in the schema #11

Closed stansmith907 closed 10 years ago

stansmith907 commented 10 years ago

This is about how we encode identifiers in JSON to support MD_Identifier in CI_Citation and EX_GeographicDescription. We made the decision to place identifiers in JSON as an attribute of responsible party, which implied that ‘contact’ is equivalent to the ‘citation for the issuing authority’. I have been building skimpy citations for the authority from only the organizationName of the contact block; that’s all I can use from contact that fits into citation. This creates a minimal authority citation always with a full citedResponsibleParty. Almost the opposite of what we need: a good citation with optional citedResponsibleParty.

Also, if I ask the 'class_identifier' to build the MD_Identifier from responsibleParty I will always get the skimpy authority citation with full citedResponsibleParty, even though authority is optional in ISO. CitedResponsibleParty is also optional and I can't shut that one down either. Not going to be funny when we build separate extents for geometries with supplemental information regarding identifiers.

And we also have our previously discussed problem of repeating the long citedResponsibleParty section in metadata record's citation.

I think we need to revisit the idea of having the resourceIdentifier block embedded in responsibleParty. I could even push the idea farther and lobby for an authority array ahead of the metadata block, similar to the contacts array. When we are coding identifiers for multi-point, line, polygon geometries the authorities will become highly reused. We could then just provide the authorityID, roleCode, contactID (optional), identifierName, and identifier with each geometry.

stansmith907 commented 10 years ago

Sorry, I forgot to mention the other big problem. AssociatedResource (AggregateInformation). In ISO this is either a citation for the authority with no identifier; or an MD_Identifier which is a citation for the authority plus an identifier within the MD_Identifier block. I'll try that visually.
{aggregateInformation {citation required}} or {aggregateInformation {identifier {citation optional, code required} required}}

The way we code the JSON for associatedResource(s) is to provide the full citation and also provide an identifier in the responsibleParty block. So to get the identifier attached to the resource, we must have a contact defined as citedResponsibleParty. When the responsibleParty is passed to class_identifier it dumps the provided citation info and inserts the skimpy citation built from contact organizationName.

At the very least, to make all this work I will need to code special circumstance versions of class_identifier for associatedResource, geographicDescription, and citation. Unless we change our JSON schema.
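
To make the problem concrete, here is a minimal Python sketch of the behavior described above. The function name `build_identifier_from_responsible_party`, the sample contact, and the output shape are hypothetical stand-ins, not the actual `class_identifier` code. The only facts taken from the thread are that the authority citation is built from the contact's organizationName alone and that the full contact always comes along as citedResponsibleParty.

```python
def build_identifier_from_responsible_party(responsible_party, contacts):
    """Hypothetical stand-in for the current class_identifier behavior."""
    contact = contacts[responsible_party["contactId"]]
    identifiers = []
    for res_id in responsible_party.get("resourceIdentifier", []):
        identifiers.append({
            "code": res_id["identifier"],
            "authority": {
                # Skimpy citation: organizationName is the only contact
                # field that fits into a citation.
                "title": contact["organizationName"],
                # The full contact is always emitted, even though ISO
                # treats citedResponsibleParty as optional.
                "citedResponsibleParty": [
                    {"contact": contact, "role": responsible_party["role"]}
                ],
            },
        })
    return identifiers


# Illustrative data only; none of these values come from the thread.
contacts = {"c1": {"organizationName": "Example Org", "email": "info@example.org"}}
assigned = {
    "contactId": "c1",
    "role": "originator",
    "resourceIdentifier": [{"identifierName": "site id", "identifier": "ABC-123"}],
}
result = build_identifier_from_responsible_party(assigned, contacts)
```

Any citation detail supplied elsewhere (for example in an associatedResource) has no way into this output, which is the loss described above.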

jlblcc commented 10 years ago

How about using Citation instead of responsible party in the JSON. Instead of:

"assignedId": [
                  {
                    "contactId": "",
                    "role": "",
                    "resourceIdentifier": [
                      {
                        "identifierName": "",
                        "identifier": ""
                      }
                    ]
                  }
                ]

Do:

"assignedId": [
   {
      "title":"", <=required
      "date":[
         {
            "date":"0000-00-00",
            "dateType":""
         }
      ],
      "edition":"",
      "responsibleParty":[
         {
            "contactId":"",
            "role":""
         }
      ],
      "presentationForm":[
         ""
      ],
      "additionalIdentifier":{
         "doi":"",
         "isbn":"",
         "issn":""
      },
      "resourceIdentifier":[""] //or just "" <=required in assignedId context
   }
]

We could also move resourceIdentifier into additionalIdentifier. I believe this is close to what we originally laid out before stripping the citation down. I think this will also solve the AssociatedResource issue.

I believe this is correct: aggregateDataSetName => {aggregateInformation {citation required}} and/or aggregateDataSetIdentifier => {aggregateInformation {identifier {citation optional, code required} required}}

I do not support a separate authority array. Besides being a pain to generate, I would like to keep the identifier in the GeoJSON to facilitate identifier availability when mapping objects. I'm not nearly as concerned with the XML verbosity as with keeping the JSON user-friendly.

stansmith907 commented 10 years ago

That's close. In data modeling terms our problem has been relating identifiers to the contact when they identify the resource (citation) not the responsible party (contact).

I think the above solution works. I still would like to consider cardinality and reuse of authorities.

Cardinality: we still haven't answered the question whether a single authority would assign multiple identifiers to a unique resource. I think not, so lean toward "" rather than []; but I defer to Allison for the final word.

Reuse: as an example, if we plan to support assigning identifiers to borehole point locations issued by GTN-P, the GTN-P authority (citation) would need to be repeated for each borehole location in the above example. Moving authority (or possibly even citation) to an array would shorten the JSON notation and save errors. However, resourceIdentifier, associationType, and initiativeType could not be included in the array if we want to get full reuse of authorities.

jlblcc commented 10 years ago

We could support a defaults property at the Feature/(Geometry)Collection level. You would then only need to provide a resourceIdentifier for individual features, unless other properties differ from the default.
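
A rough Python sketch of how a reader might apply such defaults is shown here. The `defaults` key, its placement on the collection, and the single-object `assignedId` are assumptions made for illustration; the thread does not define them.

```python
def apply_defaults(collection):
    """Return each feature's assignedId with collection-level defaults filled in."""
    defaults = collection.get("defaults", {})  # hypothetical property name
    resolved = []
    for feature in collection["features"]:
        assigned = feature["properties"].get("assignedId", {})
        # Feature-level values win; defaults only fill in what is missing.
        resolved.append({**defaults, **assigned})
    return resolved


# Illustrative data only.
collection = {
    "type": "FeatureCollection",
    "defaults": {"title": "Example Authority", "role": "originator"},
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"assignedId": {"resourceIdentifier": "BH-001"}}},
        {"type": "Feature", "geometry": None,
         "properties": {"assignedId": {"resourceIdentifier": "BH-002",
                                       "role": "custodian"}}},
    ],
}
resolved = apply_defaults(collection)
```

Note that one defaults object can describe only one authority, so this shape works best when every feature carries a single identifier from the same issuer.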

stansmith907 commented 10 years ago

In the case of boreholes, each borehole is issued an id by GTN-P and another by USGS. I'm not sure defaults would work well to support multiple identifiers.

jlblcc commented 10 years ago

Yeah, wouldn't work in that event. I say the info should just be repeated for each borehole. Yes, it's going to make the JSON heavier. However, I see a problem trying to create an array of authorities/citations at the root level.

Contacts are fundamentally different since they exist as discrete objects in most implementations: no context is necessary when I generate one from my database and I don't have to generate unique ids for the JSON since they are maintained in the data store (database). This isn't true for the authority, at least in my mind. In this instance, the authority may only exist within the context of the project/feature and may not have an id/key in the data store. If not, temporary keys would have to be created and maintained when generating the JSON, and would only exist in the JSON.

It would be much easier to write this out in the appropriate context, which unfortunately means repeating it in the JSON (which is why we stripped citation down to begin with). Plus, moving this to a root-level array creates a chain of look-ups: you need to look up the authority by id, then the contact (responsibleParty) by id to parse the authority block.
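
The look-up chain can be sketched in Python as follows. The root-level `authorities` map, the `authorityId` key, and the sample records are hypothetical; they only illustrate the two-step resolution a parser would have to perform.

```python
def resolve_identifier(assigned, authorities, contacts):
    """Resolve one identifier entry against root-level authority and contact maps."""
    # First look-up: authority by id.
    authority = authorities[assigned["authorityId"]]
    # Second look-up: each responsibleParty's contact by id.
    parties = [
        {"contact": contacts[party["contactId"]], "role": party["role"]}
        for party in authority.get("responsibleParty", [])
    ]
    return {
        "identifier": assigned["identifier"],
        "authority": {"title": authority["title"], "responsibleParty": parties},
    }


# Illustrative data only; the keys "a1" and "c1" would be temporary ids
# that exist solely in the JSON.
contacts = {"c1": {"organizationName": "Example Org"}}
authorities = {
    "a1": {
        "title": "Example Borehole Registry",
        "responsibleParty": [{"contactId": "c1", "role": "custodian"}],
    }
}
resolved = resolve_identifier(
    {"authorityId": "a1", "identifier": "BH-001"}, authorities, contacts
)
```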

I hope some of that makes sense to someone else.

stansmith907 commented 10 years ago

It all makes sense. I guess what we do depends on what we consider the bigger of two weevils.

I do think authorities, and citations, stand on their own. USGS publication warehouse even has a web service that will return citations for a publication. Through most of our JSON there is rarely a case for reusing a citation. That's why we rejected Ted's suggestion we also put citations in an array at root. But attaching an authority to identifier really changes the game in my mind. I think we need the shorthand in geographicDescription and could benefit from it in other identifiers.

A compromise may be in order...

The rest is just coding.

nunatech commented 10 years ago

I'm not sure that I follow all of the ramifications here but I cannot think of a reason why "a single authority would assign multiple identifiers to a unique resource." I do see examples where multiple authorities assign their own IDs (ie. GTN-P and USGS have different IDs for the same borehole.)

stansmith907 commented 10 years ago

Taking in all the comments and reviewing the models here's a proposal...

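Our ISO model and JSON schema support identifiers in three places:

  • CI_Citation (only in the full citation for the resource)
  • EX_GeographicDescription (from identifiers associated with geographic elements through GeoJSON properties)
  • MD_AggregateInformation (MD_AssociatedResource in -1 and ADIwg JSON)

MD_Identifier has an authority (which is a citation) and an identifier (string). Looking at all the possible and required fields of citation I think

  • the title should be required;
  • date should be nilReason,
  • and onlineResource optional.

That would cause the following changes to the schema...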

For citation in resource info...

"resourceInfo": {
    "citation": {
        "title": "",
        "date": [
            {
                "date": "0000-00-00",
                "dateType": ""
            }
        ],
        "edition": "",
        "responsibleParty": [
            {
                "contactId": "",
                "role": ""
            }
        ],
        "resourceIdentifier": [
            {
                "identifierName": "",
                "identifier": "",
                "onlineResource": {
                    "uri": "http://thisisanexample.com",
                    "protocol": "",
                    "name": "",
                    "description": "",
                    "function": "",
                    "doi": ""
                }
            }
        ],
        "presentationForm": [""],
        "additionalIdentifier": {
            "doi": "",
            "isbn": "",
            "issn": ""
        },
        "onlineResource": [
            {
                "uri": "http://thisisanexample.com",
                "protocol": "",
                "name": "",
                "description": "",
                "function": "",
                "doi": ""
            }
        ]
    }
}

for GeoJSON properties

"properties": {
    "featureName": "",
    "description": "",
    "includesData": true,
    "temporalElement": {},
    "verticalElement": [],
    "assignedId": [
        {
            "identifierName": "",
            "identifier": "",
            "onlineResource": {
                "uri": "http://thisisanexample.com",
                "protocol": "",
                "name": "",
                "description": "",
                "function": "",
                "doi": ""
            }
        }
    ],
    "featureScope": "",
    "featureAcquisitionMethod": ""
}

for associatedResource

"associatedResource": [
    {
        "associationType": "",
        "resourceType": "",
        "resourceCitation": {
            "title": "",
            "date": [
                {
                    "date": "0000-00-00",
                    "dateType": ""
                }
            ],
            "edition": "",
            "responsibleParty": [
                {
                    "contactId": "",
                    "role": ""
                }
            ],
            "resourceIdentifier": [
                {
                    "identifierName": "",
                    "identifier": "",
                    "onlineResource": {
                        "uri": "http://thisisanexample.com",
                        "protocol": "",
                        "name": "",
                        "description": "",
                        "function": "",
                        "doi": ""
                    }
                }
            ],
            "presentationForm": [""],
            "additionalIdentifier": {
                "doi": "",
                "isbn": "",
                "issn": ""
            },
            "onlineResource": [
                {
                    "uri": "http://thisisanexample.com",
                    "protocol": "",
                    "name": "",
                    "description": "",
                    "function": "",
                    "doi": ""
                }
            ]
        }
    }
]
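
As a hedged illustration, the Python sketch below maps one of the proposed identifier blocks onto an MD_Identifier-like structure. It assumes that `identifierName` supplies the authority citation's title and that the date is written as a nilReason of "unknown"; neither mapping is spelled out in the thread, and the function name `to_md_identifier` and the output keys are invented for the example.

```python
def to_md_identifier(block, date_nil_reason="unknown"):
    """Build an MD_Identifier-like dict from a proposed identifier block."""
    if not block.get("identifierName"):
        # Assumption: identifierName stands in for the required citation title.
        raise ValueError("identifierName is required")
    authority = {
        "title": block["identifierName"],
        # Assumption: no date is collected, so a nilReason is emitted instead.
        "date": {"nilReason": date_nil_reason},
    }
    online = block.get("onlineResource")
    if online and online.get("uri"):
        # onlineResource is optional and only written when a uri is present.
        authority["onlineResource"] = online
    return {"authority": authority, "code": block["identifier"]}


# Illustrative data only.
md_identifier = to_md_identifier({
    "identifierName": "Example Borehole Registry",
    "identifier": "BH-001",
    "onlineResource": {"uri": "http://thisisanexample.com"},
})
```

The same function could serve citation, GeoJSON properties, and associatedResource, since all three carry the same block in the proposal.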

dwalt commented 10 years ago

As far as I can tell, it looks like it would work.

On Tue, May 27, 2014 at 11:41 AM, stansmith907 notifications@github.com wrote:

Taking in all the comments and reviewing the models here's a proposal...

Our ISO model and JSON schema support identifiers in three places:

  • CI_Citation (only in the full citation for the resource)
  • EX_GeographicDescription (from identifiers associated with geographic elements through GeoJSON properties)
  • MD_AggregateInformation (MD_AssociatedResource in -1 and ADIwg JSON)

MD_Identifier has an authority (which is a citation) and an identifier (string). Looking at all the possible and required fields of citation I think

  • the title should be required;
  • date should be nilReason,
  • and onlineResource optional.

jlblcc commented 10 years ago

It appears that the ability to associate an identifier with a specific contact is not possible with the suggested changes. Is that an intentional outcome? If so, I don't think that removing support for that is a good idea.

stansmith907 commented 10 years ago

I was thinking that associating an identifier with a contact WAS the problem. The identifier is not for the contact but is an identifier for the resource. Going through contact to find a resource was causing problems. If we want a contact for whoever assigned the identifier we can add contactID to the block.

jlblcc commented 10 years ago

So why not just use a citation for additionalIdentifier(s)?

stansmith907 commented 10 years ago

In a way, we are. But it is too big to repeat for each geometry. Using citation would also necessitate more levels to track identifier separate from authority. I was just trying to make the JSON more manageable rather than mimicking ISO. Just using citation also opens the door for recursion problems as citation has an identifier and identifier has a citation. Using the full citation would look like the following and similar for the properties and associated resources ...

"resourceInfo": {
    "citation": {
        "title": "",
        "date": [
            {
                "date": "0000-00-00",
                "dateType": ""
            }
        ],
        "edition": "",
        "responsibleParty": [
            {
                "contactId": "",
                "role": ""
            }
        ],
        "resourceIdentifier": [
            {
                "citation": {
                    "title": "",
                    "date": [
                        {
                            "date": "0000-00-00",
                            "dateType": ""
                        }
                    ],
                    "edition": "",
                    "responsibleParty": [
                        {
                            "contactId": "",
                            "role": ""
                        }
                    ],
                    "resourceIdentifier": [
                        {
                            "citation": {"..."},
                            "identifier": ""
                        }
                    ],
                    "presentationForm": [""],
                    "additionalIdentifier": {
                        "doi": "",
                        "isbn": "",
                        "issn": ""
                    },
                    "onlineResource": [
                        {
                            "uri": "http://thisisanexample.com",
                            "protocol": "",
                            "name": "",
                            "description": "",
                            "function": "",
                            "doi": ""
                        }
                    ]
                },
                "identifier": ""
            }
        ],
        "presentationForm": [""],
        "additionalIdentifier": {
            "doi": "",
            "isbn": "",
            "issn": ""
        },
        "onlineResource": [
            {
                "uri": "http://thisisanexample.com",
                "protocol": "",
                "name": "",
                "description": "",
                "function": "",
                "doi": ""
            }
        ]
    }
}
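
The recursion concern can be shown with a short Python sketch that counts how many citation levels are nested through resourceIdentifier. The function name `citation_depth` and the sample data are illustrative only.

```python
def citation_depth(citation):
    """Count citation levels nested via resourceIdentifier -> citation."""
    deepest_child = 0
    for res_id in citation.get("resourceIdentifier", []):
        child = res_id.get("citation")
        if isinstance(child, dict):
            deepest_child = max(deepest_child, citation_depth(child))
    return 1 + deepest_child


# Full-citation style: every identifier carries its own citation,
# which may carry identifiers of its own.
nested = {
    "title": "resource",
    "resourceIdentifier": [{
        "identifier": "1",
        "citation": {
            "title": "authority",
            "resourceIdentifier": [{
                "identifier": "2",
                "citation": {"title": "authority of the authority"},
            }],
        },
    }],
}

# Trimmed style: the identifier block carries no nested citation.
flat = {
    "title": "resource",
    "resourceIdentifier": [{"identifierName": "authority", "identifier": "1"}],
}

nested_levels = citation_depth(nested)
flat_levels = citation_depth(flat)
```

With the full-citation form nothing in the structure bounds the depth, so a reader or writer would need its own cut-off; the trimmed form always stays at one level.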

jlblcc commented 10 years ago

More like this:

{
   "resourceInfo":{
      "citation":{
         "title":"",
         "date":[
            {
               "date":"0000-00-00",
               "dateType":""
            }
         ],
         "edition":"",
         "responsibleParty":[
            {
               "contactId":"",
               "role":""
            }
         ],
         "presentationForm":[
            ""
         ],
         "additionalIdentifier":{
            "doi":"",
            "isbn":"",
            "issn":""
         },
         "onlineResource":[
            {
               "uri":"http://thisisanexample.com",
               "protocol":"",
               "name":"",
               "description":"",
               "function":"",
               "doi":""
            }
         ]
      },
      "resourceIdentifier":[
         {
            "title":"",
            "date":[
               {
                  "date":"0000-00-00",
                  "dateType":""
               }
            ],
            "responsibleParty":[
               {
                  "contactId":"",
                  "role":""
               }
            ],
            "onlineResource":[
               {
                  "uri":"http://thisisanexample.com",
                  "protocol":"",
                  "name":"",
                  "description":"",
                  "function":"",
                  "doi":""
               }
            ],
            "identifier":""
         }
      ]
   }
}

I removed presentationForm, additionalIdentifier, and edition from the citation block. It could be further restricted to one responsibleParty and one onlineResource, although that would deviate from the citation class. Also, it makes more sense to move resourceIdentifier so it's a direct property of resourceInfo.

stansmith907 commented 10 years ago

I worked the above schema into the full example, reader, and internal object. It all seems to be working, code is cleaner, and schema readable - overall this seems to be an improvement. However, I have run into several issues inside associatedResource. associatedResource has a resourceCitation {} and a resourceIdentifier [] which is a shortened citation plus an identifier.

This matches the ISO schema which only accepts one or the other, not both. I don't know that the schema will need to change for this, but we will need to set a priority for the writer. I suggest the resourceIdentifier take precedence over the resourceCitation. (Yes/No?) And before you say use both by coding two MD_AggregateInformation records, remember associatedResource is an array itself.

Another issue is that resourceIdentifier is an array. If we assume each instance of resourceIdentifier is valid this would be poor form (and risky) since the parent, associatedResource, is also an array and carries additional attributes for the class. If we do allow the user to use the array feature of resourceIdentifier under associatedResource we will need to assume that associationType and resourceType are the same for each instance of resourceIdentifier. (Yes/No?)

Or we only allow one instance of resourceIdentifier under associatedResource. This would be easy enough in the reader. resourceIdentifier will need to remain an array under resourceInfo and assignedId. (Yes/No?)
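
A minimal Python sketch of the single-instance option in a reader follows. The function name `read_single_resource_identifier` and the choice to raise an error rather than silently take the first entry are assumptions, not the actual reader code.

```python
def read_single_resource_identifier(associated_resource):
    """Return at most one resourceIdentifier for an associatedResource."""
    raw = associated_resource.get("resourceIdentifier")
    if raw is None:
        return None
    if isinstance(raw, dict):
        # Already a single object.
        return raw
    if isinstance(raw, list):
        if len(raw) > 1:
            # Assumption: reject extras instead of guessing which one applies
            # to the parent's associationType and resourceType.
            raise ValueError(
                "associatedResource accepts only one resourceIdentifier"
            )
        return raw[0] if raw else None
    raise TypeError("resourceIdentifier must be an object or an array")


# Illustrative data only.
single = read_single_resource_identifier(
    {"associationType": "", "resourceIdentifier": [{"identifier": "A"}]}
)
```

Accepting both an object and a one-element array keeps existing JSON readable while the schema question is still open.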

jlblcc commented 10 years ago

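Are you sure MD_AggregateInformation cannot have both gmd:aggregateDataSetIdentifier and gmd:aggregateDataSetName? It looks like it can to me, see here (http://www.schemacentral.com/sc/niem21/e-gmd_MD_AggregateInformation.html) and here (https://geo-ide.noaa.gov/wiki/index.php?title=ISO_AggregationInformation). By the way, in the docs I have, it looks like they both become citations in 19115-1 (name and metadataReference).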

It appears that you can have both, but only one of each. I'm fine with changing resourceIdentifier to a single object under associatedResource.

dwalt commented 10 years ago

I think it is fine to assume one resource id for the resource.

On Thu, May 29, 2014 at 10:41 AM, Josh Bradley notifications@github.com wrote:

Are you sure MD_AggregateInformation cannot have both gmd:aggregateDataSetIdentifier and gmd:aggregateDataSetName? It looks like it can to me, see here http://www.schemacentral.com/sc/niem21/e-gmd_MD_AggregateInformation.html and here https://geo-ide.noaa.gov/wiki/index.php?title=ISO_AggregationInformation. By the way, in the docs I have, it looks like they both become citations in 19115-1(name and metadataReference).

stansmith907 commented 10 years ago

Looking back at the ISO documentation I see I have an error in the model. Josh is right, we can have both a citation and an MD_Identifier (which has its own citation). In the NOAA examples, if you used the named resource citation, the identifier citation was suppressed. The named citation relates to the associatedResource and the resourceIdentifier citation relates to the issuing authority of the identifier; which in many cases would be the same.

Looking ahead to -1, the resourceIdentifier is dropped for associatedResource. I think I favor moving that direction now. Drop resourceIdentifier for associatedResource. Not sure what the value really is to carry identifiers for associated resources. It would mean that the named citation would become required.

jlblcc commented 10 years ago

You can still have identifiers in -1, they're just embedded in the citation(name). Do we want to add metadataReference to associatedResource?

stansmith907 commented 10 years ago

Right again. Identifiers can be embedded in citation rather than carried separately as in AggregateInformation. Interesting, if we leave the schema as it is now it will be the same structure as for the primary resource; resourceInfo {citation {}, resourceIdentifier []} and associatedResource {resourceCitation {}, resourceIdentifier []}. It would be easy to break with AggregateInformation style by dropping aggregateDataSetIdentifier as suggested above but keep resourceIdentifier and move identifiers inside the resourceCitation. The array nature of resourceIdentifier could stand. But again, resourceCitation would need to be required in the JSON schema. I think I like this even better and it could stand up when -2 is supported.

jlblcc commented 10 years ago

Do we want to add metadataReference to associatedResource? -1 allows for a name(citation) and metadataReference.

stansmith907 commented 10 years ago

So are you agreeing to leave the JSON schema as it is and move the resourceIdentifier(s) into the citation? Dropping support for aggregateDataSetIdentifier?

I have no problem with adding metadataReference now, but it would only live in the JSON schema. Depends on how important it is to provide that option before -1.

nunatech commented 10 years ago

If it's not a big deal to include a "placeholder" for this in the JSON I would do that so we are ready to expose the info at -1. If it's a big pain, I would wait and note this on the list of enhancements to accommodate for -1.

jlblcc commented 10 years ago

Since metadataReference would be how we link project <=> data metadata, I think it's important to include. I think the name Citation should point to the actual dataset(product).

So are you agreeing to leave the JSON schema as it is and move the resourceIdentifier(s) into the citation? Dropping support for aggregateDataSetIdentifier?

Yes.

stansmith907 commented 10 years ago

I agree with all that. If we want to add metadataReference it will be another citation block inside associatedResource.

    "associatedResource": [
        {
            "associationType": "",
            "resourceType": "",
            "resourceCitation": {
                "title": "",
                "date": [
                    {
                        "date": "0000-00-00",
                        "dateType": ""
                    }
                ],
                "edition": "",
                "responsibleParty": [
                    {
                        "contactId": "",
                        "role": ""
                    }
                ],
                "presentationForm": [""],
                "additionalIdentifier": {
                    "doi": "",
                    "isbn": "",
                    "issn": ""
                },
                "onlineResource": [
                    {
                        "uri": "http://thisisanexample.com",
                        "protocol": "",
                        "name": "",
                        "description": "",
                        "function": "",
                        "doi": ""
                    }
                ]
            },
            "resourceIdentifier": [
                {
                    "title": "",
                    "date": [
                        {
                            "date": "0000-00-00",
                            "dateType": ""
                        }
                    ],
                    "responsibleParty": [
                        {
                            "contactId": "",
                            "role": ""
                        }
                    ],
                    "onlineResource": [
                        {
                            "uri": "http://thisisanexample.com",
                            "protocol": "",
                            "name": "",
                            "description": "",
                            "function": "",
                            "doi": ""
                        }
                    ],
                    "identifier": ""
                }
            ],
            "metadataCitation": {
                "title": "",
                "date": [
                    {
                        "date": "0000-00-00",
                        "dateType": ""
                    }
                ],
                "edition": "",
                "responsibleParty": [
                    {
                        "contactId": "",
                        "role": ""
                    }
                ],
                "presentationForm": [""],
                "additionalIdentifier": {
                    "doi": "",
                    "isbn": "",
                    "issn": ""
                },
                "onlineResource": [
                    {
                        "uri": "http://thisisanexample.com",
                        "protocol": "",
                        "name": "",
                        "description": "",
                        "function": "",
                        "doi": ""
                    }
                ]
            } 
        }
    ]

That's a lot for an associated resource!
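
For illustration, a small Python check of the agreed shape might look like the sketch below. It assumes that resourceCitation is required, that a citation needs a title, and that metadataCitation is optional but needs a title when present; the function name `check_associated_resource` and the message wording are invented for the example.

```python
def check_associated_resource(resource):
    """Return a list of problems found in one associatedResource entry."""
    problems = []
    citation = resource.get("resourceCitation")
    if not isinstance(citation, dict):
        problems.append("resourceCitation is required")
    elif not citation.get("title"):
        problems.append("resourceCitation.title is required")
    # Assumption: metadataCitation is optional, but when supplied it is a
    # citation block like any other and so needs a title.
    metadata = resource.get("metadataCitation")
    if metadata is not None and not metadata.get("title"):
        problems.append("metadataCitation.title is required when present")
    return problems


# Illustrative data only.
ok = check_associated_resource({
    "associationType": "",
    "resourceType": "",
    "resourceCitation": {"title": "Example dataset"},
})
missing = check_associated_resource({"associationType": "", "resourceType": ""})
```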