american-art / ima

Indianapolis Museum of Art

Dimensions are strange #18

Closed workergnome closed 7 years ago

workergnome commented 7 years ago

See this: http://data.americanartcollaborative.org/page/ima/object/id/10027

There appear to be a huge number of dimensions, many without content, and all with nonsensical URLs.

bsnikhila commented 7 years ago

Looking at the data, most of the values for PhyType_tab are like "Image Dimensions" or "Sheet Dimensions". So, while mapping, the URIs were created as value_of_ObjectURI/type_of_dimension/value_of_PhyType_tab. A few records seem to have a measurement name such as "width", "height", or "depth" as the value of PhyType_tab, which results in dimension URIs like http://data.americanartcollaborative.org/ima/object/10027/depth/depth and http://data.americanartcollaborative.org/ima/object/10027/width/depth
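To make the collision concrete, here is a small sketch of the URI pattern described in the comment above. It is a hypothetical reconstruction, not the actual mapping code. The `slug` rule (lowercase, spaces removed) is an assumption, and so are the helper names. The point is that when PhyType_tab holds a measurement name instead of a group name, the last path segment repeats a dimension type.

```python
# Hypothetical reconstruction of the original mapping pattern:
#   value_of_ObjectURI/type_of_dimension/value_of_PhyType_tab
# The slug rule (lowercase, spaces removed) is an assumption.


def slug(label: str) -> str:
    """Turn a label into a URI segment, e.g. 'Sheet Dimensions' -> 'sheetdimensions'."""
    return label.lower().replace(" ", "")


def old_dimension_uri(object_uri: str, dimension_type: str, phy_type: str) -> str:
    """Build a dimension URI the way the original mapping is described."""
    return f"{object_uri}/{dimension_type}/{slug(phy_type)}"


obj = "http://data.americanartcollaborative.org/ima/object/10027"

# Expected case: PhyType_tab names a measurement group.
print(old_dimension_uri(obj, "height", "Sheet Dimensions"))  # ends in /height/sheetdimensions

# Problem case: PhyType_tab holds a measurement name, not a group name.
print(old_dimension_uri(obj, "depth", "depth"))  # ends in /depth/depth
print(old_dimension_uri(obj, "width", "depth"))  # ends in /width/depth
```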

workergnome commented 7 years ago

Interesting. Not sure who the IMA contact is now, but this would be a question for them.

hlfloyd commented 7 years ago

Good morning, I am new to this project, but will be the IMA contact. I am not quite sure what the question is yet, or what I need to do. Is this a "resubmit data" type issue, do I just need to look in the source system and see if it looks crazy there too, or is it something else? Thanks for your patience while I learn about this. -Heather

workergnome commented 7 years ago

So, for object http://data.americanartcollaborative.org/page/ima/object/10027, what appears in your source data for dimensions? We're seeing something strange, but I'm not sure if it's in the JSON export, in the CMS, or a mapping error.

Typically, we see something that looks like "overall/depth", but for this one we're seeing "depth/depth". It probably occurs when there isn't an explicit part given for the dimensions...

hlfloyd commented 7 years ago

It is a problem in the CMS - three entries have been made for the dimensions. The IMA online collection doesn't know what to do with it either, so omits the dimensions display.

Also, the actors json element for this object seems incorrect - there is an artist and a printer, both with the same id, but there are two separate people involved (John Cage and Irwin Hollander):

"actors": [
    {
        "id": "1982",
        "role": "Artist"
    },
    {
        "id": "1982",
        "role": "Printer"
    }
],

And, if I am interpreting the site correctly, only Hollander is considered to be the producer http://data.americanartcollaborative.org/page/ima/object/10027/production

bsnikhila commented 7 years ago

Only those with role as "Artist" have been mapped as the corresponding object's creators.
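Below is a minimal sketch of the creator rule just described, plus a data-quality check that would have flagged the record above. The function names are hypothetical. The actor list mirrors the JSON posted earlier for object 10027. Note that the filter by itself cannot fix the underlying problem: both entries carry id 1982, so the Printer role points at the same person record as the Artist.

```python
from collections import defaultdict

# Hypothetical sketch of the creator rule described above.
# The actor list mirrors the "actors" JSON posted earlier for object 10027.
actors = [
    {"id": "1982", "role": "Artist"},
    {"id": "1982", "role": "Printer"},
]


def creator_ids(actors: list[dict]) -> list[str]:
    """Ids of actors mapped as creators: only role == 'Artist' qualifies."""
    return [a["id"] for a in actors if a.get("role") == "Artist"]


def ids_with_multiple_roles(actors: list[dict]) -> dict[str, list[str]]:
    """Flag ids that appear under more than one role (a possible data error)."""
    roles = defaultdict(set)
    for a in actors:
        roles[a["id"]].add(a["role"])
    return {i: sorted(r) for i, r in roles.items() if len(r) > 1}


print(creator_ids(actors))              # ['1982']
print(ids_with_multiple_roles(actors))  # {'1982': ['Artist', 'Printer']}
```

One person can legitimately hold two roles on an object, so the second function only produces a list for manual review. It does not decide anything on its own.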

SamiNorling commented 7 years ago

This is my first week as Digital Collections Manager at the IMA Lab, and I will be working on our data set for AAC. As Heather said, this problem originates in our CMS. I will be cleaning up our data, both there and for a data resubmit once I review all of our issues (my understanding is that we are in a data review period and have the opportunity to do some clean-up and clarification on mappings, but please let me know if that is not the case).

For this issue, there are a couple of mapping options once I get the data cleaned up, and it will depend on how granular AAC's target data model gets with dimensions. Using this item as an example, once I bring the height, width, and depth into the same entry in our CMS as they are supposed to be (which also includes weight where relevant), and populate "converted_dimensions" with the concatenated dimensions, that part of the json record for this item would be:

        "subject_identification": null,
        "subject_description": null,
        "converted_dimensions": "14 x 20 x 14-1/2 in.",
        "dimensions": [
            {
                "PhyWeight_tab": null,
                "PhyHeight_tab": 14,
                "PhyType_tab": "dimensions",
                "PhyDiameter_tab": null,
                "PhyDimensionNotes_tab": "Metric: 35.55999947 cm height",
                "PhyUnitWeight_tab": null,
                "PhyWidth_tab": 20,
                "PhyDepth_tab": 14.5,
                "PhyUnitLength_tab": "in"
            },
        ]

With the records cleaned up this way, the URIs could be created using the values of the separate measurement fields, and PhyType_tab would not be used, since it would be a generic value as noted, such as "dimensions," "sheet dimensions," etc.

value_of_ObjectURI/height/value_of_PhyHeight_tab
value_of_ObjectURI/width/value_of_PhyWidth_tab
value_of_ObjectURI/depth/value_of_PhyDepth_tab
value_of_ObjectURI/weight/value_of_PhyWeight_tab
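As an illustration, here is a sketch of that proposed pattern applied to the cleaned example record, emitting one URI per populated measurement field. The helper name is hypothetical. Skipping null fields is an assumption, made so that empty dimensions (the original complaint in this issue) are not emitted.

```python
# Sketch of the proposed pattern, one URI per populated measurement field:
#   value_of_ObjectURI/<measurement>/value_of_Phy<Measurement>_tab
# Skipping null fields is an assumption (it avoids empty dimensions).

MEASUREMENT_FIELDS = {
    "height": "PhyHeight_tab",
    "width": "PhyWidth_tab",
    "depth": "PhyDepth_tab",
    "weight": "PhyWeight_tab",
}


def proposed_uris(object_uri: str, entry: dict) -> list[str]:
    uris = []
    for name, field in MEASUREMENT_FIELDS.items():
        value = entry.get(field)
        if value is not None:
            uris.append(f"{object_uri}/{name}/{value}")
    return uris


obj = "http://data.americanartcollaborative.org/ima/object/10027"
entry = {  # values from the cleaned example record above
    "PhyHeight_tab": 14,
    "PhyWidth_tab": 20,
    "PhyDepth_tab": 14.5,
    "PhyWeight_tab": None,
}
for uri in proposed_uris(obj, entry):
    print(uri)
```

Because the value is part of the path, two measurement groups on the same object that share a height would produce the same URI. The format adopted later in this thread uses the measurement-group name instead of the value.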

The other option would be to use the "converted_dimensions" value and map to a blanket dimensions field, but I am assuming AAC prefers granularity in dimensions to avoid ambiguity and inconsistency in how that concatenated field would present the dimensions/weight, which is bound to happen.
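The concatenated field bakes in formatting choices such as fractions versus decimals, which fields are included, and where the unit goes. The sketch below rebuilds the example string from the granular fields under those assumptions. The function names are hypothetical, and the "H x W x D unit." layout and the 1/16-inch rounding are guesses based on the single example above.

```python
from fractions import Fraction

# Hypothetical sketch of building a "converted_dimensions" string from the
# granular fields. The fraction style (14-1/2) and the "H x W x D unit."
# layout are assumptions copied from the single example above.


def format_length(value: float) -> str:
    """Render 14.5 as '14-1/2' and 20 as '20' (nearest 1/16, an assumption)."""
    frac = Fraction(value).limit_denominator(16)
    whole = frac.numerator // frac.denominator
    rest = frac - whole
    if rest == 0:
        return str(whole)
    return f"{whole}-{rest.numerator}/{rest.denominator}"


def converted_dimensions(entry: dict) -> str:
    values = [entry.get(f) for f in ("PhyHeight_tab", "PhyWidth_tab", "PhyDepth_tab")]
    parts = [format_length(v) for v in values if v is not None]
    return " x ".join(parts) + f" {entry['PhyUnitLength_tab']}."


entry = {"PhyHeight_tab": 14, "PhyWidth_tab": 20, "PhyDepth_tab": 14.5, "PhyUnitLength_tab": "in"}
print(converted_dimensions(entry))  # 14 x 20 x 14-1/2 in.
```

Going the other way is harder. A consumer that only has the string must parse it to learn which number is the depth, and any record formatted differently (for example "14.5" instead of "14-1/2") breaks that parser.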

This issue will be consistently fixed in all records when we return a cleaned data set, so all records would then have only one dimensions entry like the example above. If you have any guidance or preference for how we clean that up, or if the above example looks good, that will be added to the list of clean-ups before resubmitting.

Sami

SamiNorling commented 7 years ago

I was a bit unclear in my previous comment. Each measurement group will have only one "dimensions:" entry. So, for example, a record may have "Unframed Dimensions" and "Framed Dimensions," but the height, width, and depth for those measurement groups would all be together in one "dimensions:" entry. The name of the measurement group would be the value of "PhyType_tab." Where there is only one set of dimensions, the value of "PhyType_tab" would be just a generic "dimensions," as in the previous example.

Is this set-up something you can model?

caknoblock commented 7 years ago

We can model it as a dimension string, but I don’t think this is what David wants for the browse application. Much better if you can separate the different dimensions.

workergnome commented 7 years ago

Let me know, @SamiNorling, if you need any help from me on this. I think your remap will probably resolve this.

SamiNorling commented 7 years ago

@workergnome Yes, our Physical Dimensions data will be structured much more simply once I am able to clean up our inconsistent use of dimension sets in our CMS. This will make for a much cleaner mapping, with more consistent and less redundant URIs.

I suggest this issue be closed, since it applies to our previous data structure.

workergnome commented 7 years ago

Doesn't look like the new data resolved this issue. @SamiNorling, can you check it out? See, for instance, http://data.americanartcollaborative.org/page/ima/object/13698

SamiNorling commented 7 years ago

@workergnome With the duplication of data (would ISI be responsible for purging the original data from the store to remove the duplication mentioned in issue #22?), it might be a bit difficult to see what is supposed to be going on with the data.

The new format for URIs is /ima/object/[id]/[measurement name]/[dimension type]. For the item you linked to, we would have four dimensions:

http://data.americanartcollaborative.org/ima/object/13698/frameddimensions/height
http://data.americanartcollaborative.org/ima/object/13698/frameddimensions/width
http://data.americanartcollaborative.org/ima/object/13698/unframeddimensions/height
http://data.americanartcollaborative.org/ima/object/13698/unframeddimensions/width
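Here is a sketch of how the /ima/object/[id]/[measurement name]/[dimension type] format could be generated from one "dimensions" entry per measurement group. It assumes that "frameddimensions" comes from lowercasing "Framed Dimensions" and removing spaces, and the helper names are hypothetical. The numeric values are placeholders, not IMA data. Only which fields are populated affects the output.

```python
# Sketch of the format /ima/object/[id]/[measurement name]/[dimension type].
# Deriving "frameddimensions" from "Framed Dimensions" by lowercasing and
# removing spaces is an assumption based on the URIs listed above.

BASE = "http://data.americanartcollaborative.org/ima/object"

DIMENSION_FIELDS = {
    "height": "PhyHeight_tab",
    "width": "PhyWidth_tab",
    "depth": "PhyDepth_tab",
    "weight": "PhyWeight_tab",
}


def group_slug(label: str) -> str:
    return label.lower().replace(" ", "")


def dimension_uris(object_id: str, entries: list[dict]) -> list[str]:
    """One URI per populated field, scoped by the entry's measurement group."""
    uris = []
    for entry in entries:
        group = group_slug(entry["PhyType_tab"])
        for name, field in DIMENSION_FIELDS.items():
            if entry.get(field) is not None:
                uris.append(f"{BASE}/{object_id}/{group}/{name}")
    return uris


# Placeholder values (1.0), NOT actual IMA data.
entries = [
    {"PhyType_tab": "Framed Dimensions", "PhyHeight_tab": 1.0, "PhyWidth_tab": 1.0},
    {"PhyType_tab": "Unframed Dimensions", "PhyHeight_tab": 1.0, "PhyWidth_tab": 1.0},
]
for uri in dimension_uris("13698", entries):
    print(uri)
```

Because the group name is part of the path, the framed and unframed heights get distinct URIs even when their values are identical.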

In our data set, we often have multiple named measurements for an object (e.g., framed/unframed, sheet/image), each of which potentially has its own height, width, and depth, as is the case with the item you linked to. Have you seen this scenario come up with other museums' data, possibly modeled in a different way? We would like to represent all of the different dimensions in our data set.

Having cleaned the data, we shouldn't have any dimensions that have no content, as was a problem originally, and none that have the repetitive URIs such as /ima/object/[id]/depth/depth (hopefully).

workergnome commented 7 years ago

Yes, modeled in Parts is how we've been thinking of this. If you look at http://data.americanartcollaborative.org/page/wam/object/18862, you can see how they've done it, in particular the crm:P46_is_composed_of property.
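Below is a rough sketch of what a Parts-based model could look like, written as plain N-Triples strings so it has no dependencies. It uses the CIDOC CRM properties P46_is_composed_of, P43_has_dimension, and P90_has_value. The part URI pattern, the helper names, and the exact property choices are assumptions. This is not a copy of the WAM mapping. The values are placeholders.

```python
# Hypothetical sketch of a Parts-based model using CIDOC CRM properties.
# The part URI pattern and the choice of properties are assumptions, not a
# copy of the WAM mapping; values are placeholders.

CRM = "http://www.cidoc-crm.org/cidoc-crm/"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
BASE = "http://data.americanartcollaborative.org/ima/object"


def iri(uri: str) -> str:
    return f"<{uri}>"


def part_triples(object_id: str, group: str, dims: dict[str, float]) -> list[tuple[str, str, str]]:
    obj = f"{BASE}/{object_id}"
    part = f"{obj}/part/{group}"  # hypothetical part URI
    triples = [(iri(obj), iri(CRM + "P46_is_composed_of"), iri(part))]
    for dim_type, value in dims.items():
        dim = f"{part}/{dim_type}"
        triples.append((iri(part), iri(CRM + "P43_has_dimension"), iri(dim)))
        triples.append((iri(dim), iri(CRM + "P90_has_value"), f'"{value}"^^{iri(XSD_DECIMAL)}'))
    return triples


for s, p, o in part_triples("13698", "frameddimensions", {"height": 1.0, "width": 1.0}):
    print(s, p, o, ".")  # one N-Triples statement per line
```

A fuller model would probably also type each dimension (for example with P2_has_type pointing at a vocabulary term for height or width) and record its unit with P91_has_unit. Both are omitted here for brevity.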

SamiNorling commented 7 years ago

Thanks for sharing the example--hadn't come across it during the remodeling process, but that is a logical mapping, especially now that our data is consistent and clean. I would be happy to update the IMA-Objects-Dimensions model, reapply to the dimensions data file, and replace the triples file there, but want to be sure that ISI has the capacity to replace the current dimensions in the store with the new set. Seems like it might be good timing with Issue #22 currently open, requiring a purge of our old data to remove duplication, but I will defer to @caknoblock on whether this would be disruptive. I could take care of this early Monday if I have the go-ahead.

caknoblock commented 7 years ago

Yes, please go ahead and update your model and we can load the new version.

SamiNorling commented 7 years ago

I updated our model files for Objects-Dimensions, and also replaced the Object Dimensions csv with a new version with data cleaned for this model. They can be found in IMA-Objects-Dimensions folder. None of our other data sets or models have changed, so only that data needs to be regenerated and added to the triplestore.

Thank you in advance for that, ISI folk (@caknoblock et al.)

Edit: I should also note that I reviewed the mapping by running the Dimensions (Part) query against a sample set of triples generated with the updated model.

GreatYYX commented 7 years ago

New data uploaded.