NationalMuseumAustralia / Collection-API

The public web API of the National Museum of Australia

Images in narratives #104

Closed f27wood closed 6 years ago

f27wood commented 6 years ago

I want to send the Prime Ministers set to MOAD for inclusion on the PM website, so I was testing the set and checking all of the images displayed when using the internal API, and found some inconsistent results!

This was my test data:

Public https://data.nma.gov.au/narrative/3167

Internal https://data.nma.gov.au/narrative/3167?apikey=XXX

I found that which images were and weren't included was inconsistent, and often incorrect, as follows.

Public Domain (CC status = Public Domain). Images should be in both.
53984: in neither = INCORRECT
119554: in both = CORRECT

Commercial and non-commercial (CC status = CC BY-SA 4.0). Images should be in both.
36690, 71102, 112768: not in either = INCORRECT
130681, 116726: in both = CORRECT

Non-commercial (CC status = CC BY-NC-SA 4.0). Images should be in both.
73408: not in either = INCORRECT

Yet to be determined (CC status = empty). Images should be in internal but not public.
53983, 136468: in internal but not public = CORRECT
230321: in neither = INCORRECT

Restricted (CC status = All Rights Reserved). Images should be in internal but not public.
58480, 56892: in neither = INCORRECT
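For anyone repeating this test on other narratives, the expected behaviour can be written down as a small check. This is only a sketch in Python: the function names and the tuple layout are made up for illustration, and the "seen in public / seen in internal" values are typed in by hand from the results listed here rather than fetched from the API.

```python
# Hypothetical helper names; the rule itself is taken from the test results above.
# CC statuses whose images should appear in both the public and the internal API.
PUBLIC_STATUSES = {"Public Domain", "CC BY-SA 4.0", "CC BY-NC-SA 4.0"}


def expected_visibility(cc_status):
    """Return (in_public, in_internal) expected for an image with this CC status."""
    in_public = cc_status in PUBLIC_STATUSES
    # Every image is expected in the internal API, whatever its status.
    return (in_public, True)


def check(cc_status, in_public, in_internal):
    """Compare what was observed with what the rule expects."""
    expected = expected_visibility(cc_status)
    return "CORRECT" if (in_public, in_internal) == expected else "INCORRECT"


# A few observations from the report:
# (object IRN, CC status, seen in public, seen in internal)
observations = [
    (53984, "Public Domain", False, False),
    (119554, "Public Domain", True, True),
    (53983, "", False, True),
    (230321, "", False, False),
    (58480, "All Rights Reserved", False, False),
]

results = {irn: check(status, pub, internal)
           for irn, status, pub, internal in observations}

for irn, verdict in results.items():
    print(irn, verdict)
```

An empty CC status is treated here as "yet to be determined", i.e. internal only.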

I did a quick test of other narratives and it is also an issue with them. I can do some more testing and provide more examples if you wish.

I suspect that when we tested this, we didn't do a thorough job and relied on a few examples being correct. Or something has changed...

Conal-Tuohy commented 6 years ago

To summarise what you've described: some images which should be there are missing, but there don't appear to be any errors of the opposite kind (images present which should be missing).

My investigation shows that the images do appear in the JSON-LD format, and they do appear in the records of the individual objects themselves, when accessed directly via the API; they are missing from the objects' descriptions only when those objects are accessed as part of a narrative.

e.g.

The bug must therefore be in the stylesheet which renders RDF as simple JSON; in the case where the root resource of the RDF description is a narrative, it sometimes fails to render the representations of the physical objects which are aggregated in that narrative.

It remains to be seen why the representations are rendered in some cases but not in others.

Conal-Tuohy commented 6 years ago

I believe the issue is that the missing images are those which are not tagged as 'preferred' images. In narratives, we decided not to include all the images of all the objects; instead we include only those images which are 'preferred'. This was to make the size of narrative records more tractable for clients (they can otherwise be very large). For some reason, this filtering is done in the trix-description-to-dc.xsl step (i.e. it is not a feature of the json-ld descriptions, which do include all images whether 'preferred' or not). It seems to me that this redaction feature should be moved into a separate redaction step, like all the other redaction steps; it should not be specific to the simple json serialization.

In any case, for some reason, some objects don't have a 'preferred' image. I wonder why?

Conal-Tuohy commented 6 years ago

OK here's the issue: there are physical objects which have Piction images, none of which are 'preferred'. Our criterion for tagging a Piction image as 'preferred' is that it contains <field name='Page Number'>1</field>, but it seems this is not a good criterion. e.g. in the Piction XML there are two records for images which relate to object 53984, neither of which contains <field name='Page Number'>1</field>, and hence neither of which is tagged as a 'preferred' image.

<doc>
    <field name="EMu IRN for Related Objects">53984</field>
    <field name="Multimedia ID">MA45787951</field>
    <field name="Title">Envelope titled 'United for Victory Interlocking Jigsaw'</field>
    <dataSource type="URLDataSource" baseUrl="\\nma-isilon1\dams_data\Collectionsearch\prodderivT\DAMS_INGEST\JOBS\WM_45202040\nma-45202040-039-wm-vs1.jpg" name="thumbnail"/>
    <dataSource type="URLDataSource" baseUrl="\\nma-isilon1\dams_data\Collectionsearch\prodderivW\DAMS_INGEST\JOBS\WM_45202040\nma-45202040-039-wm-vs1.jpg" name="web"/>
    <dataSource type="URLDataSource" baseUrl="\\nma-isilon1\dams_data\Collectionsearch\prodderiv2\WM_45202040\nma-45202040-039-wm-vs1_o2.jpg" name="original_2"/>
    <dataSource type="URLDataSource" baseUrl="\\nma-isilon1\dams_data\Collectionsearch\prodderiv3\WM_45202040\nma-45202040-039-wm-vs1_o3.jpg" name="original_3"/>
    <dataSource type="URLDataSource" baseUrl="\\nma-isilon1\dams_data\Collectionsearch\prodderiv4\WM_45202040\nma-45202040-039-wm-vs1_o4.jpg" name="original_4"/>
    <dataSource type="URLDataSource" baseUrl="\\nma-isilon1\dams_data\Collectionsearch\prodderiv5\WM_45202040\nma-45202040-039-wm-vs1_o5.jpg" name="original_5"/>
</doc>
<doc>
    <field name="EMu IRN for Related Objects">53984</field>
    <field name="Multimedia ID">MA45790088</field>
    <field name="Title">Envelope titled 'United for Victory Interlocking Jigsaw'</field>
    <dataSource type="URLDataSource" baseUrl="\\nma-isilon1\dams_data\Collectionsearch\prodderivT\DAMS_INGEST\JOBS\WM_45202040\nma-45202040-038-wm-vs1.jpg" name="thumbnail"/>
    <dataSource type="URLDataSource" baseUrl="\\nma-isilon1\dams_data\Collectionsearch\prodderivW\DAMS_INGEST\JOBS\WM_45202040\nma-45202040-038-wm-vs1.jpg" name="web"/>
    <dataSource type="URLDataSource" baseUrl="\\nma-isilon1\dams_data\Collectionsearch\prodderiv2\WM_45202040\nma-45202040-038-wm-vs1_o2.jpg" name="original_2"/>
    <dataSource type="URLDataSource" baseUrl="\\nma-isilon1\dams_data\Collectionsearch\prodderiv3\WM_45202040\nma-45202040-038-wm-vs1_o3.jpg" name="original_3"/>
    <dataSource type="URLDataSource" baseUrl="\\nma-isilon1\dams_data\Collectionsearch\prodderiv4\WM_45202040\nma-45202040-038-wm-vs1_o4.jpg" name="original_4"/>
    <dataSource type="URLDataSource" baseUrl="\\nma-isilon1\dams_data\Collectionsearch\prodderiv5\WM_45202040\nma-45202040-038-wm-vs1_o5.jpg" name="original_5"/>
</doc>
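
To make the criterion concrete, here is a rough Python sketch of the 'preferred' test applied to a cut-down copy of the two records above. It is not the actual pipeline code (the real tagging happens elsewhere in the ingest); the <docs> wrapper element and the helper names are assumptions added so the snippet parses and runs on its own.

```python
import xml.etree.ElementTree as ET

# Cut-down copies of the two records above. The <docs> wrapper is an
# assumption, added only so the snippet is a well-formed XML document.
SAMPLE = """<docs>
<doc>
    <field name="EMu IRN for Related Objects">53984</field>
    <field name="Multimedia ID">MA45787951</field>
</doc>
<doc>
    <field name="EMu IRN for Related Objects">53984</field>
    <field name="Multimedia ID">MA45790088</field>
</doc>
</docs>"""


def field(doc, name):
    """Return the text of the first <field> with the given name, or None."""
    for f in doc.findall("field"):
        if f.get("name") == name:
            return f.text
    return None


def is_preferred(doc):
    """The criterion described above: the record has Page Number = 1."""
    return field(doc, "Page Number") == "1"


root = ET.fromstring(SAMPLE)
preferred_ids = [field(d, "Multimedia ID")
                 for d in root.findall("doc") if is_preferred(d)]
print(preferred_ids)
```

Neither record carries a Page Number field, so the list comes back empty, which is why object 53984 ends up with no 'preferred' image.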

Querying the graph store, it seems there are 57020 physical objects which do have a 'preferred' image, but there are also 72120 objects which have some images, but of which none are 'preferred'.

Conal-Tuohy commented 6 years ago

What to do in the case that none of the Piction images related to a particular object are marked as preferred? We could just pick the first one. i.e. we group the Piction images by the IRN of the object they depict and proceed through each group; wherever no member of a group has Page Number=1, we assign Page Number=1 to the first image in that group.
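
A minimal sketch of that fallback, in Python rather than in the pipeline's own code, and assuming each image has already been reduced to a small dict (the keys 'irn', 'multimedia_id' and 'page_number' are made up for illustration). "First" here just means first in the order the records arrive.

```python
from collections import defaultdict


def choose_preferred(images):
    """Return the set of multimedia IDs to treat as 'preferred'.

    images: list of dicts with hypothetical keys 'irn', 'multimedia_id'
    and, optionally, 'page_number'.
    """
    # Group the images by the IRN of the object they depict.
    groups = defaultdict(list)
    for img in images:
        groups[img["irn"]].append(img)

    preferred = set()
    for group in groups.values():
        # Images explicitly marked with Page Number = 1 keep their status.
        explicit = [img for img in group if img.get("page_number") == "1"]
        # If the group has none, fall back to the first image in the group.
        chosen = explicit if explicit else group[:1]
        preferred.update(img["multimedia_id"] for img in chosen)
    return preferred


images = [
    # The two records for object 53984, which have no Page Number at all.
    {"irn": "53984", "multimedia_id": "MA45787951"},
    {"irn": "53984", "multimedia_id": "MA45790088"},
    # A made-up object whose second image is explicitly page 1.
    {"irn": "EXAMPLE-IRN", "multimedia_id": "EXAMPLE-A", "page_number": "2"},
    {"irn": "EXAMPLE-IRN", "multimedia_id": "EXAMPLE-B", "page_number": "1"},
]

preferred = choose_preferred(images)
print(sorted(preferred))
```

Objects that already have a Page Number=1 image are unaffected; only groups with no such image get the fallback.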

Conal-Tuohy commented 6 years ago

I have checked the PM's narrative on the dev server, and all the objects listed above now do have the images they should have.

f27wood commented 6 years ago

Thanks Conal, yes I think you will have to take the first one and make that the preferred image. I will follow up from our end about trying to always have a Page Number=1 if there is more than one image assigned to an object. And we should include this in our doco.

f27wood commented 6 years ago

Tested an NMA test with a couple of sets, and all passed.

f27wood commented 6 years ago

I am reopening this, as I prefer not to close an issue until it is working in Prod.

Conal-Tuohy commented 6 years ago

Sure ... it got closed automatically when I referred to it in my commit message. Thanks github!

Since it's worked out OK on csapi-test I have deployed to production. We should see some result tomorrow, and be able to close the issue then.

f27wood commented 6 years ago

tested and passed in prod