Closed by f27wood 6 years ago
To summarise what you've described: some images which should be there are missing, but there don't appear to be any errors of the opposite kind (images present which should be missing).
My investigation shows that the images do appear in the JSON-LD format, and in the records of the individual objects themselves when those are accessed directly via the API; they are missing from the objects' descriptions only when those objects are accessed as part of a narrative.
e.g. hasVersion key.representation key of the JSON object whose id is http://data.nma.gov.au/object/53984#
The bug must therefore be in the stylesheet which renders RDF as simple JSON; in the case where the root resource of the RDF description is a narrative, it sometimes fails to render the representations of the physical objects which are aggregated in that narrative.
It remains to be seen why the representations sometimes are rendered.
I believe the issue is that the missing images are those which are not tagged as 'preferred' images. In narratives, we decided not to include all the images of all the objects; instead we include only those images which are 'preferred'. This was to make the size of narrative records more tractable for clients (they can otherwise be very large). For some reason, this filtering is done in the trix-description-to-dc.xsl step (i.e. it is not a feature of the JSON-LD descriptions, which do include all images whether 'preferred' or not). It seems to me that this redaction feature should be moved into a separate redaction step, like all the other redaction steps; it should not be specific to the simple JSON serialization.
In any case, for some reason, some objects don't have a 'preferred' image. I wonder why?
OK, here's the issue: there are physical objects which have Piction images, none of which are 'preferred'. Our criterion for tagging a Piction image as 'preferred' is that it contains <field name='Page Number'>1</field>, but it seems this is not a good criterion. e.g. in the Piction XML there are two records for images which relate to object 53984, neither of which contains <field name='Page Number'>1</field>, and hence neither of which is tagged as a 'preferred' image.
<doc>
<field name="EMu IRN for Related Objects">53984</field>
<field name="Multimedia ID">MA45787951</field>
<field name="Title">Envelope titled 'United for Victory Interlocking Jigsaw'</field>
<dataSource type="URLDataSource" baseUrl="\\nma-isilon1\dams_data\Collectionsearch\prodderivT\DAMS_INGEST\JOBS\WM_45202040\nma-45202040-039-wm-vs1.jpg" name="thumbnail"/>
<dataSource type="URLDataSource" baseUrl="\\nma-isilon1\dams_data\Collectionsearch\prodderivW\DAMS_INGEST\JOBS\WM_45202040\nma-45202040-039-wm-vs1.jpg" name="web"/>
<dataSource type="URLDataSource" baseUrl="\\nma-isilon1\dams_data\Collectionsearch\prodderiv2\WM_45202040\nma-45202040-039-wm-vs1_o2.jpg" name="original_2"/>
<dataSource type="URLDataSource" baseUrl="\\nma-isilon1\dams_data\Collectionsearch\prodderiv3\WM_45202040\nma-45202040-039-wm-vs1_o3.jpg" name="original_3"/>
<dataSource type="URLDataSource" baseUrl="\\nma-isilon1\dams_data\Collectionsearch\prodderiv4\WM_45202040\nma-45202040-039-wm-vs1_o4.jpg" name="original_4"/>
<dataSource type="URLDataSource" baseUrl="\\nma-isilon1\dams_data\Collectionsearch\prodderiv5\WM_45202040\nma-45202040-039-wm-vs1_o5.jpg" name="original_5"/>
</doc>
<doc>
<field name="EMu IRN for Related Objects">53984</field>
<field name="Multimedia ID">MA45790088</field>
<field name="Title">Envelope titled 'United for Victory Interlocking Jigsaw'</field>
<dataSource type="URLDataSource" baseUrl="\\nma-isilon1\dams_data\Collectionsearch\prodderivT\DAMS_INGEST\JOBS\WM_45202040\nma-45202040-038-wm-vs1.jpg" name="thumbnail"/>
<dataSource type="URLDataSource" baseUrl="\\nma-isilon1\dams_data\Collectionsearch\prodderivW\DAMS_INGEST\JOBS\WM_45202040\nma-45202040-038-wm-vs1.jpg" name="web"/>
<dataSource type="URLDataSource" baseUrl="\\nma-isilon1\dams_data\Collectionsearch\prodderiv2\WM_45202040\nma-45202040-038-wm-vs1_o2.jpg" name="original_2"/>
<dataSource type="URLDataSource" baseUrl="\\nma-isilon1\dams_data\Collectionsearch\prodderiv3\WM_45202040\nma-45202040-038-wm-vs1_o3.jpg" name="original_3"/>
<dataSource type="URLDataSource" baseUrl="\\nma-isilon1\dams_data\Collectionsearch\prodderiv4\WM_45202040\nma-45202040-038-wm-vs1_o4.jpg" name="original_4"/>
<dataSource type="URLDataSource" baseUrl="\\nma-isilon1\dams_data\Collectionsearch\prodderiv5\WM_45202040\nma-45202040-038-wm-vs1_o5.jpg" name="original_5"/>
</doc>
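The criterion described above can be checked directly against the Piction XML. What follows is a minimal sketch in Python, not the pipeline's actual code: it assumes the <doc> records are wrapped in a single root element (the excerpt above does not show the real root), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def field(doc, name):
    """Return the text of the first <field name="..."> child, or None."""
    for f in doc.findall("field"):
        if f.get("name") == name:
            return f.text
    return None


def objects_without_preferred(xml_text):
    """Return the IRNs of objects that have images, none with Page Number 1.

    Assumes xml_text has one root element containing the <doc> records.
    """
    has_preferred = defaultdict(bool)
    for doc in ET.fromstring(xml_text).iter("doc"):
        irn = field(doc, "EMu IRN for Related Objects")
        if irn is None:
            continue  # image not linked to any object
        has_preferred[irn] |= field(doc, "Page Number") == "1"
    return sorted(irn for irn, ok in has_preferred.items() if not ok)
```

Run over the two records shown above (wrapped in a root element), this would report object 53984, since neither record carries a Page Number field.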
Querying the graph store, it seems there are 57020 physical objects which do have a 'preferred' image, but there are also 72120 objects which have some images, but of which none are 'preferred'.
What should we do when none of the Piction images related to a particular object is marked as preferred? We could just pick the first one, i.e. group the Piction images by the IRN of the object they depict and, for each group in which no member has Page Number=1, assign Page Number=1 to the first image in the group.
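The fallback proposed above can be sketched as follows. This is an illustration in Python under stated assumptions, not the committed fix: it assumes "first" means first in document order, that the <doc> records share a single root element, and all function and constant names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

IRN = "EMu IRN for Related Objects"
PAGE = "Page Number"


def field_text(doc, name):
    """Return the text of the first <field name="..."> child, or None."""
    for f in doc.findall("field"):
        if f.get("name") == name:
            return f.text
    return None


def assign_default_preferred(root):
    """Give Page Number 1 to the first image of each object lacking one.

    Modifies the tree in place and returns the IRNs that were changed.
    """
    groups = defaultdict(list)  # IRN -> <doc> elements, in document order
    for doc in root.iter("doc"):
        irn = field_text(doc, IRN)
        if irn is not None:
            groups[irn].append(doc)

    changed = []
    for irn, docs in groups.items():
        if any(field_text(d, PAGE) == "1" for d in docs):
            continue  # this object already has a preferred image
        first = docs[0]
        # Reuse an existing Page Number field if there is one, else add it.
        page = next(
            (f for f in first.findall("field") if f.get("name") == PAGE), None
        )
        if page is None:
            page = ET.SubElement(first, "field", {"name": PAGE})
        page.text = "1"
        changed.append(irn)
    return changed
```

Running the function a second time on the same tree changes nothing, because every group then contains an image with Page Number=1.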
I have checked the PM's narrative on the dev server, and all the objects listed above now do have the images they should have.
Thanks Conal, yes, I think you will have to take the first one and make that the preferred image. I will follow up from our end about trying to always have a Page Number=1 if there is more than one image assigned to an object. And we should include this in our doco.
Tested an NMA test with a couple of sets, and all passed.
I am reopening this, as I prefer not to close an issue until it is working in Prod.
Sure ... it got closed automatically when I referred to it in my commit message. Thanks github!
Since it's worked out OK on csapi-test I have deployed to production. We should see some result tomorrow, and be able to close the issue then.
tested and passed in prod
I am wanting to send the Prime Ministers set to MOAD for inclusion on the PM website, so I was testing the Set and checking that all of the images displayed when using the internal API, and found some inconsistent results!
This was my test data:
Public https://data.nma.gov.au/narrative/3167
Internal https://data.nma.gov.au/narrative/3167?apikey=XXX
I found that which images were and weren't included was inconsistent, and often incorrect, as follows.
Public Domain CC status=Public Domain
Images should be in both
53984, In neither = INCORRECT
119554, In both = CORRECT
Commercial and non-commercial CC status=CC BY-SA 4.0
Images should be in both
36690, 71102, 112768, In neither = INCORRECT
130681, 116726, In both = CORRECT
Non-commercial CC status=CC BY-NC-SA 4.0
Images should be in both
73408, In neither = INCORRECT
Yet to be determined CC status=
Should be in internal but not public
53983, 136468, In internal but not public = CORRECT
230321, In neither = INCORRECT
Restricted CC status=All Rights Reserved
Should be in internal but not public
58480, 56892, In neither = INCORRECT
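The expectations used in the results above can be written down as a small check. This is only a sketch in Python of the rules as stated in this report (which CC statuses should appear in the public and internal responses); the names are hypothetical and it does not call the API.

```python
# CC statuses whose images should appear in both public and internal responses.
BOTH = {"Public Domain", "CC BY-SA 4.0", "CC BY-NC-SA 4.0"}
# CC statuses whose images should appear in the internal response only.
# The empty string stands for "yet to be determined".
INTERNAL_ONLY = {"", "All Rights Reserved"}


def expected_visibility(cc_status):
    """Return (in_public, in_internal) expected for a given CC status."""
    if cc_status in BOTH:
        return (True, True)
    if cc_status in INTERNAL_ONLY:
        return (False, True)
    raise ValueError(f"unknown CC status: {cc_status!r}")


def check(cc_status, in_public, in_internal):
    """Classify an observed result as CORRECT or INCORRECT."""
    expected = expected_visibility(cc_status)
    return "CORRECT" if (in_public, in_internal) == expected else "INCORRECT"
```

For example, object 53984 (Public Domain, images in neither response) corresponds to check("Public Domain", False, False), and objects 53983 and 136468 (no CC status, images in internal only) to check("", False, True).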
I did a quick test of other narratives and it is also an issue with them. I can do some more testing and provide more examples if you wish.
I suspect that when we tested this, we didn't do a thorough job and relied on a few examples being correct. Or something has changed...