Open akash1810 opened 5 years ago
Additionally, if you're running a long-lived process, such as a server, the state of the `XMPSchemaRegistry` impacts all images, as the `XMPMetaFactory` is a singleton. That is, if the first image to be ingested was the attached cech.jpg, any following images that use the Getty schema and the `GettyImagesGIFT` prefix will appear as `prefix0:`, which is incorrect. This is noted (with tests) in the Grid PR.

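
To make the failure mode concrete, here is a minimal sketch of a process-wide registry in which the first prefix registered for a namespace URI wins. `ToyNamespaceRegistry` and its methods are hypothetical names that only model the behaviour described here; this is not xmpcore's code.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a process-wide namespace registry (hypothetical, NOT the xmpcore implementation). */
final class ToyNamespaceRegistry {
    // URI -> prefix, shared by every parse in the process (the "singleton" state).
    private static final Map<String, String> PREFIX_BY_URI = new HashMap<>();

    /** Returns the prefix in effect for the URI: the first one ever registered wins. */
    static synchronized String register(String uri, String suggestedPrefix) {
        return PREFIX_BY_URI.computeIfAbsent(uri, u -> suggestedPrefix);
    }

    /** Clears all cached prefixes, similar in spirit to resetting between extractions. */
    static synchronized void reset() {
        PREFIX_BY_URI.clear();
    }

    public static void main(String[] args) {
        String uri = "http://xmp.gettyimages.com/gift/1.0/";
        // First packet (or first image) declares the schema as prefix0.
        System.out.println(register(uri, "prefix0"));         // prints "prefix0"
        // A later packet declares the same schema as GettyImagesGIFT, but the cached prefix is returned.
        System.out.println(register(uri, "GettyImagesGIFT")); // prints "prefix0"
    }
}
```

In this model the outcome depends entirely on which packet or image is seen first, which matches what we observe on a long-running server.
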
When an image has multiple `xpacket`s, the schema namespace prefixes are not obeyed when the same schema is used.

The attached image is a cropped version of an original image supplied by Getty. The image has three `xpacket`s (attached).

In the first packet, the last `rdf:Description` maps the `http://xmp.gettyimages.com/gift/1.0/` schema to the prefix `prefix0`; the second and third packets use the `GettyImagesGIFT` prefix for the same schema.

When running the image through `ImageMetadataReader`, the `AssetId` in the last packet comes out as `prefix0:AssetId` rather than `GettyImagesGIFT:AssetId` as written in the packet.

It looks like the `XMPMetaFactory`* is keeping a cache of previously seen namespaces and is causing this issue. That is, after parsing the first packet, `XMPMetaFactory.schema` has an entry of `'prefix0' -> 'http://xmp.gettyimages.com/gift/1.0/'`. When the second packet is parsed, although it uses a different prefix (`GettyImagesGIFT`) for the `http://xmp.gettyimages.com/gift/1.0/` schema, it is read as `prefix0`; the same happens for the third packet.

One possible fix for this issue would be to call `.reset()` at the start of each extraction; however, I'm not sure what impact that would have on performance, and it wouldn't be safe for parallel processing.

Another option could be to seed the `XMPSchemaRegistry` with known namespaces upfront, before reading any files. This is what ExifTool does and what we've started doing in Grid.

What do you think?
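
As a rough sketch of the seeding option, something like the following could run once at startup, before any file is read. `NamespaceSeeder` and `Registrar` are hypothetical names introduced for illustration; the registry call is abstracted behind an interface so the example stands on its own, and the list of known namespaces here is just the one from this issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: seeds a namespace registry once, before any file is read. */
final class NamespaceSeeder {
    /** Stand-in for a registry's register call; the checked exception leaves room for one like XMPException. */
    interface Registrar {
        String register(String namespaceUri, String suggestedPrefix) throws Exception;
    }

    /** Registers each known URI/prefix pair and returns the prefix the registry reported for each URI. */
    static Map<String, String> seed(Map<String, String> knownPrefixByUri, Registrar registrar) {
        Map<String, String> assigned = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : knownPrefixByUri.entrySet()) {
            try {
                assigned.put(entry.getKey(), registrar.register(entry.getKey(), entry.getValue()));
            } catch (Exception e) {
                // Fail fast at startup rather than silently falling back to generated prefixes.
                throw new IllegalStateException("Could not register namespace " + entry.getKey(), e);
            }
        }
        return assigned;
    }

    /** The namespaces we want stable prefixes for (assumption: only the Getty one from this issue). */
    static Map<String, String> knownNamespaces() {
        Map<String, String> known = new LinkedHashMap<>();
        known.put("http://xmp.gettyimages.com/gift/1.0/", "GettyImagesGIFT");
        return known;
    }
}
```

With xmpcore on the classpath, I'd expect the registrar to be something like `(uri, prefix) -> XMPMetaFactory.getSchemaRegistry().registerNamespace(uri, prefix)`, but that should be checked against the xmpcore version in use. Because seeding happens once before any parsing, it avoids the race that calling `.reset()` per extraction would introduce.
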
*An official repo for Adobe xmpcore isn't available on GitHub, so linking to your copy.
Assets
cech-xpacket-1.txt cech-xpacket-2.txt cech-xpacket-3.txt