Our current strategy focuses on plain htmlToDita conversion, largely ignoring the structure of the content.
This gives a usable document, but it may be deferring technical debt onto any future parsing of the document set.
We can reduce that debt by adding some metadata.
I can think of these tags:
[x] regions file (world map) - document level id
[x] regions file (world map) - imagemap
[x] region page - document level tag
[x] category page - document level tag
[x] category page - flag element (this is in place already)
[ ] unit page - document level tag
[ ] unit page - images block
[ ] unit page - signatures table
[ ] unit page - propulsion block
[ ] unit page - remarks block
[ ] images pages - document level tag
[ ] transducer page - document level tag
Ian to investigate the pattern that determines images/transducer pages onsite. Suspect it's whether the filename contains _grams or _pics, but that needs double-checking. Failing that, we may need to look at some kind of page content to determine it.
@IanMayo to make sure that anchors and banjo have the correct headers/content to trigger the unit page identifiers.
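
If the filename suspicion holds, the check could be as small as the sketch below. It is only an illustration: `classify_page` and `FILENAME_MARKERS` are hypothetical names, and the mapping of `_pics` to images pages and `_grams` to transducer pages is an assumption that still needs confirming onsite.

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumed mapping from filename marker to page type.
# Unverified: to be confirmed against the real document set.
FILENAME_MARKERS = {
    "_pics": "images",
    "_grams": "transducer",
}


def classify_page(filename: str) -> Optional[str]:
    """Return the assumed page type for a filename, or None if no marker matches."""
    # Compare against the lower-cased name without directory or extension.
    stem = PurePosixPath(filename).stem.lower()
    for marker, page_type in FILENAME_MARKERS.items():
        if marker in stem:
            return page_type
    # No marker found: the caller would need to fall back to page content.
    return None
```

A `None` result is where the page-content fallback mentioned above would come in.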