In the METS file, first traverse "Physical Structure" to get all pages (e.g. `ALTO00001`). (e.g. https://s3.amazonaws.com/stanforddailyarchive/data.2013-nov/data/stanford/1999/12/01_01/Stanford_Daily_19991201_0001-METS.xml)

Then traverse the whole file to find all elements with the matching attribute, e.g. `[FILEID="ALTO00001"]`. Then find their parents that have `TYPE="ARTICLE"` and add each element to the overlays for that parent. For example, this will add `P1_TB00007` and `P1_TB00006` to the overlay for `MODSMD_ARTICLE1`.

Then find the corresponding positions and sizes in the ALTO file (e.g. https://s3.amazonaws.com/stanforddailyarchive/data.2013-nov/data/stanford/1999/12/01_01/Stanford_Daily-ALTO/Stanford_Daily_19991201_0001_ALTO0001.xml).

https://github.com/TheStanfordDaily/archives-web/blob/8a48f383e4a239d6bec7dd98e77a175c0e2b02fb/src/classes/Page.js#L23-L43
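Here is a rough sketch of the steps above, written in Python only for brevity (the site code itself is JavaScript). The sample XML is a trimmed-down, hypothetical stand-in for the real METS and ALTO files, the coordinates are made up, and the function names `page_file_ids`, `article_overlays` and `block_boxes` are invented for illustration. It also assumes the block ID sits in a `BEGIN` attribute next to `FILEID` and that the article ID is the `DMDID` of the `TYPE="ARTICLE"` element, which should be checked against the actual files.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

METS_NS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical, heavily trimmed sample; real METS files nest more deeply.
SAMPLE_METS = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="PHYSICAL">
    <mets:div TYPE="PAGE"><mets:fptr FILEID="ALTO00001"/></mets:div>
  </mets:structMap>
  <mets:structMap TYPE="LOGICAL">
    <mets:div TYPE="ARTICLE" DMDID="MODSMD_ARTICLE1">
      <mets:div TYPE="BODY">
        <mets:fptr><mets:area FILEID="ALTO00001" BEGIN="P1_TB00007"/></mets:fptr>
        <mets:fptr><mets:area FILEID="ALTO00001" BEGIN="P1_TB00006"/></mets:fptr>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>
"""

# Hypothetical ALTO sample with made-up coordinates (namespace omitted).
SAMPLE_ALTO = """\
<alto><Layout><Page ID="P1"><PrintSpace>
  <TextBlock ID="P1_TB00006" HPOS="100" VPOS="200" WIDTH="300" HEIGHT="400"/>
  <TextBlock ID="P1_TB00007" HPOS="100" VPOS="620" WIDTH="300" HEIGHT="150"/>
</PrintSpace></Page></Layout></alto>
"""


def page_file_ids(mets_root):
    """Step 1: collect page file IDs from the physical structure map."""
    physical = mets_root.find("mets:structMap[@TYPE='PHYSICAL']", METS_NS)
    if physical is None:
        return []
    ids = []
    for el in physical.iter():
        file_id = el.get("FILEID")
        if file_id and file_id not in ids:
            ids.append(file_id)
    return ids


def article_overlays(mets_root, file_id):
    """Steps 2 and 3: group block IDs on one page by enclosing ARTICLE."""
    # ElementTree has no parent pointers, so build a child -> parent map once.
    parents = {child: parent for parent in mets_root.iter() for child in parent}
    overlays = defaultdict(list)
    for el in mets_root.iter():
        if el.get("FILEID") != file_id or el.get("BEGIN") is None:
            continue
        # Walk up until an ancestor with TYPE="ARTICLE" is found (or the root is passed).
        node = el
        while node is not None and node.get("TYPE") != "ARTICLE":
            node = parents.get(node)
        if node is not None:
            overlays[node.get("DMDID")].append(el.get("BEGIN"))
    return dict(overlays)


def block_boxes(alto_root, block_ids):
    """Step 4: look up (HPOS, VPOS, WIDTH, HEIGHT) for each block ID."""
    wanted = set(block_ids)
    boxes = {}
    for el in alto_root.iter():
        if el.get("ID") in wanted:
            boxes[el.get("ID")] = tuple(
                float(el.get(key, 0)) for key in ("HPOS", "VPOS", "WIDTH", "HEIGHT")
            )
    return boxes


if __name__ == "__main__":
    mets = ET.fromstring(SAMPLE_METS)
    alto = ET.fromstring(SAMPLE_ALTO)
    for page in page_file_ids(mets):
        for article, blocks in article_overlays(mets, page).items():
            print(page, article, block_boxes(alto, blocks))
```

Matching on the `ID` attribute alone, rather than on a namespaced `TextBlock` tag, sidesteps differences between ALTO schema versions, at the cost of scanning every element on the page.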
@epicfaace do you think there's any easier way? Also, do we need to highlight anything other than `TYPE="ARTICLE"`? (e.g. `TYPE="TITLE_SECTION"` and `TYPE="ADVERTISEMENT"`)