Open hesyifei opened 5 years ago
For example, this article is split across two pages: https://stanforddailyarchive.com/cgi-bin/stanford?a=d&d=stanford20140106-01.2.5&e=-------en-20--1--txt-txIN-------#
But https://github.com/TheStanfordDaily/archives-text/blob/3e24b7ee6c55dac8fcff552e02119b502afd6f42/2014/01/06/MODSMD_ARTICLE4.article.txt only contains the part that is on the first page.
https://github.com/TheStanfordDaily/archives-text/blob/3e24b7ee6c55dac8fcff552e02119b502afd6f42/2014/01/06/MODSMD_ARTICLE4.article.txt#L1-L53
Here's the relevant ALTO file: https://tiles.archives.stanforddaily.com/data.2014-oct/data/stanford/2014/01/06_01/Stanford_Daily-ALTO/Stanford_Daily_20140106_0001_ALTO0002.xml
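A rough sketch of how the text for one article could be stitched together from several ALTO pages, in case it helps frame a fix. This is not the code that generated the `.article.txt` files. It assumes the set of `TextBlock` IDs belonging to the article is already known (presumably from the issue's METS/MODS structure, which is not checked here), and `block_text` and `article_text` are hypothetical names. It relies only on the standard ALTO elements `TextBlock`, `TextLine`, and `String` with its `CONTENT` attribute.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace so this works across ALTO schema versions."""
    return tag.rsplit("}", 1)[-1]


def block_text(block):
    """Return the text of one ALTO TextBlock, one output line per TextLine."""
    lines = []
    for el in block.iter():
        if local(el.tag) != "TextLine":
            continue
        # String elements carry the recognized word in CONTENT; SP elements
        # are spaces and are represented here by joining with " ".
        words = [s.get("CONTENT", "") for s in el if local(s.tag) == "String"]
        lines.append(" ".join(words))
    return "\n".join(lines)


def article_text(pages, block_ids):
    """Concatenate the article's blocks across all pages.

    pages: ALTO XML documents as strings, in page order (assumption).
    block_ids: set of TextBlock IDs that make up the article (assumption:
    obtained elsewhere, e.g. from the issue-level structure file).
    """
    parts = []
    for xml_text in pages:
        root = ET.fromstring(xml_text)
        for el in root.iter():
            if local(el.tag) == "TextBlock" and el.get("ID") in block_ids:
                parts.append(block_text(el))
    return "\n".join(parts)
```

The point of the sketch is that every page of the issue is scanned, rather than only the page where the article starts, so blocks from the continuation page are included as well.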