david-allison / manx-corpus-search

MIT License

Handle Newspaper sources #54

Closed: david-allison closed this issue 3 years ago

david-allison commented 3 years ago

https://www.imuseum.im/Olive/APA/IsleofMan/get.res?id=page.Scripts&kind=script&uq=20210325071104&for=%7E%2Fdefault.aspx&mode=group

version:'2.7.102.0',appVersion:'5.2.6',uq:'20210325071104',viewPointVersion:'4.28.18017.37845',lastModificationDateString:'2021-03-25-19-11-04'

COM:"Calf of Man Bird Observatory Report",
CEC:"Camp Echo",
CHU:"Camp Humor",
KCZ:"Camp Zeitung",
CTG:"Castletown Gazette",
DSC:"Das Schleierlicht",
MGN:"German Gymnastics Association",
GFL:"Green Final",
HNS:"Holiday News",
IDT:"Isle of Man Daily Times",
IME:"Isle of Man Examiner",
IMT:"Isle of Man Times",
WAC:"Isle of Man Weekly Advertising Circular",
IWG:"Isle of Man Weekly Gazette",
JMM:"Journal of The Manx Museum",
LAE:"Lager Echo",
DLA:"Lager Laterne",
LAZ:"Lager Zeitung",
LAU:"Lager-Ulk",
MNA:"Manks Advertiser",
MNM:"Manks Mercury",
TMC:"Manx Cat",
TFP:"Manx Free Press",
MNB:"Manx Liberal",
MMNT:"Manx Museum and National Trust Report",
MNP:"Manx Patriot",
MRS:"Manx Rising Sun",
MNS:"Manx Star",
TMS:"Manx Sun",
TMN:"Manxman",
MDP:"Mona Daily Programme",
MNH:"Mona's Herald",
PCG:"Peel City Guardian",
PSL:"Peel Sentinel",
QUT:"Quousque Tandem",
RCE:"Ramsey Chronicle",
RYC:"Ramsey Courier",
RWN:"Ramsey Weekly News",
TRS:"Rising Sun",
TTS:"TT Special",
UNU:"Unter Uns",
WER:"Werden"
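To use this table in code, the fragment can be parsed into a lookup map. The following is a minimal Python sketch. It assumes the entries are copied verbatim from the script resource in the form CODE:"Title", that codes are upper-case letters, and that titles contain no double quotes. parse_publication_codes is a hypothetical helper, not part of the project.

```python
import re

# Matches one CODE:"Title" entry as it appears in the script fragment.
# Assumption: codes are upper-case letters (e.g. MNH, MMNT) and titles
# contain no double quotes. Version fields use single quotes, so they are skipped.
ENTRY = re.compile(r'([A-Z]+):"([^"]*)"')


def parse_publication_codes(snippet: str) -> dict[str, str]:
    """Return a {code: title} mapping from the raw script fragment (hypothetical helper)."""
    return {code: title for code, title in ENTRY.findall(snippet)}


# A few entries from the list, including two that share a line in the source.
sample = '''
MNH:"Mona's Herald",
LAE:"Lager Echo",DLA:"Lager Laterne",
WER:"Werden"
'''

codes = parse_publication_codes(sample)
print(codes["MNH"])  # Mona's Herald
print(len(codes))    # 4
```

Using a regex rather than a strict parser tolerates several entries on one line and the surrounding non-JSON script text.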

A newspaper image appears to use the format: https://www.imuseum.im/Olive/APA/IsleofMan/get/image.ashx?kind=block&href=MNH%2F1833%2F10%2F25&id=Ar0010000&ext=.png (another block id seen: &id=Ar0080001&ext=.png)

where MNH%2F1833%2F10%2F25 is the URL-encoded form of MNH/1833/10/25: the publication code (MNH = Mona's Herald), then year/month/day.
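The image URL format above can be reproduced, and its href decoded, with Python's standard urllib.parse. This is a sketch of the observed format only. image_url and parse_href are hypothetical helpers, and zero-padding the month and day is an assumption (both examples in this thread happen to use two-digit values).

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint taken from the example image URL above.
IMAGE_ENDPOINT = "https://www.imuseum.im/Olive/APA/IsleofMan/get/image.ashx"


def image_url(code: str, year: int, month: int, day: int, block_id: str) -> str:
    """Build an image-block URL in the observed format (hypothetical helper)."""
    # Assumption: month and day are zero-padded to two digits.
    href = f"{code}/{year:04d}/{month:02d}/{day:02d}"
    # urlencode percent-encodes "/" as %2F, matching MNH%2F1833%2F10%2F25.
    query = urlencode({"kind": "block", "href": href, "id": block_id, "ext": ".png"})
    return f"{IMAGE_ENDPOINT}?{query}"


def parse_href(url: str) -> tuple[str, str]:
    """Split an image URL's href into (publication code, "YYYY/MM/DD")."""
    # parse_qs decodes %2F back to "/".
    href = parse_qs(urlsplit(url).query)["href"][0]
    code, date = href.split("/", 1)
    return code, date


url = image_url("MNH", 1833, 10, 25, "Ar0010000")
print(url)
print(parse_href(url))  # ('MNH', '1833/10/25')
```

The first print reproduces the example URL above exactly.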

==Problems (currently)==

Unknown if these are solvable

david-allison commented 3 years ago

Shot off an email to MNH (Manx National Heritage).

For now, it's likely best to add attribution at the top of the page, along with a direct link to the image.
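An attribution line can be built from the publication code and the date in the href. This is a minimal Python sketch. The TITLES map is a small hand-copied subset of the code table, and the "Source: ..." wording is purely illustrative, not a format the project has settled on. The date formatting assumes the default C locale so that %B yields English month names.

```python
from datetime import datetime

# Hand-copied subset of the publication code table; the real list is longer.
TITLES = {"MNH": "Mona's Herald", "IMT": "Isle of Man Times"}


def attribution(code: str, href_date: str, link: str) -> str:
    """Format a one-line attribution for the top of a page (illustrative wording).

    Assumes href_date is the "YYYY/MM/DD" part of an href such as MNH/1833/10/25.
    """
    issue = datetime.strptime(href_date, "%Y/%m/%d")
    title = TITLES.get(code, code)  # fall back to the raw code if it is unknown
    return f"Source: {title}, {issue.day} {issue:%B %Y} ({link})"


print(attribution("MNH", "1833/10/25", "https://www.imuseum.im/"))
# Source: Mona's Herald, 25 October 1833 (https://www.imuseum.im/)
```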

I think it's unlikely that we can link to the interactive page, because the site appears to be JS-based.

david-allison commented 3 years ago

Direct link to an image won't work.

In the site's newspaper vocabulary, a "component" is made up of multiple "chunks", so we need a link to the component rather than to an individual image.

I've reverse-engineered the shareKey parameter, but we don't need it (thank goodness).

A link such as https://www.imuseum.im/Olive/APA/IsleofMan/get/article.ashx?href=IMT%2F1872%2F12%2F28&id=Ar00508&mode=image&ts=20100708065431 will work.
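Component links in the working format above can be generated in the same way. This is a sketch that mirrors that single example. article_url is a hypothetical helper. The meaning of ts is not established in this thread, so it is treated as an opaque value copied from the site. mode=image is fixed because that is what the working link uses.

```python
from urllib.parse import urlencode

# Endpoint taken from the working component link above.
ARTICLE_ENDPOINT = "https://www.imuseum.im/Olive/APA/IsleofMan/get/article.ashx"


def article_url(href: str, component_id: str, ts: str) -> str:
    """Build a component link in the format of the working example (hypothetical helper).

    href:         decoded issue path, e.g. "IMT/1872/12/28" (encoded to %2F by urlencode)
    component_id: component id such as "Ar00508"
    ts:           opaque timestamp copied from the site; its meaning is an open question
    """
    query = urlencode({"href": href, "id": component_id, "mode": "image", "ts": ts})
    return f"{ARTICLE_ENDPOINT}?{query}"


print(article_url("IMT/1872/12/28", "Ar00508", "20100708065431"))
```

The printed link is identical to the example above.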

david-allison commented 3 years ago

Fixed in #59