dijs / wiki

Wikipedia Interface for Node.js
MIT License

page.mainImage() throws error if querying in another language #51

Closed: klemensz closed this issue 6 years ago

klemensz commented 6 years ago
    TypeError: Cannot read property 'imageinfo' of undefined
    at /app/node_modules/wikijs/dist/page.js:1:3865

The problem seems to be that the file namespace prefix is localized: in the German version (https://de.wikipedia.org/w/api.php) it is "Datei:" rather than "File:", and in the French version it is "Fichier:". For example:


rawImages: [ { ns: 6,
    title: 'Datei:BatmobileBurton.jpg',
    missing: '',
    known: '',
    imagerepository: 'shared',
    imageinfo: [ [Object] ] },
  { ns: 6,
    title: 'Datei:Commons-logo.svg',
    missing: '',
    known: '',
    imagerepository: 'shared',
    imageinfo: [ [Object] ] },
...

Should be possible to split by indexOf(':')?
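
A minimal sketch of that idea, assuming every image title has the form `<namespace>:<name>` and that only the first colon separates the two. `stripNamespace` is a hypothetical helper name, not part of wiki.js:

```javascript
// Hypothetical helper: drop the localized namespace prefix
// ("File:", "Datei:", "Fichier:", ...) from an image title.
function stripNamespace(title) {
  const idx = title.indexOf(':');
  // No colon at all: assume the title carries no namespace prefix.
  return idx === -1 ? title : title.slice(idx + 1);
}

console.log(stripNamespace('Datei:BatmobileBurton.jpg'));
console.log(stripNamespace('Fichier:BatmobileBurton.jpg'));
```

Cutting at the first colon only keeps file names that themselves contain a colon intact.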

dijs commented 6 years ago

Good find! Will fix this.

dijs commented 6 years ago

Fixed in v3.1.2

klemensz commented 6 years ago

Thanks, that was quick. It works in French now, but not yet in German. I see that using the German API, page.info() returns an empty object {}. That's probably the root cause.

dijs commented 6 years ago

Hmm... Can you open a new issue with more information for me? Include a code example that doesn't work and the error you get.

klemensz commented 6 years ago

Sure:

        wiki({ apiUrl: 'https://de.wikipedia.org/w/api.php' }).page('Batman').then(page => {
            console.log('====== Page ID: ' + page.raw.pageid);
            console.dir(page);
            page.info().then(info => console.log('info:', info));
            page.summary().then(summary => console.log('summary:', summary));
            page.categories().then(categories => console.log('categories:', categories));
            page.content().then(content => console.log('content:', content.length));
            page.images().then(images => console.log('images:', images));
            page.rawImages().then(rawImages => console.log('rawImages:', rawImages));
            page.mainImage().then(mainImage => console.log('mainImage:', mainImage));
        });

Output:

====== Page ID: 7855
{ raw: 
   { pageid: 7855,
     ns: 0,
     title: 'Batman',
     contentmodel: 'wikitext',
     pagelanguage: 'de',
     pagelanguagehtmlcode: 'de',
     pagelanguagedir: 'ltr',
     touched: '2017-07-25T19:58:06Z',
     lastrevid: 166646184,
     length: 52150,
     fullurl: 'https://de.wikipedia.org/wiki/Batman',
     editurl: 'https://de.wikipedia.org/w/index.php?title=Batman&action=edit',
     canonicalurl: 'https://de.wikipedia.org/wiki/Batman' },
  html: [Function: html],
  content: [Function: content],
  summary: [Function: summary],
  images: [Function: images],
  references: [Function: references],
  links: [Function: links],
  categories: [Function: categories],
  coordinates: [Function: coordinates],
  info: [Function: n],
  backlinks: [Function: backlinks],
  rawImages: [Function: f],
  mainImage: [Function: mainImage] }
categories: [ 'Kategorie:Batman',
  'Kategorie:Comicfigur',
  'Kategorie:Comicverfilmung',
  'Kategorie:Filmreihe',
  'Kategorie:Pseudonym',
  'Kategorie:Superheld',
  'Kategorie:Trickfigur' ]
summary: Batman (englisch für Fledermausmann) ist eine von Bob Kane erdachte und durch Bill Finger vor dem Erscheinen weiterentwickelte Comicfigur. Finger veränderte das ursprünglich steife Cape in ein wallendes und konzipierte Batman als zweite Identität des Milliardärs Bruce Wayne. Batman erschien erstmals im März 1939 in dem Comic-Magazin Detective Comics (Ausgabe 27); nach diesem Magazin nannte sich später dessen Verlag in DC Comics um und ist nun im Besitz von Time Warner.
rawImages: [ { ns: 6,
    title: 'Datei:BatmobileBurton.jpg',
    missing: '',
    known: '',
    imagerepository: 'shared',
    imageinfo: [ [Object] ] },
  { ns: 6,
    title: 'Datei:Commons-logo.svg',
    missing: '',
    known: '',
    imagerepository: 'shared',
    imageinfo: [ [Object] ] },
  { ns: 6,
    title: 'Datei:Disambig-dark.svg',
    missing: '',
    known: '',
    imagerepository: 'shared',
    imageinfo: [ [Object] ] },
  { ns: 6,
    title: 'Datei:Gotham City Saviour (2430422247).jpg',
    missing: '',
    known: '',
    imagerepository: 'shared',
    imageinfo: [ [Object] ] },
  { ns: 6,
    title: 'Datei:Kustum a laFledermaus.png',
    missing: '',
    known: '',
    imagerepository: 'shared',
    imageinfo: [ [Object] ] },
  { ns: 6,
    title: 'Datei:USD205998.png',
    missing: '',
    known: '',
    imagerepository: 'shared',
    imageinfo: [ [Object] ] },
  { ns: 6,
    title: 'Datei:Wikiquote-logo.svg',
    missing: '',
    known: '',
    imagerepository: 'shared',
    imageinfo: [ [Object] ] } ]
info: {}
content: 30986
Wed Jul 26 2017 11:42:52 GMT+0000 (UTC) ERROR Process   Unhandled Rejection at Promise [object Promise], reason:, TypeError: Cannot read property 'imageinfo' of undefined
    TypeError: Cannot read property 'imageinfo' of undefined
    at /app/node_modules/wikijs/dist/page.js:1:3965
    at tryCatcher (/app/node_modules/bluebird/js/release/util.js:16:23)
    at Promise._settlePromiseFromHandler (/app/node_modules/bluebird/js/release/promise.js:512:31)
    at Promise._settlePromise (/app/node_modules/bluebird/js/release/promise.js:569:18)
    at Promise._settlePromise0 (/app/node_modules/bluebird/js/release/promise.js:614:10)
    at Promise._settlePromises (/app/node_modules/bluebird/js/release/promise.js:693:18)
    at Promise._fulfill (/app/node_modules/bluebird/js/release/promise.js:638:18)
    at PromiseArray._resolve (/app/node_modules/bluebird/js/release/promise_array.js:126:19)
    at PromiseArray._promiseFulfilled (/app/node_modules/bluebird/js/release/promise_array.js:144:14)
    at Promise._settlePromise (/app/node_modules/bluebird/js/release/promise.js:574:26)
    at Promise._settlePromise0 (/app/node_modules/bluebird/js/release/promise.js:614:10)
    at Promise._settlePromises (/app/node_modules/bluebird/js/release/promise.js:693:18)
    at Async._drainQueue (/app/node_modules/bluebird/js/release/async.js:133:16)
    at Async._drainQueues (/app/node_modules/bluebird/js/release/async.js:143:10)
    at Immediate.Async.drainQueues (/app/node_modules/bluebird/js/release/async.js:17:14)
    at runCallback (timers.js:666:20)
images: [ 'https://upload.wikimedia.org/wikipedia/commons/1/13/BatmobileBurton.jpg',
  'https://upload.wikimedia.org/wikipedia/commons/4/4a/Commons-logo.svg',
  'https://upload.wikimedia.org/wikipedia/commons/e/ea/Disambig-dark.svg',
  'https://upload.wikimedia.org/wikipedia/commons/0/07/Gotham_City_Saviour_%282430422247%29.jpg',
  'https://upload.wikimedia.org/wikipedia/commons/d/df/Kustum_a_laFledermaus.png',
  'https://upload.wikimedia.org/wikipedia/commons/a/af/USD205998.png',
  'https://upload.wikimedia.org/wikipedia/commons/f/fa/Wikiquote-logo.svg' ]

The same call against fr.wikipedia.org returns:

info: { charteCouleur: 'BD',
  oeuvre: 'Batman',
  nom: 'Batman',
  image: 'Batman (black background).jpg',
  nomAlias: 'Bruce Wayne (véritable identité)',
  naissance: 'Gotham',
  origine: 'Américain',
  adresse: 'Gotham',
  famille: 'Thomas Wayne',
  affiliation: 'JLA',
  ennemi: 'Ennemis de Batman',
  libre: [ 'condition physique', 'détective' ] }

dijs commented 6 years ago

Okay. So I looked into this issue.

The problem is that the German version of the Batman page does not have an infobox. The infobox tells us which image is the "main image".

What I will do for now, until we have a better fix, is just return the first image if there is no infobox data.
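
Roughly, the fallback could look like the sketch below. This is not the actual wiki.js code: `pickMainImage` is a hypothetical name, `info` is assumed to be the parsed infobox object (possibly `{}`), and `rawImages` is assumed to have the shape shown in the output above.

```javascript
// Hypothetical sketch: choose a main image from the raw image list.
function pickMainImage(info, rawImages) {
  const wanted = info.image; // image name from the infobox, if present
  const match = wanted
    ? rawImages.find(img => {
        // Compare without the localized namespace prefix ("Datei:", "File:", ...).
        const name = img.title.slice(img.title.indexOf(':') + 1);
        return name === wanted;
      })
    : undefined;
  // No infobox data (or no matching file): fall back to the first image.
  return match || rawImages[0];
}
```

With an empty infobox this returns `rawImages[0]`, and it returns `undefined` when the page has no images at all.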

dijs commented 6 years ago

Okay, check v3.1.3

klemensz commented 6 years ago

Now I get an image 👍 But even though the page has an infobox, the result is not the infobox's main image. In the German version the property is called "bildname", as opposed to "image" in English and French. I also checked Spanish, where it's "imagen". You can test it, for example, by searching for "Cristiano Ronaldo".

info: { bildname: 'Russia-Portugal CC2017 (11) (cropped).jpg',
  bildunterschrift: 'Cristiano Ronaldo (2017)',
  langname: 'Cristiano Ronaldo dos Santos Aveiro',
  geburtstag: [ '5. Februar', '1985' ],
  geburtsort: 'Funchal',
  geburtsland: 'Portugal',
  position: [ 'Flügel', 'Sturm' ],
  jugendvereineTabelle: [ 'Team-Station', 'Team-Station', 'Team-Station' ],
  vereineTabelle: 'Team-Station',
  nationalmannschaftTabelle: 'Team-Station',
  lgupdate: 'Saisonende 2016/17',
  nmupdate: [ '5. Februar', '1985' ] }

Would something like this be possible? (EN/FR, DE, ES, IT for now)

    const image = info.image || info.bildname || info.imagen || info.Immagine;
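
The same lookup could also be driven by a list of keys, which is easier to extend as more languages turn up. A sketch only: `IMAGE_KEYS` and `infoboxImageName` are hypothetical names, and the list contains just the four keys mentioned here, not a complete set.

```javascript
// Infobox keys that may hold the main image name, one per language
// seen so far (EN/FR, DE, ES, IT). Not exhaustive.
const IMAGE_KEYS = ['image', 'bildname', 'imagen', 'Immagine'];

// Return the first non-empty string found under any known key,
// or undefined when the infobox names no image.
function infoboxImageName(info) {
  const key = IMAGE_KEYS.find(
    k => typeof info[k] === 'string' && info[k] !== ''
  );
  return key ? info[key] : undefined;
}

console.log(infoboxImageName({ bildname: 'Russia-Portugal CC2017 (11) (cropped).jpg' }));
```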

klemensz commented 6 years ago

Oh, and in the case of the fallback it returns the whole imageInfo object, not just the URL.

{ ns: 6,
  title: 'Datei:Cristiano Ronaldo - Dagur Brynjólfsson.jpg',
  missing: '',
  known: '',
  imagerepository: 'shared',
  imageinfo: 
   [ { url: 'https://upload.wikimedia.org/wikipedia/commons/c/ce/Cristiano_Ronaldo_-_Dagur_Brynj%C3%B3lfsson.jpg',
       descriptionurl: 'https://commons.wikimedia.org/wiki/File:Cristiano_Ronaldo_-_Dagur_Brynj%C3%B3lfsson.jpg',
       descriptionshorturl: 'https://commons.wikimedia.org/w/index.php?curid=11793793' } ] }

https://github.com/dijs/wiki/blob/master/src/page.js#L106

dijs commented 6 years ago

Okay, I added the different translations of "image". I also don't like returning different types, so mainImage() always returns an image URL now.
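
For illustration, reducing a raw image object to a plain URL might look like this. It is a sketch under the assumption that raw images have the `imageinfo: [ { url, ... } ]` shape shown earlier in this thread; `imageUrl` is a hypothetical helper, not the library's implementation.

```javascript
// Hypothetical helper: extract the URL string from a raw image object.
// Returns undefined instead of throwing when the object or its
// imageinfo array is missing.
function imageUrl(rawImage) {
  const info = rawImage && rawImage.imageinfo;
  return Array.isArray(info) && info.length > 0 ? info[0].url : undefined;
}
```

Guarding against a missing object also avoids the original "Cannot read property 'imageinfo' of undefined" error when no image matches.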

dijs commented 6 years ago

Check v3.1.4 ~ The π version 👍