dijs / infobox-parser

Parse Wikipedia Infoboxes

No output generated #16

Closed GitterHubber closed 5 years ago

GitterHubber commented 5 years ago

Hello!

Node version: v10.14.2, OS: Windows 7, 64-bit

I did a

npm install infobox-parser

then ran this sample code

var parseInfo = require("infobox-parser")

console.log(parseInfo("{{Infobox Batman}}"));

I get this output: { general: {} }

Basically, I am trying to extract infobox values as name/value pairs. I also tried it with another page, "Copper", and I get the same output as above.

Is this due to a change in the Wikipedia API? Is there a workaround to get it working?

dijs commented 5 years ago

That is the normal behavior. You must supply the full infobox source in order to parse it.

Like so:

const parseInfo = require("infobox-parser")

const info = parseInfo(`
{{cite book
|first1=Lisa K.
|last1= Schneider
|first2=Anja
|last2= Wüst
|first3=Anja 
|last3= Pomowski
|first4=Lin  
|last4= Zhang
|first5=Oliver   
|last5= Einsle
|editor=Peter M.H. Kroneck
|editor2=Martha E. Sosa Torres
|title=The Metal-Driven Biogeochemistry of Gaseous Compounds in the Environment
|series=Metal Ions in Life Sciences
|volume=14
|date=2014
|publisher=Springer
|chapter=Chapter 8. ''No Laughing Matter: The Unmaking of the Greenhouse Gas Dinitrogen Monoxide by Nitrous Oxide Reductase''
|pages=177–210
|doi=10.1007/978-94-017-9269-1_8
}}
`);

console.log(info);

// Outputs
{ general:
   { first1: 'Lisa K.',
     last1: 'Schneider',
     first2: 'Anja',
     last2: 'Wüst',
     first3: 'Anja',
     last3: 'Pomowski',
     first4: 'Lin',
     last4: 'Zhang',
     first5: 'Oliver',
     last5: 'Einsle',
     editor: 'Peter M.H. Kroneck',
     editor2: 'Martha E. Sosa Torres',
     title:
      'The Metal-Driven Biogeochemistry of Gaseous Compounds in the Environment',
     series: 'Metal Ions in Life Sciences',
     volume: '14',
     date: 2014-01-01T00:00:00.000Z,
     publisher: 'Springer',
     chapter:
      'Chapter 8. No Laughing Matter: The Unmaking of the Greenhouse Gas Dinitrogen Monoxide by Nitrous Oxide Reductase',
     pages: '177–210',
     doi: '10.1007/978-94-017-9269-1_8' } }
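Since the question was about name/value pairs: the result is a plain object under `general`, so the pairs can be listed with `Object.entries`. The snippet below is only a sketch; `sample` is a trimmed copy of the output above and `toPairs` is a hypothetical helper, not part of infobox-parser.

```javascript
// Trimmed copy of the parser output shown above (three fields only).
const sample = {
  general: {
    title: "The Metal-Driven Biogeochemistry of Gaseous Compounds in the Environment",
    volume: "14",
    date: new Date("2014-01-01T00:00:00.000Z"),
  },
};

// Hypothetical helper: turn the `general` object into { name, value } pairs.
// Date values are converted to ISO strings, everything else via String().
function toPairs(general) {
  return Object.entries(general).map(([name, value]) => ({
    name,
    value: value instanceof Date ? value.toISOString() : String(value),
  }));
}

const pairs = toPairs(sample.general);
pairs.forEach(({ name, value }) => console.log(`${name} = ${value}`));
```

With real data you would pass `info.general` from the example above instead of `sample.general`.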

This library does not fetch anything from Wikipedia; it is only the parser.

Maybe this is what you are looking for: https://www.npmjs.com/package/wikijs

You can fetch pages and get their parsed information easily:

const wiki = require("wikijs").default

wiki().page('Batman').then(page => page.info()).then(console.log)
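If you would rather keep using infobox-parser directly, you need to fetch the page's wikitext yourself first. The following is a rough sketch using the MediaWiki Action API and Node's built-in `https` module. The function names (`buildWikitextUrl`, `extractWikitext`, `fetchWikitext`) and the User-Agent string are made up for illustration, and it assumes the parser copes with the article text that surrounds the infobox in the lead section.

```javascript
const https = require("https");

// Build an Action API URL that returns the raw wikitext of a page's lead section.
function buildWikitextUrl(title, lang = "en") {
  const params = new URLSearchParams({
    action: "query",
    prop: "revisions",
    rvprop: "content",
    rvslots: "main",
    rvsection: "0", // lead section, where the infobox usually sits
    titles: title,
    format: "json",
    formatversion: "2",
  });
  return `https://${lang}.wikipedia.org/w/api.php?${params}`;
}

// Pull the wikitext out of a formatversion=2 response; null if the page is missing.
function extractWikitext(apiResponse) {
  const pages = apiResponse.query && apiResponse.query.pages;
  const page = pages && pages[0];
  if (!page || page.missing || !page.revisions) return null;
  return page.revisions[0].slots.main.content;
}

// Fetch and resolve with the wikitext (placeholder User-Agent, replace with your own).
function fetchWikitext(title) {
  return new Promise((resolve, reject) => {
    const options = { headers: { "User-Agent": "infobox-example/0.1 (hypothetical)" } };
    https
      .get(buildWikitextUrl(title), options, (res) => {
        let body = "";
        res.setEncoding("utf8");
        res.on("data", (chunk) => (body += chunk));
        res.on("end", () => {
          try {
            resolve(extractWikitext(JSON.parse(body)));
          } catch (err) {
            reject(err);
          }
        });
      })
      .on("error", reject);
  });
}

// Usage (needs network access and `npm install infobox-parser`):
// const parseInfo = require("infobox-parser");
// fetchWikitext("Batman").then((src) => console.log(parseInfo(src)));
```

The request and the response handling are kept in separate functions so the parsing step can be tried against a saved response without hitting the network.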