dijs / infobox-parser

Parse Wikipedia Infoboxes

Cyrillic text support #12

Closed annapogorelova closed 6 years ago

annapogorelova commented 6 years ago

Hi @dijs and thank you very much for the great work!

I tried parsing the wikitext of an article written in a Cyrillic-script language (Ukrainian), and it looks like the library doesn't support Cyrillic text.

var infoboxParser = require("infobox-parser");
var result = infoboxParser(`{{Вулиця України
|назва = Вулиця Підвальна
|населений пункт = Львів
}}`)

Currently this code results in an empty object.

Do you plan to add support for Cyrillic text?

dijs commented 6 years ago

Ooo, I would love to. But I may need some help with that...

Could you find a simple article we could test parsing with?

And maybe provide an English version as well?

annapogorelova commented 6 years ago

Sure. See the links below (a short Wikipedia article about the Orion Nebula in English, Ukrainian, and Russian):

English Ukrainian Russian

dijs commented 6 years ago

Interesting... So, I just added tests around this article. And it seems to work just fine. You can see here:

https://github.com/dijs/infobox-parser/pull/13

annapogorelova commented 6 years ago

Yes, you are right, it works. Maybe I missed something previously, sorry. Anyway, thank you!
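
For anyone who lands here with a similar empty result: the snippet below is a minimal, self-contained sketch for checking whether Cyrillic keys survive a `|key = value` style match. The `parseInfoboxFields` function is a hypothetical stand-in written for illustration, not infobox-parser's implementation, and nothing in this thread confirms how the library matches keys. It only shows that an ASCII-only class such as `\w` would skip Cyrillic keys, while a script-agnostic pattern keeps them.

```javascript
// Hypothetical stand-in for illustration only; NOT infobox-parser's code.
function parseInfoboxFields(wikitext) {
  const fields = {};
  for (const line of wikitext.split("\n")) {
    // Match "|key = value". [^=]+ accepts any script, unlike ASCII-only \w.
    const match = line.match(/^\s*\|\s*([^=]+?)\s*=\s*(.*?)\s*$/);
    if (match) {
      fields[match[1]] = match[2];
    }
  }
  return fields;
}

// Same wikitext as in the original report.
const sample = `{{Вулиця України
|назва = Вулиця Підвальна
|населений пункт = Львів
}}`;

const parsed = parseInfoboxFields(sample);
console.log(parsed);

// \w covers only [A-Za-z0-9_], so it does not match Cyrillic letters.
const asciiOnly = /^\w+$/.test("назва");
// \p{L} with the u flag matches letters from any script.
const anyScript = /^\p{L}+$/u.test("назва");
console.log(asciiOnly, anyScript);
```

If a check like this passes but the real library still returns an empty object, the input string itself (for example, wikitext copied without the surrounding `{{ }}`) is worth a second look before assuming a script-support problem.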