dijs / infobox-parser

Parse Wikipedia Infoboxes

Point number format problem #4

Closed: goferito closed this 7 years ago

goferito commented 7 years ago

If a number contains comma separators, as is the case for populationTotal in the Dublin test, it is not parsed correctly.
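A minimal illustration of why a comma-separated value breaks, assuming the parser ends up calling parseInt() or Number() on the raw infobox string. The value '1,234,567' is hypothetical and is not the actual Dublin figure.

```javascript
// Hypothetical value as it might appear in an English-language infobox.
const raw = '1,234,567';

// parseInt stops at the first character that is not a digit,
// so everything from the first comma onward is silently dropped.
const viaParseInt = parseInt(raw, 10);

// Number() does not stop early; it rejects the whole string instead.
const viaNumber = Number(raw);

console.log(viaParseInt, viaNumber); // 1 NaN
```

Neither result is the intended population: one is silently wrong, the other is NaN.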

This PR adds a test revealing the problem.

The problem is even more complex, since the parser would need to consider the language of the page to determine whether the comma is a thousands separator or a decimal separator. For example, the JS number 5000.05 would be formatted as '5,000.05' in English but as '5.000,05' in Spanish. I point this out because my first solution was simply to remove the commas before calling parseInt(), but that doesn't work when the page is in Spanish.
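The "remove the commas, then parseInt()" approach described above can be sketched as follows. The function name stripCommasAndParse is hypothetical; the two inputs are the English and Spanish formattings of 5000.05 from the comment.

```javascript
// Sketch of the first approach: strip every comma, then call parseInt.
function stripCommasAndParse(text) {
  return parseInt(text.replace(/,/g, ''), 10);
}

// English formatting: the thousands separator is removed as intended
// (parseInt still truncates the fractional part, giving 5000).
const en = stripCommasAndParse('5,000.05');

// Spanish formatting: the comma is the decimal separator, so removing it
// produces '5.00005', and parseInt stops at the '.' thousands separator.
const es = stripCommasAndParse('5.000,05');

console.log(en, es); // 5000 5
```

The same input number yields 5000 for the English page and 5 for the Spanish page, which is why a language-agnostic fix inside the parser is hard.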

Maybe one solution would be to always return strings and delegate the responsibility of casting to the user of the module (which would still be me :laughing:, but at least I know whether I am requesting the page in English or Spanish, so I can decide to cast the number using the English or Spanish format).
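A sketch of what caller-side casting could look like if the module returned only strings. The helpers separatorsFor and castFields and the infobox objects are hypothetical; the separators are derived with the standard Intl.NumberFormat formatToParts() API, which assumes a JavaScript runtime with locale data for the requested language (e.g. a Node build with full ICU).

```javascript
// Ask Intl for the group (thousands) and decimal separators of a language
// by formatting a sample number and reading back the separator parts.
function separatorsFor(lang) {
  const parts = new Intl.NumberFormat(lang).formatToParts(12345.6);
  const find = (type) => (parts.find((p) => p.type === type) || {}).value;
  return { group: find('group'), decimal: find('decimal') };
}

// Caller-side casting: infobox values stay strings, and only the fields
// the caller knows to be numeric are converted, using the page language.
function castFields(infobox, numericFields, lang) {
  const { group, decimal } = separatorsFor(lang);
  const result = { ...infobox };
  for (const field of numericFields) {
    if (typeof result[field] !== 'string') continue;
    let text = result[field].trim();
    if (group) text = text.split(group).join(''); // drop thousands separators
    if (decimal) text = text.replace(decimal, '.'); // normalize the decimal mark
    result[field] = Number(text);
  }
  return result;
}

// Hypothetical string-only parser output for an English and a Spanish page.
const enInfobox = { name: 'Example', populationTotal: '1,234,567' };
const esInfobox = { name: 'Ejemplo', populationTotal: '5.000,05' };

const enCast = castFields(enInfobox, ['populationTotal'], 'en');
const esCast = castFields(esInfobox, ['populationTotal'], 'es');
console.log(enCast.populationTotal, esCast.populationTotal); // 1234567 5000.05
```

Keeping the parser's output as strings leaves it locale-agnostic; the caller, who knows which language edition was requested, supplies the one piece of information the parser cannot infer.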

goferito commented 7 years ago

Maybe there is a better solution, but I think this is already quite a decent improvement.

goferito commented 7 years ago

Something wrong with this?

dijs commented 7 years ago

Not at all, sorry for the delay. This is a great addition!