rufuspollock closed this issue 1 year ago
This could be a source: http://www.geonames.org/countries/
This could be another source of it: http://unstats.un.org/unsd/methods/m49/m49regin.htm
Can I take this one?
@zelima that would be great. Can you detail what you propose to do here and @pdehaye and I can review.
@rgrp @pdehaye since none of the links provide CSV or any other data format, I think I'll scrape the data directly from the HTML of this source: http://www.geonames.org/countries/ Header and first few lines of the CSV file:
country, continent
Andorra, Europe
United Arab Emirates, Asia
@zelima geonames almost always provides their data in bulk so I would be surprised if we need to scrape from HTML - e.g. is this the countryInfo.txt listed on http://download.geonames.org/export/dump/readme.txt
There are a couple of things to resolve here:
@pdehaye @lexman I'd welcome comments on point 1
The country-codes dataset already has a lot of useful information such as currency or phone code, so I think users could expect to see the continent there.
How would you do it?
I lean towards the first one, as it would be consistent with the other codes.
@rgrp Do you mean something like this: http://download.geonames.org/export/dump/countryInfo.txt ? Do you want me to work with the txt file? I cannot find any other download link that relates countries to continents. If I'm heading in the wrong direction, please help. Adding an additional column for the ISO 2-letter code is no problem.
@lexman is your question for me? Because I didn't really get it.
@lexman are there standard continent codes? The geonames dataset only has 2-letter codes e.g. "EU".
Overall I agree with @lexman on approach.
UN M.49 is a standard for 3-digit area codes used by the UN for statistical purposes. These codes refer to a wide variety of geographical, political, or economic regions including continents and countries, and the codes for countries correspond to the ISO 3166-1 numeric codes.
I'm almost done with scraping UN M.49 with Node.js. I don't know whether it will help, though.
Sorry @zelima, this was a rhetorical question. My point was: it's better to add continent codes in a new column, but we need to create a new datapackage that makes these codes explicit.
By the way, I don't think NaturalEarth's codes are a standard, and I don't know of any.
@lexman OK, that's no problem. @rgrp can you please answer my question about the download link?
@jgkim can you please open a new issue for UN M.49 - definitely good to get that in.
@zelima yes - let's use that txt file if we can. I've also opened an issue for a list of continents #174
@rgrp should we continue with this? If yes, there is an issue with the text file. I'm not sure I can work with it. It is extremely unordered.
The only thing that gives me hope is that every line may have its first five 'columns' filled in. In that case I could do something like:
if line contains one of the two-letter symbols from continent-codes:
    take the first 'word' from the line (ISO2) and the fifth 'word' (country name)
    do other stuff
This will only work if the first and fifth words are always the needed values. Also, the line must not contain any other two-letter sequence matching one of the continent codes, apart from the intended one. For example, a word like ANAGRAM contains 'NA', which matches one of the continent codes, so it would mess the data up.
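The substring 'trap' described above can be avoided by comparing whole fields rather than searching the entire line. A minimal sketch, assuming the file is tab-separated and the continent code sits in its own column; the set of codes, the sample row, and the column index are illustrative assumptions, not taken from the real file:

```python
# Assumed two-letter continent codes, for illustration only.
CONTINENT_CODES = {"AF", "AN", "AS", "EU", "NA", "OC", "SA"}


def naive_match(line):
    """Substring search: returns every continent code found anywhere in the line."""
    return sorted(code for code in CONTINENT_CODES if code in line)


def field_match(line, continent_index):
    """Whole-field comparison: only the designated column is checked."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) <= continent_index:
        return None  # row is too short to contain the column
    value = fields[continent_index]
    return value if value in CONTINENT_CODES else None


# Made-up row: a fake code, a name containing 'AN' and 'NA', and a continent code.
row = "XX\tANAGRAM\tEU"
print(naive_match(row))     # the substring search also reports codes hidden in ANAGRAM
print(field_match(row, 2))  # the field comparison looks at the third column only
```

Splitting on the delimiter first means a name such as ANAGRAM can never be mistaken for a continent code, whatever letters it contains.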
I suspect we add this to country-codes - we probably just want to do this by hand. I suggest we open an issue on the country-codes repo.
@lexman any thoughts?
Hello @zelima,
I've faced the same issue with countryInfo.txt. I think that, as geonames sees it, every line beginning with a # is a comment, and then there is a tab-separated values file with a header.
You can find my Python code to parse the file here: https://github.com/lexman/world-cities/blob/master/scripts/tuttlefile#L67
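For readers who do not want to follow the link, here is a minimal sketch of that idea. It is not the linked script. It assumes that the column names are on the last line starting with # and that all remaining lines are tab-separated; the sample text and its three columns are made up for illustration and are much shorter than the real file.

```python
import csv

# Made-up sample imitating the assumed layout: comment lines, then a
# '#'-prefixed header line, then tab-separated rows.
SAMPLE = (
    "# Free-text comment lines come first.\n"
    "#ISO\tCountry\tContinent\n"
    "AD\tAndorra\tEU\n"
    "AE\tUnited Arab Emirates\tAS\n"
)


def parse_commented_tsv(text):
    """Return a list of dicts, one per data row, keyed by the header names."""
    header = None
    data_lines = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            # Assumption: the last '#' line seen holds the column names.
            header = line.lstrip("#").strip().split("\t")
            continue
        data_lines.append(line)
    if header is None:
        raise ValueError("no '#'-prefixed header line found")
    reader = csv.reader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(header, row)) for row in reader]


rows = parse_commented_tsv(SAMPLE)
for row in rows:
    print(row["ISO"], row["Country"], row["Continent"])
```

Reading fields by header name rather than by position makes the result less sensitive to columns being added or reordered in the source file.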
@rgrp @zelima I like the idea of adding the continent code to the country-codes. We'll have a truly frictionless datapackage that would be easy to join with anything...
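As a rough illustration of that kind of join, here is a sketch that adds a continent column to country rows keyed by their ISO 2-letter code. The column names (`ISO2`, `name`, `Continent`) and the sample values are hypothetical and do not reflect the actual country-codes schema.

```python
def add_continent(rows, continent_by_iso2, key="ISO2"):
    """Return copies of the rows with a 'Continent' column looked up by ISO2 code."""
    return [
        # Unknown codes get an empty string instead of raising an error.
        dict(row, Continent=continent_by_iso2.get(row[key], ""))
        for row in rows
    ]


# Hypothetical sample data for illustration.
countries = [
    {"ISO2": "AD", "name": "Andorra"},
    {"ISO2": "AE", "name": "United Arab Emirates"},
    {"ISO2": "ZZ", "name": "Unknown territory"},
]
continents = {"AD": "EU", "AE": "AS"}

for row in add_continent(countries, continents):
    print(row)
```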
Or that :)
Yeah, # lines are easy to handle. I was talking about the actual data. I didn't mean it's impossible, I just think it's hard to be sure that the parsed data will be 100% correct, because of possible 'traps' in the txt file. So, the final word is up to you @rgrp: should I do it or not?
@zelima I don't get the question right now. As I said, we should open an issue on the country-codes repo recommending what we suggest and then open an appropriate request once we have a response from the maintainer.
We then close this issue in favour of that issue.
@zelima has @rgrp's suggestion been implemented?
@pdehaye I was a little confused about what I should do, so I took a step back. @rgrp did you mean me by "we" when you said:
"As I said we should open an issue on the country-codes repo recommending what we suggest"
@zelima yes, I meant "you" here ;-)
Should this issue be closed? https://github.com/datasets/world-cities
I've opened an issue here:
FIXED. This was fixed in https://github.com/datasets/country-codes/issues/30
May want this to be part of an existing dataset e.g. http://data.okfn.org/data/core/country-list or http://data.okfn.org/data/core/country-codes.
Would prefer not to "pollute" the simplicity of the former with this info - its aim is to be ultra-simple so maybe the latter one. Or possibly, its own really simple dataset.