gilienv / EssOilDB

Restructuring of Essential Oil Database
Apache License 2.0

Import1.0 locations #93

Open petermr opened 4 years ago

petermr commented 4 years ago

There are currently the following files with Locations: https://github.com/gilienv/EssOilDB/blob/master/tables/location/countryNew.csv. This contains resolved coordinates for most locations but no history. Where did the data come from, and when?

Other problems:

The components are not all countries, so I have renamed the file to location.csv.

ambarishK commented 4 years ago

Sir,

The Google Maps link and coordinates (latitude and longitude) for each Address were assigned manually by me. This was done in June. I fetched all the information online from Google Maps. I also split each address to get the state, city and country values, and I assigned country values to many of the available addresses.

All addresses are from the database sheet - infoplant.

ambarishK commented 4 years ago

Updated location table with unique ID.

Column details are as follows.

If the country is not mentioned in the Address value, I added it myself (using web search). City/town and State values are obtained from the Address value.

Place names are capitalized.

Encoding is not UTF-8.

petermr commented 4 years ago

Thank you. So you manually entered each of the names in the locations and read off the Google coordinates. What ambiguities did you get and how did you resolve them? Are there still ambiguities? Checking: are there 607 entries?

On Wed, Aug 7, 2019 at 9:46 AM Ambarish Kumar notifications@github.com wrote:

Updated location https://github.com/gilienv/EssOilDB/blob/master/tables/location/country070819.csv table with unique ID.

Column details are as follows.

  • ID - Unique location ID e.g ELoc0001234.
  • Address - Full address of experimental location (ExpLoc + City/town + State + Country).
  • GoogleMapLink - Link of Address value to GOOGLE MAP.
  • Latitude - Latitude coordinate of the Address value.
  • Longitude - Longitude coordinate of the Address value.
  • ExpLoc - Location of experiment.
  • City/town - City or town of Address value.
  • State - State of the Address value.
  • Country - Country of the Address value.

If the country is not mentioned in the Address value, I added it myself. City/town and State values are obtained from the Address value.
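
The column list above can be checked mechanically. The following is a minimal sketch, assuming the CSV header uses exactly the column names listed in this comment (the real file may differ); the function name `check_location_rows` and the sample rows are hypothetical and are not taken from the repository.

```python
import csv
import io

# Column names copied from the list in this comment; that the CSV header
# matches them exactly is an assumption.
EXPECTED_COLUMNS = [
    "ID", "Address", "GoogleMapLink", "Latitude", "Longitude",
    "ExpLoc", "City/town", "State", "Country",
]


def check_location_rows(csv_text):
    """Return (data_row_number, message) pairs for rows with bad coordinates."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [(0, "missing columns: " + ", ".join(missing))]
    problems = []
    for row_number, row in enumerate(reader, start=1):
        try:
            lat = float(row["Latitude"])
            lon = float(row["Longitude"])
        except (TypeError, ValueError):
            # Empty, absent, or non-numeric coordinate cells end up here.
            problems.append((row_number, "coordinates are not numeric"))
            continue
        if not -90 <= lat <= 90 or not -180 <= lon <= 180:
            problems.append((row_number, "coordinates out of range"))
    return problems


# Hypothetical sample rows, for illustration only.
sample = (
    "ID,Address,GoogleMapLink,Latitude,Longitude,ExpLoc,City/town,State,Country\n"
    "ELoc0000001,X,http://example.org,12.5,77.5,X,Y,Z,India\n"
    "ELoc0000002,X,http://example.org,120.0,77.5,X,Y,Z,India\n"
    "ELoc0000003,X,http://example.org,,77.5,X,Y,Z,India\n"
)
problems = check_location_rows(sample)
```

A check like this only catches values that cannot be coordinates at all; it does not confirm that a coordinate belongs to the named place.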

-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

ambarishK commented 4 years ago

Yes sir. Each of the names in the locations was entered manually and the Google coordinates were read off. There were no ambiguities as such while getting those values. All entries were easily fetched with a simple web search.

Some issues which I found in the present table are as follows.

    Dom Pedrode Alcantara, Brazil
    Dom Pedro de Alcantara, Southern Brazil

There are 878 entries.
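
Pairs like the two "Dom Pedro de Alcantara" rows above could be flagged automatically. This is a rough sketch, assuming that comparing only the first comma-separated part of the address, with case and spaces ignored, is a useful first pass; the helper names `place_key` and `near_duplicates` are hypothetical, and this is not the method used to build the table.

```python
from collections import defaultdict


def place_key(address):
    """Key on the first comma-separated part, ignoring case and spacing."""
    first_part = address.split(",")[0]
    return "".join(first_part.split()).lower()


def near_duplicates(addresses):
    """Group addresses whose keys collide; return only groups with 2+ members."""
    groups = defaultdict(list)
    for address in addresses:
        groups[place_key(address)].append(address)
    return [group for group in groups.values() if len(group) > 1]


candidates = near_duplicates([
    "Dom Pedrode Alcantara, Brazil",
    "Dom Pedro de Alcantara, Southern Brazil",
    "Central Alps",
])
```

The groups are only candidates for manual review: two different places can share a first part, and real spelling differences will not be caught by this key.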

petermr commented 4 years ago

Ambarish, These are LOCATIONS, NOT countries. Please drop the "country" word.

Some issues which I find in the present table are as follows.

  • Central Alps - it did not generate a search result. It is not part of a single country.

These are REGIONS. We will need a policy on this (for example, do we save polygons?), but this is NOT high priority.

ambarishK commented 4 years ago

OK sir.

ambarishK commented 4 years ago

Normalized location table.

Total records - 817.

Column details are as follows.

If the country is not mentioned in the Address value, I added it myself (using web search). City/town and State values are obtained from the Address value.

Place names are capitalized.

Encoding is UTF-8.
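
The encoding claim can be verified by trying to decode the raw bytes of the file. A minimal sketch follows; the function name `first_invalid_utf8_offset` and the sample string are hypothetical.

```python
def first_invalid_utf8_offset(data):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as error:
        return error.start
    return None


# Hypothetical sample: the same place name encoded two ways.
good = first_invalid_utf8_offset("São Paulo".encode("utf-8"))
bad = first_invalid_utf8_offset("São Paulo".encode("latin-1"))
```

For a real file, the bytes would come from `open(path, "rb").read()`, where `path` points at a local copy of the table.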

petermr commented 4 years ago

Thank you. Two MAJOR problems.

A) These are LOCATIONs, NOT Countries. Your file is badly named. I have made this clear before.

B) You have used an ID system of the form EBibddddd. This is reserved for BIBLIOGRAPHY. The name clashes will destroy the database. Use ELocddddd.

Peter
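
A prefix check of this kind could catch such clashes before a table is committed. This is a sketch under one assumption: a location ID is "ELoc" followed only by digits. The thread shows both "ELocddddd" and "ELoc0001234", so the digit count is deliberately left open. The function name `wrong_prefix_ids` is hypothetical.

```python
import re

# Assumption: "ELoc" followed by one or more digits; the exact digit count
# is not settled in this thread, so it is not enforced here.
LOCATION_ID = re.compile(r"ELoc\d+")


def wrong_prefix_ids(ids):
    """Return the IDs that do not follow the location ID pattern."""
    return [i for i in ids if not LOCATION_ID.fullmatch(i)]


# Hypothetical IDs, for illustration only.
rejected = wrong_prefix_ids(["ELoc0001234", "EBib00012", "ELoc"])
```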

On Tue, Aug 13, 2019 at 1:05 PM Ambarish Kumar notifications@github.com wrote:

Normalized location https://github.com/gilienv/EssOilDB/blob/master/tables/location/countryNewNormalized130819.csv table.

Total records - 817.

Column details are as follows.

  • ID - Unique location ID e.g ELoc0001234. <<<<< ***
  • Address - Full address of experimental location (ExpLoc + City/town + State + Country).
  • GoogleMapLink - Link of Address value to GOOGLE MAP.
  • Latitude - Latitude coordinate of the Address value.
  • Longitude - Longitude coordinate of the Address value.
  • ExpLoc - Location of experiment.
  • City/town - City or town of Address value.
  • State - State of the Address value.
  • Region - Region to which the Address value belongs (in most cases the country of the Address value).

ambarishK commented 4 years ago

Sir, I have resolved all problems related to file naming and ID assignment.