dr5hn / countries-states-cities-database

🌍 Discover our global repository of countries, states, and cities! 🏙️ Get comprehensive data in JSON, SQL, PSQL, XML, YAML, and CSV formats. Access ISO2, ISO3 codes, country code, capital, native language, timezones (for countries), and more. #countries #states #cities
https://dr5hn.github.io/countries-states-cities-database/
Open Data Commons Open Database License v1.0
7.29k stars 2.54k forks

MongoDB import #542

Open cwdabba opened 1 year ago

cwdabba commented 1 year ago

Hi, thank you very much for the library, it's really useful. I tried importing the countries, states & cities tables into MongoDB, but the issue is that I now have both an _id and an id column.

Will this cause any issues? Or should I rename the id column to _id before importing?

Any suggestions would really help.

dr5hn commented 1 year ago

I would suggest keeping the id column as is.
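One way to see why both fields can coexist safely: MongoDB treats id as an ordinary field and _id as its own primary key, so keeping id as-is doesn't conflict with anything. A minimal Python sketch of the resulting document shape (the _id value below is a made-up placeholder for the ObjectId MongoDB generates):

```python
# Illustrative sketch, not the repo's code: a document imported as-is
# keeps its original "id" field, and MongoDB adds its own "_id" primary
# key alongside it. The two do not conflict.
imported = {"id": 101, "name": "India", "iso2": "IN"}

# After insertion, the stored document would look like this
# ("_id" value is a hypothetical placeholder):
stored = {"_id": "65f0c0ffee", **imported}

# Lookups by the original id keep working as plain field queries:
assert stored["id"] == 101 and "_id" in stored
```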

aprilmintacpineda commented 10 months ago

For those looking to import this into their MongoDB, I wrote a script that can do that (using Prisma):

import { Country, PrismaClient, State } from '@prisma/client';
import axios from 'axios';

/**
 * We want to group items so that we can limit the number of
 * asynchronous requests we send to the database
 */
function group<T = any>(items: T[], max: number) {
  const grouped: T[][] = [];

  // splice() mutates `items`, draining it into chunks of at most `max` elements
  while (items.length > 0) {
    grouped.push(items.splice(0, max));
  }

  return grouped;
}

async function populate() {
  const client = new PrismaClient();
  await client.$connect();
  const maxItems = 20;

  async function populateCountries(_countries: any[]) {
    const countriesGroups = group(_countries, maxItems);

    for (const countries of countriesGroups) {
      await Promise.all(
        countries.map(async _country => {
          const numericCode = String(_country.numeric_code);

          const country = await client.country.upsert({
            where: { numericCode },
            create: {
              name: _country.name,
              iso2: String(_country.iso2),
              iso3: String(_country.iso3),
              numericCode,
              phoneCode: String(_country.phone_code)
            },
            update: {
              name: _country.name,
              phoneCode: String(_country.phone_code)
            }
          });

          await populateStates(_country.states, country);
        })
      );
    }

    console.log('Upserted countries');
  }

  async function populateStates(_states: any[], country: Country) {
    const statesGroups = group(_states, maxItems);

    for (const states of statesGroups) {
      await Promise.all(
        states.map(async (_state: any) => {
          const ref = String(_state.id);

          const state = await client.state.upsert({
            where: { ref },
            create: {
              name: _state.name,
              ref,
              country: {
                connect: {
                  id: country.id
                }
              }
            },
            update: {
              name: _state.name
            }
          });

          await populateCities(_state.cities, state);
        })
      );
    }

    console.log('Upserted states for', country.name);
  }

  async function populateCities(cities: any[], state: State) {
    await client.$transaction(
      cities.map(city => {
        const ref = String(city.id);

        return client.city.upsert({
          where: { ref },
          create: {
            name: city.name,
            ref,
            state: {
              connect: {
                id: state.id
              }
            }
          },
          update: {
            name: city.name
          }
        });
      })
    );

    console.log('Upserted cities for', state.name);
  }

  try {
    const { data } = await axios.get<any>(
      'https://raw.githubusercontent.com/dr5hn/countries-states-cities-database/master/countries%2Bstates%2Bcities.json'
    );

    const start = Date.now();
    await populateCountries(data);
    console.log('took', (Date.now() - start) / 1000);
  } catch (error) {
    console.log(error);
  } finally {
    // Disconnect on success and failure so the Node process can exit cleanly.
    await client.$disconnect();
  }
}

populate();

This also handles updates. I added a throttle so that the script's concurrent async requests don't overload your database. Running it on my macOS machine against a local MongoDB Docker container, I found maxItems = 20 works well: the full import takes about 300 seconds. There are 150,573 cities, 5,084 states, and 250 countries.
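The batching idea behind the throttle can be sketched in Python (a hypothetical standalone equivalent of the script's group() helper, names are mine):

```python
def group(items, max_items):
    """Split items into chunks of at most max_items, so each batch of
    concurrent upserts stays below the chosen throttle limit."""
    return [items[i:i + max_items] for i in range(0, len(items), max_items)]

# 45 items with max_items=20 -> three batches of 20, 20, and 5
batches = group(list(range(45)), 20)
assert [len(b) for b in batches] == [20, 20, 5]
```

Unlike the TypeScript version, this slices instead of splicing, so the input list is left unmodified.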

dr5hn commented 7 months ago

@dosubot write a script which will help convert the current db to mongodb

dosubot[bot] commented 7 months ago

Hey @dr5hn! Great to see you diving into the depths of data again 🚀😄

In MongoDB, the _id field is a special field that acts as the primary key for the document. By default, MongoDB creates this field and assigns a unique ObjectId for each document. However, you can also provide your own _id when you insert the document.

If you have an id field in your data and you want to use that as the _id in MongoDB, you can rename the id field to _id before importing. This way, MongoDB will use your id as the _id and won't create a new one.

Here's a Python script that reads the JSON file, renames the id field to _id, and writes the modified data back to a new JSON file:

import json

with open('countries.json', 'r') as f:
    data = json.load(f)
    for doc in data:
        doc['_id'] = doc.pop('id')

with open('countries_mongodb.json', 'w') as f:
    json.dump(data, f)

You can then import the new JSON file into MongoDB using the mongoimport command:

mongoimport --db yourDbName --collection yourCollectionName --file countries_mongodb.json

Please replace 'countries.json' and 'countries_mongodb.json' with the actual paths to your files, and replace 'yourDbName' and 'yourCollectionName' with the actual names of your MongoDB database and collection.

Remember to do this for all the tables (countries, states, cities) you are importing.

This approach will not create any issues as long as the id values in your data are unique. If there are duplicate id values, MongoDB will throw an error when you try to import the data, because _id values must be unique within the collection.
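If you want to verify uniqueness before running mongoimport, a small helper (a sketch of mine, not part of the repo) can flag id values that would collide once they become _id:

```python
from collections import Counter

def find_duplicate_ids(docs):
    """Return id values appearing more than once; any such value would
    make mongoimport reject the file after id is renamed to _id."""
    counts = Counter(doc["id"] for doc in docs)
    return sorted(value for value, n in counts.items() if n > 1)

# Example: id 1 appears twice, so it gets flagged.
assert find_duplicate_ids([{"id": 1}, {"id": 2}, {"id": 1}]) == [1]
```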

I hope this helps! If you have any other questions, feel free to ask.

FractalMind commented 3 months ago

@dr5hn That script saved my life. By the way, don't forget --jsonArray at the end, e.g.:

mongoimport --db yourDbName --collection yourCollectionName --file countries_mongodb.json --jsonArray