cwdabba opened 1 year ago
I would suggest keeping the `id` column as-is.
For those looking to import this into their MongoDB, I wrote a script that can do that (using Prisma):
```ts
import { Country, PrismaClient, State } from '@prisma/client';
import axios from 'axios';

/**
 * Group items so that we can limit the number of
 * asynchronous requests we send to the database.
 */
function group<T = any>(items: T[], max: number) {
  const grouped: T[][] = [];
  do {
    if (items.length > max) grouped.push(items.splice(0, max));
    else grouped.push(items.splice(0));
  } while (items.length > 0);
  return grouped;
}

async function populate() {
  const client = new PrismaClient();
  await client.$connect();
  const maxItems = 20;

  async function populateCountries(_countries: any[]) {
    const countriesGroups = group(_countries, maxItems);
    for (const countries of countriesGroups) {
      await Promise.all(
        countries.map(async _country => {
          const numericCode = String(_country.numeric_code);
          const country = await client.country.upsert({
            where: { numericCode },
            create: {
              name: _country.name,
              iso2: String(_country.iso2),
              iso3: String(_country.iso3),
              numericCode,
              phoneCode: String(_country.phone_code)
            },
            update: {
              name: _country.name,
              phoneCode: String(_country.phone_code)
            }
          });
          await populateStates(_country.states, country);
        })
      );
    }
    console.log('Upserted countries');
  }

  async function populateStates(_states: any[], country: Country) {
    const statesGroups = group(_states, maxItems);
    for (const states of statesGroups) {
      await Promise.all(
        states.map(async (_state: any) => {
          const ref = String(_state.id);
          const state = await client.state.upsert({
            where: { ref },
            create: {
              name: _state.name,
              ref,
              country: {
                connect: {
                  id: country.id
                }
              }
            },
            update: {
              name: _state.name
            }
          });
          await populateCities(_state.cities, state);
        })
      );
    }
    console.log('Upserted states for', country.name);
  }

  async function populateCities(cities: any[], state: State) {
    await client.$transaction(
      cities.map(city => {
        const ref = String(city.id);
        return client.city.upsert({
          where: { ref },
          create: {
            name: city.name,
            ref,
            state: {
              connect: {
                id: state.id
              }
            }
          },
          update: {
            name: city.name
          }
        });
      })
    );
    console.log('Upserted cities for', state.name);
  }

  try {
    const { data } = await axios.get<any>(
      'https://raw.githubusercontent.com/dr5hn/countries-states-cities-database/master/countries%2Bstates%2Bcities.json'
    );
    const start = Date.now();
    await populateCountries(data);
    console.log('took', (Date.now() - start) / 1000);
  } catch (error) {
    console.error(error);
  } finally {
    // Disconnect on both success and failure (the original only
    // disconnected in the error path).
    await client.$disconnect();
  }
}

populate();
```
This will also handle updates. I added a throttle so that the script's concurrent async requests don't overload your database. Running it on my macOS machine against a local MongoDB Docker container, I found that `maxItems = 20` works well: the full import takes ~300 seconds. There are 150,573 cities, 5,084 states, and 250 countries.
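The batching idea above is language-agnostic. As a rough sketch, a Python equivalent of the `group` helper (non-mutating, unlike the TypeScript version) could look like:

```python
def group(items, max_size):
    """Split a list into consecutive batches of at most max_size items.

    Mirrors the TypeScript `group` helper above, but returns new lists
    instead of mutating the input.
    """
    return [items[i:i + max_size] for i in range(0, len(items), max_size)]

print(group(list(range(7)), 3))  # → [[0, 1, 2], [3, 4, 5], [6]]
```

Each batch can then be awaited (or joined) before starting the next one, which is what caps the number of in-flight database requests.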
@dosubot write a script which will help convert the current db to mongodb
Hey @dr5hn! Great to see you diving into the depths of data again 🚀😄
In MongoDB, the `_id` field is a special field that acts as the primary key for the document. By default, MongoDB creates this field and assigns a unique ObjectId to each document. However, you can also provide your own `_id` when you insert a document.

If you have an `id` field in your data and you want to use it as the `_id` in MongoDB, you can rename the `id` field to `_id` before importing. That way, MongoDB will use your `id` as the `_id` and won't create a new one.

Here's a Python script that reads the JSON file, renames the `id` field to `_id`, and writes the modified data back to a new JSON file:
```python
import json

with open('countries.json', 'r') as f:
    data = json.load(f)

for doc in data:
    doc['_id'] = doc.pop('id')

with open('countries_mongodb.json', 'w') as f:
    json.dump(data, f)
```
You can then import the new JSON file into MongoDB using the `mongoimport` command:

```shell
mongoimport --db yourDbName --collection yourCollectionName --file countries_mongodb.json
```
Please replace 'countries.json' and 'countries_mongodb.json' with the actual paths to your files, and replace 'yourDbName' and 'yourCollectionName' with the actual names of your MongoDB database and collection.
Remember to do this for all the tables (countries, states, cities) you are importing.
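As a sketch, the same rename can be wrapped in a helper and applied to all three exports in one pass (the file names are assumptions based on the repository's JSON dumps):

```python
import json

def convert_file(src, dst):
    """Rename the `id` key to `_id` in every document of a JSON array file."""
    with open(src, 'r') as f:
        docs = json.load(f)
    for doc in docs:
        doc['_id'] = doc.pop('id')
    with open(dst, 'w') as f:
        json.dump(docs, f)

# Convert all three exports (file names are assumptions; adjust to your
# actual paths). Uncomment once the files are present:
# for name in ('countries', 'states', 'cities'):
#     convert_file(f'{name}.json', f'{name}_mongodb.json')
```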
This approach will not create any issues as long as the `id` values in your data are unique. If there are duplicate `id` values, MongoDB will throw an error when you try to import the data, because `_id` values must be unique within a collection.
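If you want to verify uniqueness before importing, a quick check over the decoded JSON (pure Python, no MongoDB required) could look like:

```python
from collections import Counter

def find_duplicate_ids(docs):
    """Return the ids that appear more than once in a list of documents."""
    counts = Counter(doc['id'] for doc in docs)
    return [i for i, n in counts.items() if n > 1]

# Example with in-memory data:
docs = [{'id': 1}, {'id': 2}, {'id': 1}]
print(find_duplicate_ids(docs))  # → [1]
```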
I hope this helps! If you have any other questions, feel free to ask.
@dr5hn That script saved my life. Btw, don't forget `--jsonArray` at the end. Ex:

```shell
mongoimport --db yourDbName --collection yourCollectionName --file countries_mongodb.json --jsonArray
```
Hi, thank you very much for the library, it's really useful. I tried importing the countries, states & cities tables into MongoDB, but the issue is that I now have both an `_id` and an `id` column.
Will this cause any problems? Or should I rename the `id` column to `_id` before importing?
Any suggestions would really help me a lot.