bluzi / name-db

:rocket: A multilingual collection of names from around the world
MIT License
59 stars 230 forks

Translation crawler #95

Open EmilLuta opened 7 years ago

EmilLuta commented 7 years ago

@bluzi Hi there. I strongly believe that, from time to time, a crawler could be run to fetch translations from some official providers (for now, let's stick to Wikipedia).

The goal, therefore, would be to write a scraper that goes through all entries and tries to fetch meanings, translations, and aliases from the given sources.

Let me know what you think. I could give you a hand with Python, if that's alright.
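As a rough illustration of what such a scraper could look like, here is a minimal sketch that asks the MediaWiki Action API for the inter-language links (`prop=langlinks`) of the Wikipedia page titled after a name. The function names (`parse_langlinks`, `fetch_translations`, `crawl`) and the `User-Agent` string are hypothetical, and the sketch assumes the page title matches the name exactly, which will often hit a disambiguation page or an unrelated article. It says nothing about how name-db stores its entries.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"  # MediaWiki Action API endpoint


def parse_langlinks(response):
    """Extract {language code: page title} from a `prop=langlinks` response."""
    pages = response.get("query", {}).get("pages", [])
    # formatversion=1 returns a dict keyed by page id, formatversion=2 a list.
    if isinstance(pages, dict):
        pages = list(pages.values())
    translations = {}
    for page in pages:
        for link in page.get("langlinks", []):
            # The title sits under "title" (formatversion=2) or "*" (formatversion=1).
            title = link.get("title", link.get("*"))
            if title:
                translations[link["lang"]] = title
    return translations


def fetch_translations(name):
    """Query Wikipedia for the inter-language links of the page titled `name`."""
    params = urllib.parse.urlencode({
        "action": "query",
        "prop": "langlinks",
        "titles": name,
        "lllimit": "max",
        "format": "json",
        "formatversion": "2",
    })
    request = urllib.request.Request(
        f"{API_URL}?{params}",
        headers={"User-Agent": "name-db-crawler-sketch/0.1"},  # hypothetical agent string
    )
    with urllib.request.urlopen(request, timeout=10) as reply:
        return parse_langlinks(json.load(reply))


def crawl(names):
    """Hypothetical driver: map each name to whatever translations were found."""
    return {name: fetch_translations(name) for name in names}
```

Keeping the parsing separate from the network call means the extraction logic can be checked against saved responses without hitting Wikipedia, which would also help when judging whether the results are accurate enough to merge.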

bluzi commented 7 years ago

Hey @EmilLuta,

Our current method of data collection is community-based. I find this method very accurate; it's the same method Wikipedia uses, for instance.

However, if you think you can create a crawler that will be accurate enough, I'd love to add it to this project.

EmilLuta commented 7 years ago

@bluzi I'll give it a go. The scope would be to enhance the current name entries, not to add new ones. I can see your point of view. I'm looking forward to seeing how we'll validate 'accurate enough' once the crawler is done. I'll keep you posted.

bluzi commented 7 years ago

Good to see we're on the same page here. Can't wait to see the outcome of this.

EmilLuta commented 7 years ago

@bluzi Some work has been done. Wikipedia doesn't seem to be a very reliable source (only a couple of names have translations), and even though this works, it's far from complete. My suggestion would be to create a new branch on your repo and integrate this for now, just to keep it as a reference. From this point on, I'm going to look at https://www.behindthename.com/ and come back with a PoC ASAP.

Let me know what you think.

bluzi commented 7 years ago

@EmilLuta I added you as a collaborator, so feel free to create a branch and push your code.