EmilLuta opened 7 years ago
Hey @EmilLuta,
Our current method of data collection is community-based. I find this method very accurate; it's the same method Wikipedia uses, for instance.
However, if you think you can create a crawler that will be accurate enough, I'd love to add it to this project.
@bluzi I'll give it a go. The scope would be to enhance the existing name entries, not to add new ones. I can see your point of view. I'm looking forward to seeing how we'll validate "accurate enough" once the crawler is done. I'll keep you posted.
Good to see we're on the same page here. Can't wait to see the outcome of this.
@bluzi Some work has been done. Wikipedia doesn't seem to be a very reliable source (only a couple of names have translations), so even though this works, it's far from complete. My suggestion would be to create a new branch on your repo and integrate this for now, just to keep it as a reference. From here on, I'm going to tackle https://www.behindthename.com/ and come up with a PoC ASAP.
Let me know what you think.
@EmilLuta I added you as a collaborator, so feel free to create a branch and push your code.
@bluzi Hi there. I strongly believe that a crawler could be run from time to time to fetch translations from some official providers (for now, let's stick to Wikipedia).
Therefore, the goal would be to write a scraper that goes through all entries and tries to fetch meanings, translations, and aliases from the given sources.
Let me know what you think. I could lend a hand with Python, if that's alright.
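For illustration only, here's a minimal Python sketch of the periodic enrichment pass described above. It assumes entries are dicts with hypothetical `name` and `translations` keys (the project's real data format isn't shown in this thread) and uses the MediaWiki `prop=langlinks` query as the Wikipedia source. The names `parse_langlinks`, `fetch_wikipedia_translations`, `enrich_entries` and the User-Agent string are all hypothetical. Existing community-provided translations are never overwritten, which keeps the scope to enhancing current entries. Linked page titles aren't guaranteed to be clean name translations; the sketch only strips a trailing parenthetical qualifier and ignores API result continuation.

```python
import json
import re
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, List

API_URL = "https://en.wikipedia.org/w/api.php"
# Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
USER_AGENT = "names-crawler-sketch/0.1 (hypothetical contact)"


def parse_langlinks(payload: Dict[str, Any]) -> Dict[str, str]:
    """Map language code -> linked page title from a prop=langlinks response (default JSON format)."""
    links: Dict[str, str] = {}
    for page in payload.get("query", {}).get("pages", {}).values():
        # Missing pages come back without a "langlinks" key.
        for link in page.get("langlinks", []):
            # Drop a trailing qualifier, e.g. "Jean (prénom)" -> "Jean".
            links[link["lang"]] = re.sub(r"\s*\([^)]*\)$", "", link["*"])
    return links


def fetch_wikipedia_translations(title: str) -> Dict[str, str]:
    """Ask English Wikipedia for the interlanguage links of one page (requires network).

    Result continuation is ignored; lllimit=max covers typical name pages.
    """
    query = urllib.parse.urlencode({
        "action": "query",
        "titles": title,
        "prop": "langlinks",
        "lllimit": "max",
        "format": "json",
    })
    request = urllib.request.Request(f"{API_URL}?{query}", headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_langlinks(json.load(response))


def enrich_entries(
    entries: List[Dict[str, Any]],
    fetch: Callable[[str], Dict[str, str]],
) -> List[Dict[str, Any]]:
    """Add translations found by a source, without overwriting community-provided ones."""
    for entry in entries:
        translations = entry.setdefault("translations", {})
        for lang, value in fetch(entry["name"]).items():
            translations.setdefault(lang, value)  # community data wins
    return entries
```

A live run would look like `enrich_entries(entries, fetch_wikipedia_translations)`. Because the source is just a function argument, trying another provider later would only mean passing a different fetch function.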