HaveF / etymonline_scraper

A simple crawler that pulls word/origin pairs from etymonline.com.

Online Etymology Dictionary Scraper

It works! I plan to build a hidden Markov model (HMM) that can recognize a word's language of origin from its orthographic (written) form.

Once Scrapy is installed, run the following command from the project directory. It writes a large JSON file of word/origin pairs to etymonline_data.json.

scrapy crawl etymonline.com -o etymonline_data.json -t json -s LOG_FILE=etymonline_data.log -L WARNING

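The -L option sets the log level (WARNING here), and -s LOG_FILE=etymonline_data.log sends the log output to that file instead of the console.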
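To work with the output in Python, the sketch below loads the feed file into a list of (word, origin) tuples using only the standard library, since Scrapy's JSON exporter writes a single JSON array of items. It assumes each scraped item is an object with "word" and "origin" keys; those field names are hypothetical, so check them against the spider's item definition before relying on this.

```python
import json
from pathlib import Path

# Assumption: each scraped item has "word" and "origin" keys.
# These field names are hypothetical; match them to the spider's items.
WORD_KEY = "word"
ORIGIN_KEY = "origin"


def load_pairs(path):
    """Return a list of (word, origin) tuples from a Scrapy JSON feed file."""
    # The JSON feed is one array of item objects, so json.load reads it whole.
    with Path(path).open(encoding="utf-8") as f:
        items = json.load(f)

    pairs = []
    for item in items:
        # Skip items missing either field instead of raising KeyError.
        if WORD_KEY in item and ORIGIN_KEY in item:
            pairs.append((item[WORD_KEY], item[ORIGIN_KEY]))
    return pairs
```

For example, load_pairs("etymonline_data.json")[:5] gives the first five pairs. Note that with -o, Scrapy appends to an existing output file, and two JSON arrays back to back are no longer valid JSON, so delete etymonline_data.json before re-running the crawl or json.load will fail.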