PetrochukM / PyTorch-NLP

Basic Utilities for PyTorch Natural Language Processing (NLP)
https://pytorchnlp.readthedocs.io
BSD 3-Clause "New" or "Revised" License
2.21k stars 258 forks

Can't get pre-trained FastText embedding #39

Closed jfilter closed 6 years ago

jfilter commented 6 years ago

Expected Behavior

I want to download a pre-trained FastText embedding in a reasonable amount of time (and save it to a cache folder).

Actual Behavior

The download is extremely slow (ETA: 1000 hours) and eventually fails with:

ConnectionResetError: [Errno 104] Connection reset by peer

I tested this on both macOS and Ubuntu 16.04.

Steps to Reproduce the Problem

Run

from torchnlp.word_to_vector import FastText

vectors = FastText(cache='cache')
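If a download like the one above fails partway, a partial file may be left behind in the cache directory ('cache' in the snippet). Clearing it before retrying avoids reusing a corrupt cache. A minimal cleanup sketch; the flat file layout is an assumption, not confirmed torchnlp behavior:

```python
import os

def clear_cache(cache_dir='cache'):
    """Delete files left in the embedding cache so a fresh download
    can start cleanly (the cache layout here is an assumption)."""
    if not os.path.isdir(cache_dir):
        return []
    removed = []
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        if os.path.isfile(path):
            os.remove(path)
            removed.append(name)
    return removed
```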

PetrochukM commented 6 years ago

Just tested the code, it works fine.

[screenshot]

The default FastText embeddings are 6.6 gigabytes, so they need to be downloaded first; for me, that took about 40 minutes.

Learn more about "Connection reset by peer": https://www.quora.com/What-does-the-104-Connection-reset-by-peer-error-notification-mean-and-how-do-I-fix-it

That error means your internet connection dropped during the download; unfortunately, I don't think there is much I can do about that on my end. Let me know if I'm missing something.
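Since the failure is a transient network error, one generic way to cope is to retry the operation a few times. A hedged sketch of a retry wrapper, not part of torchnlp:

```python
import time

def retry(fn, attempts=3, delay=1.0):
    """Call fn(), retrying on ConnectionResetError up to `attempts` times,
    sleeping `delay` seconds between tries. Re-raises on final failure."""
    for i in range(attempts):
        try:
            return fn()
        except ConnectionResetError:
            if i == attempts - 1:
                raise
            time.sleep(delay)
```

For example, the constructor could be wrapped as `retry(lambda: FastText(cache='cache'))`; this only helps if the connection drops are occasional rather than constant.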

jfilter commented 6 years ago

Oh sorry, it looks like my ISP throttles connections to AWS (wtf?). I had to tunnel the download through my server.
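Tunneling through another server can be reproduced generically: Python's urllib (and requests) honor the standard proxy environment variables, so pointing them at a tunnel reroutes HTTPS traffic, assuming torchnlp's downloader goes through one of those libraries. A sketch with an illustrative proxy address:

```python
import os
import urllib.request

# Point HTTPS traffic at a proxy/tunnel (this address is illustrative;
# substitute an SSH tunnel or proxy you actually control).
os.environ['https_proxy'] = 'http://localhost:8080'

# urllib picks the proxy up from the environment.
print(urllib.request.getproxies().get('https'))

# A subsequent FastText(cache='cache') call would then download through
# the tunnel, assuming its downloader honors these variables.
```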