scispacy requires several supporting files, such as `tfidf_vectors_sparse.npz` and `nmslib_index.bin`.
Some of them are large (up to 500MB), so on an unstable internet connection it is hard to estimate how long a download will take.
In my case, the bandwidth was fine at first, but later dropped to about 30KB/s.
So I implemented a download progress bar to monitor it, as seen in the image.
If you agree that a progress bar is needed, I would be glad to open a PR.
If you have other features in mind, I would also be glad to improve the current implementation.
My preliminary implementation
```python
from typing import IO

import requests
from tqdm import tqdm


def http_get(url: str, temp_file: IO) -> None:
    req = requests.get(url, stream=True)
    # Content-Length may be missing; fall back to 0 in that case
    total = int(req.headers.get('content-length', 0))
    # Report progress in binary units (KiB, MiB, ...)
    pbar = tqdm(total=total, unit='iB', unit_scale=True, unit_divisor=1024)
    for chunk in req.iter_content(chunk_size=1024):
        if chunk:  # filter out keep-alive new chunks
            size = temp_file.write(chunk)
            pbar.update(size)
    pbar.close()
```
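For anyone who wants to try the progress loop without a network connection, here is a rough sketch that separates the write loop from the HTTP request. The helper name `write_with_progress` and the in-memory payload are only illustrative assumptions, not part of scispacy. Passing `total=None` lets tqdm show a running byte count without an ETA, which might be a reasonable fallback when the server sends no `Content-Length` header.

```python
import io
from typing import IO, Iterable, Optional

from tqdm import tqdm


def write_with_progress(
    chunks: Iterable[bytes], temp_file: IO, total: Optional[int] = None
) -> int:
    """Hypothetical helper: write chunks to temp_file and return bytes written."""
    written = 0
    # total=None: tqdm shows a running count only (no ETA), e.g. when
    # the Content-Length header is not available.
    with tqdm(total=total, unit="iB", unit_scale=True, unit_divisor=1024) as pbar:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                size = temp_file.write(chunk)
                pbar.update(size)
                written += size
    return written


# In-memory data standing in for req.iter_content(chunk_size=1024)
payload = b"x" * 5000
fake_chunks = [payload[i:i + 1024] for i in range(0, len(payload), 1024)]
buffer = io.BytesIO()
n_written = write_with_progress(fake_chunks, buffer, total=len(payload))
```

With this split, `http_get` would only need to pass `req.iter_content(chunk_size=1024)` and the parsed header value to the helper.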
![image](https://github.com/allenai/scispacy/assets/43513739/bb50e4a0-966f-40f5-a837-3326639eeed6)