allenai / scispacy

A full spaCy pipeline and models for scientific/biomedical documents.
https://allenai.github.io/scispacy/
Apache License 2.0
1.66k stars 223 forks

Add progress bar for installation downloading #490

Closed WeixiongLin closed 9 months ago

WeixiongLin commented 11 months ago

Using scispacy requires some supporting files, such as tfidf_vectors_sparse.npz and nmslib_index.bin. Some of them are large, up to 500MB, so it is hard to estimate the download ETA on an unstable internet connection.

In my case, the bandwidth was fine at first, but later dropped to about 30KB/s.
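To put those numbers in perspective, here is a rough back-of-the-envelope sketch of the ETA. It assumes the figures above are binary units (500 MiB and 30 KiB/s) and that the rate stays constant; the helper names eta_seconds and format_eta are made up for illustration and are not part of scispacy.

```python
def eta_seconds(remaining_bytes: int, bytes_per_second: float) -> float:
    """Estimate remaining download time as remaining size divided by rate."""
    if bytes_per_second <= 0:
        raise ValueError("rate must be positive")
    return remaining_bytes / bytes_per_second


def format_eta(seconds: float) -> str:
    """Render seconds as H:MM:SS, rounding down to whole seconds."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Assumed numbers: a 500 MiB file with nothing downloaded yet, at 30 KiB/s.
size = 500 * 1024 * 1024
rate = 30 * 1024
print(format_eta(eta_seconds(size, rate)))  # prints 4:44:26
```

So at the reduced speed a single large file could take several hours, which is hard to notice without some kind of progress output.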

So I implemented a download progress bar to monitor it, as seen in the image. [image]

If you agree that a progress bar is needed, I would be glad to raise a PR. If you have other features in mind, I would also be glad to improve the current implementation.

My preliminary implementation

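from typing import IO

import requests
from tqdm import tqdm


def http_get(url: str, temp_file: IO) -> None:
    # Stream the response so large files are not loaded into memory at once.
    req = requests.get(url, stream=True)
    # content-length may be missing; with total=None tqdm shows no ETA.
    total = int(req.headers.get('content-length', 0)) or None
    # The context manager closes the bar even if the download fails midway.
    with tqdm(total=total, unit='iB', unit_scale=True, unit_divisor=1024) as pbar:
        for chunk in req.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                size = temp_file.write(chunk)
                pbar.update(size)

dakinggg commented 10 months ago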

Hey @WeixiongLin that sounds good to me!

WeixiongLin commented 10 months ago

Thanks for your review! I will raise a PR.

WeixiongLin commented 9 months ago

I have raised a PR; I hope you find it helpful.