Breakthrough-Energy / PowerSimData

Simulation framework
https://breakthrough-energy.github.io/docs/
MIT License

Create a Zenodo download manager #697

Closed rouille closed 2 years ago

rouille commented 2 years ago

Purpose

Build the Zenodo class around the Zenodo REST API to efficiently handle data coming from different records. Partially addresses #687.
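As a rough illustration of what wrapping the Zenodo REST API involves, the sketch below fetches a record's JSON and formats the summary lines shown in the usage example. It is not the code in this PR: the helper names are hypothetical, and the endpoint URL and the field names (`metadata.title`, `metadata.publication_date`, `metadata.version`, `doi`) are assumptions about the public records API response.

```python
import json
import urllib.request

# Assumed base URL of the public Zenodo records endpoint.
ZENODO_API = "https://zenodo.org/api/records"


def fetch_record(record_id, timeout=30):
    """Return the JSON description of a Zenodo record as a dict (hypothetical helper)."""
    url = f"{ZENODO_API}/{record_id}"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize_record(record):
    """Format the summary lines for a record dict (field names are assumed)."""
    meta = record.get("metadata", {})
    return "\n".join(
        [
            f"Title: {meta.get('title')}",
            f"Publication date: {meta.get('publication_date')}",
            f"Version: {meta.get('version')}",
            f"DOI: {record.get('doi')}",
        ]
    )
```

Keeping the formatting separate from the network call means the summary can be checked against a hand-written dict without contacting Zenodo.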

What the code is doing

Testing

Manual testing.

Where to look

Usage Example/Visuals

>>> from powersimdata.network.zenodo import Zenodo
>>> z = Zenodo("3601881")
Title: PyPSA-Eur: An Open Optimisation Model of the European Transmission System (Dataset)
Publication date: 2022-09-20
Version: v0.6.1
DOI: 10.5281/zenodo.7251657
>>> z.load_data("powersimdata/network/europe_tub")
100% [....................................................................] 1784815481 / 1784815481
networks.zip (1702.1 MB)
>>> z.load_data("powersimdata/network/europe_tub")
networks.zip has been downloaded previously
>>> z = Zenodo("7251657")
Title: PyPSA-Eur: An Open Optimisation Model of the European Transmission System (Dataset)
Publication date: 2022-09-20
Version: v0.6.1
DOI: 10.5281/zenodo.7251657
>>> z.load_data("powersimdata/network/europe_tub")
networks.zip has been downloaded previously
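The "has been downloaded previously" message above suggests a check that skips files already on disk. A minimal sketch of such a check follows, assuming the expected size in bytes is known from the record's file listing. The function name and the size comparison are illustrative assumptions, not the logic used in this PR.

```python
import os


def already_downloaded(path, expected_size=None):
    """Return True if ``path`` exists and, when a size is given, matches it.

    Hypothetical helper: comparing sizes catches most interrupted downloads,
    but it does not verify the file contents.
    """
    if not os.path.isfile(path):
        return False
    if expected_size is None:
        return True
    return os.path.getsize(path) == expected_size
```

A stricter variant could compare a checksum instead of the size, at the cost of reading the whole file on every call.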

With this in place, we can now do the following:

>>> tub = TUB("Europe", zenodo_record_id="latest", reduction=128)
Title: PyPSA-Eur: An Open Optimisation Model of the European Transmission System (Dataset)
Publication date: 2022-09-20
Version: v0.6.1
DOI: 10.5281/zenodo.7251657
networks.zip has been downloaded previously
>>> tub.build()
INFO:pypsa.io:Imported network elec_s_128_ec.nc has buses, carriers, generators, lines, links, loads, storage_units, stores
>>> tub = TUB("Europe", reduction=128)
Title: PyPSA-Eur: An Open Optimisation Model of the European Transmission System (Dataset)
Publication date: 2022-09-20
Version: v0.6.1
DOI: 10.5281/zenodo.7251657
networks.zip has been downloaded previously
>>> tub.build()
INFO:pypsa.io:Imported network elec_s_128_ec.nc has buses, carriers, generators, lines, links, loads, storage_units, stores

Time estimate

30min

jenhagg commented 2 years ago

We might want to update Pipfile.lock. Also, it looks like wget is only installed as a dependency of zenodo_get, so we should add it explicitly (otherwise it will be removed when the lock file is regenerated). I also noticed the wget package hasn't been updated since 2015, but if it works, that's probably fine.

rouille commented 2 years ago

Would you recommend using urllib instead?

BainanXia commented 2 years ago

I was about to suggest the same thing, since urllib is in the standard library.
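For reference, a download using only the standard library could look like the sketch below. The function name and chunk size are illustrative assumptions, and unlike the wget package it shows no progress bar.

```python
import shutil
import urllib.request


def download(url, filename, chunk_size=8192):
    """Stream ``url`` to ``filename`` in chunks and return the filename."""
    with urllib.request.urlopen(url) as response, open(filename, "wb") as f:
        # copyfileobj reads and writes chunk_size bytes at a time, so the
        # whole file is never held in memory.
        shutil.copyfileobj(response, f, chunk_size)
    return filename
```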

jenhagg commented 2 years ago

We can do this with requests too:

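import requests
from tqdm import tqdm


def _wget(url, filename, size=None):
    """Stream ``url`` to ``filename`` while showing a progress bar."""
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        if size is None:
            # Content-Length is a string and may be missing; tqdm accepts total=None.
            length = r.headers.get("Content-Length")
            size = int(length) if length is not None else None
        with open(filename, "wb") as f:
            with tqdm(
                unit="B",
                unit_scale=True,
                unit_divisor=1024,
                miniters=1,
                total=size,
            ) as pbar:
                # Write the response body in 8 KiB chunks to bound memory use.
                for chunk in r.iter_content(chunk_size=8192):
                    f.write(chunk)
                    pbar.update(len(chunk))

rouille commented 2 years ago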

Done

rouille commented 2 years ago

The Pipfile.lock update will be handled in a separate PR, as some tests fail with the updated version of pandas.