GoldenCheetah / OpenData

A project to collect, collate and share an open data set with contributions from users of the GoldenCheetah application

Init of Python library #3

Closed AartGoossens closed 5 years ago

AartGoossens commented 6 years ago

Based on earlier work for https://github.com/AartGoossens/sweatpy.

glemaitre commented 6 years ago

It could be useful not to remove the zip file. That way, before downloading the data, we can check whether it is already present locally and just unzip it. We can also compare the hash of the downloaded file with the hash of the remote data.

AartGoossens commented 6 years ago

@glemaitre

> It could be useful not to remove the zip file. That way, before downloading the data, we can check whether it is already present locally and just unzip it. We can also compare the hash of the downloaded file with the hash of the remote data.

I'm not sure I understand what you mean. Are you suggesting that we just skip removing the zip file? What's the use case for this? People will most likely download the zip using this library, right?

glemaitre commented 6 years ago

What I mean is to avoid redownloading data that we already have locally. I don't think there is currently a mechanism to prevent that, is there?

AartGoossens commented 6 years ago

> What I mean is to avoid redownloading data that we already have locally. I don't think there is currently a mechanism to prevent that, is there?

Ah yes, that's correct. I expect it to fail right now when unzipping the data, because the files already exist. I didn't get around to including error handling for that. Checking whether an athlete has already been downloaded should indeed be done here.
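
As a rough illustration of the check discussed above (not code from this PR), a minimal sketch could skip extraction when the athlete's directory already has content and leave the zip in place. The function name `extract_if_needed` and the one-directory-per-athlete layout are assumptions made for this example.

```python
import zipfile
from pathlib import Path


def extract_if_needed(zip_path, target_dir):
    """Unzip zip_path into target_dir unless target_dir already has content.

    Hypothetical helper: returns True if extraction happened, False if skipped.
    """
    zip_path = Path(zip_path)
    target_dir = Path(target_dir)

    # Skip the work when a previous run already extracted this athlete.
    if target_dir.is_dir() and any(target_dir.iterdir()):
        return False

    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)

    # The zip file is deliberately left in place so it can be reused later.
    return True
```

Calling it twice on the same paths extracts on the first call and returns `False` on the second, instead of failing on files that already exist.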

glemaitre commented 6 years ago

Actually, I recently wrote the following, which checks the hash before redownloading the data. I think it could be useful. The only thing we need to check is whether OSF provides a hash of the zip file.
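
The code referred to above is not reproduced in this thread. Purely as an illustration of the idea, a hash check before redownloading might look like the sketch below. The names `sha256_of_file` and `needs_download` are hypothetical, and the expected hash is assumed to come from the remote metadata.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(local_path, expected_sha256):
    """Return True if the local file is missing or its hash differs from the expected one."""
    local_path = Path(local_path)
    if not local_path.is_file():
        return True
    return sha256_of_file(local_path) != expected_sha256.lower()
```

With a helper like this, the download step would only run when `needs_download(...)` returns `True`.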

liversedge commented 6 years ago

The files have a version number. They will change infrequently: currently, a new version is only uploaded once more than 365 days have passed since the last one.

glemaitre commented 6 years ago

> The files have a version number.

Yep, and the hashes are available online. I assume we could get them from the metadata. I have to check the CLI.

AartGoossens commented 6 years ago

> Yep, and the hashes are available online. I assume we could get them from the metadata. I have to check the CLI.

The hashes are not available directly from the CLI, but once you have the file id you can call the API to get them:

requests.get(f'https://api.osf.io/v2/files/{file.id}')

... will give you a response which contains the sha256 and md5 hashes:

"attributes": {
    "extra": {
        "hashes": {
            "sha256": "3a3f980006b3492e1f7e5aa38470af8f3e01657d5806d02d54ee1671eeea1d85",
            "md5": "5c06fe6564ed26d02b59cbd2d0796950"
        },
        "downloads": 1
    }
}
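
A small sketch of how these hashes could be pulled out of such a response, using only the standard library. The helper names are hypothetical. It is also an assumption that the full response wraps the record in a top-level `"data"` key, so the parser accepts both the wrapped and the bare shape shown above.

```python
import json
from urllib.request import urlopen


def extract_hashes(payload):
    """Return the sha256/md5 hashes found in a parsed file-metadata response."""
    # Assumption: the record may sit under a top-level "data" key.
    record = payload.get("data", payload)
    hashes = record.get("attributes", {}).get("extra", {}).get("hashes", {})
    return {name: hashes[name] for name in ("sha256", "md5") if name in hashes}


def fetch_hashes(file_id):
    """Fetch the metadata for file_id from the endpoint quoted above and parse it."""
    with urlopen(f"https://api.osf.io/v2/files/{file_id}") as response:
        return extract_hashes(json.load(response))
```

`extract_hashes` is kept separate from the network call so that it can be tried on a saved response without contacting the API.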

glemaitre commented 6 years ago

I think this is something that could be contributed upstream. I might try to open a PR upstream and see what the devs think.

glemaitre commented 6 years ago

FYI: https://github.com/osfclient/osfclient/pull/142

AartGoossens commented 5 years ago

Open Data is now also on AWS S3, and since creating a library for this resource is more convenient, I started from scratch again and created a new PR here: https://github.com/GoldenCheetah/OpenData/pull/5. Closing this PR.