kuriwaki closed this issue 3 years ago.
@kuriwaki: here is the updated one, which should work:
import io
import pandas as pd
from pyDataverse.api import NativeApi
from pyDataverse.api import DataAccessApi
doi = "doi:10.7910/DVN/HIDLTK"
base_url = "https://dataverse.harvard.edu"
n_api = NativeApi(base_url)
resp = n_api.get_dataset(doi)
datafiles = resp.json()["data"]["latestVersion"]["files"]
# confirm file
print(datafiles[8]["dataFile"]["filename"])
# 'us_county_confirmed_cases.tab'
print(datafiles[8]["dataFile"]["id"])
# 4360740
# datafile
datafile_id = datafiles[8]["dataFile"]["id"]
da_api = DataAccessApi(base_url)
resp = da_api.get_datafile(datafile_id)
# read the ingested (tab-separated) response into a dataframe
data = io.StringIO(resp.content.decode('utf-8'))
us_states_cases = pd.read_csv(data, sep='\t')
print(us_states_cases.head(10))
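As a side note, hardcoding datafiles[8] is fragile if the file listing ever changes order. A small helper (hypothetical, not part of pyDataverse) can look up the id by filename instead, assuming the same JSON shape returned by get_dataset above:

```python
def find_datafile_id(datafiles, filename):
    """Return the dataFile id whose filename matches, or None if absent.

    `datafiles` is the list taken from
    resp.json()["data"]["latestVersion"]["files"] in the snippet above.
    """
    for entry in datafiles:
        if entry["dataFile"]["filename"] == filename:
            return entry["dataFile"]["id"]
    return None

# e.g. datafile_id = find_datafile_id(datafiles, "us_county_confirmed_cases.tab")
```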
The problem was: to get the datafile, it is necessary to use the DataAccessApi. I also fixed the key from file_name to filename when indexing into the datafile metadata.
Thanks! This example worked.
The datafest (i.e. pre-v0.3.0) code has us importing datafiles as follows. However, this code currently gives me metadata about the file, not the 3000 x 300 tabular dataset it is supposed to be (https://doi.org/10.7910/DVN/HIDLTK).
Can you show (1) how to import this as a pandas dataframe, and (2) whether it's possible to set an option format=original in get_datafile to download the original, rather than the ingested, version of files? For example, the file in question was originally a CSV, but became a TSV when ingested into Dataverse. I wasn't sure how this worked after reading the get_datafile page in Docs/Reference, but if there's any other place in the Docs I should look, that would be helpful too.
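Regarding (2): the Dataverse Data Access API accepts a format=original query parameter on the /api/access/datafile/{id} endpoint, which returns the file as originally uploaded (the CSV) instead of the ingested archival TSV. A minimal sketch of hitting that endpoint directly over HTTP; I'm not certain which pyDataverse release exposes this as an argument to get_datafile, so the helper names below are assumptions for illustration:

```python
import io
import urllib.request

import pandas as pd

def original_file_url(base_url, datafile_id):
    # format=original asks Dataverse for the file as originally uploaded
    # (e.g. CSV) instead of the ingested archival TSV.
    return f"{base_url}/api/access/datafile/{datafile_id}?format=original"

def read_original_csv(base_url, datafile_id):
    # Fetch the original upload and parse it as comma-separated values.
    with urllib.request.urlopen(original_file_url(base_url, datafile_id)) as resp:
        return pd.read_csv(io.StringIO(resp.read().decode("utf-8")))

# e.g. df = read_original_csv("https://dataverse.harvard.edu", 4360740)
```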