Open Chowti opened 9 months ago
Unfortunately, I think I'm in the same boat. I'm new to pandasdmx, and to SDMX structures in general, but here is my code:
import logging

import pandasdmx

# Connect to the ABS_XML source; INFO-level logging prints each request URL
abs_xml = pandasdmx.Request('ABS_XML', log_level=logging.INFO)
# Dataflows
flow_msg = abs_xml.dataflow(force=True) # get dataflows
dataflows_pandas = pandasdmx.to_pandas(flow_msg.dataflow) # convert to pandas DataFrame
dataflows_pandas.to_csv('dataflows.csv') # save dataflows to csv
sa2Data = dataflows_pandas[dataflows_pandas.str.contains('SA2+', case=False, regex=False)] # filter dataflows for the literal text 'SA2+' (as a regex, '+' would be a quantifier)
sa2Data.to_csv('sa2DataFlows.csv') # save SA2 dataflows to csv
example_msg = abs_xml.dataflow(resource=flow_msg.dataflow.C21_G04_SA2) # get dataflow for C21_G04_SA2
When I run that, I get the following RuntimeError:
../venv/lib/python3.10/site-packages/pandasdmx/remote.py:11: RuntimeWarning: optional dependency requests_cache is not installed; cache options to Session() have no effect
warn(
2024-02-22 14:58:04,233 pandasdmx.api - INFO: Requesting resource from https://api.data.abs.gov.au/dataflow/ABS/latest
2024-02-22 14:58:04,233 pandasdmx.api - INFO: with headers {'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
2024-02-22 14:58:07,415 pandasdmx.reader.sdmxml - DEBUG: Truncate sub-microsecond time in <Prepared>
2024-02-22 14:58:08,110 pandasdmx.api - INFO: Requesting resource from https://api.data.abs.gov.au/dataflow/ABS/C21_G04_SA2/latest?references=all
2024-02-22 14:58:08,110 pandasdmx.api - INFO: with headers {'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
2024-02-22 14:58:10,468 pandasdmx.reader.sdmxml - DEBUG: Truncate sub-microsecond time in <Prepared>
--- SS without DSD ---
{1: False}
--- <class 'pandasdmx.message.StructureMessage'> ---
{2: <pandasdmx.StructureMessage>
<Header>
id: 'IDREF23404'
prepared: '2024-02-22T13:40:55.674541+11:00'
receiver: <Agency Unknown>
sender: <Agency Unknown>
source:
test: False}
--- <class 'pandasdmx.model.DataStructureDefinition'> ---
{'C21_G04_SA2': <DataStructureDefinition ABS:C21_G04_SA2(1.0.0): Census 2021, G04 Age by sex, Main Statistical Areas Level 2 and up (SA2+) Datastructure>}
--- <class 'pandasdmx.model.Agency'> ---
{'ABS': <Agency ABS>}
--- <class 'pandasdmx.model.DataflowDefinition'> ---
{'C21_G04_SA2': <DataflowDefinition ABS:C21_G04_SA2(1.0.0): Census 2021, G04 Age by sex, Main Statistical Areas Level 2 and up (SA2+)>}
--- <class 'pandasdmx.model.CategoryScheme'> ---
{63: <CategoryScheme ABS:C21_ASGS(1.0.0) (5 items): Census 2021>, 64: <CategoryScheme ABS:C21_ASGS(1.0.0) (1 items)>}
--- <class 'pandasdmx.model.Categorisation'> ---
{'CAT_C21_G04_SA2': <Categorisation ABS:CAT_C21_G04_SA2(1.0.0): Census 2021, G04 Age by sex, Main Statistical Areas Level 2 and up (SA2+) Categorisation>}
--- <class 'pandasdmx.model.Codelist'> ---
{'CL_ASGS_2021': <Codelist ABS:CL_ASGS_2021(1.0.0) (2985 items): Australian Statistical Geography Standard (ASGS) Edition 3 - Main Structure>, 'CL_C21_AGEINGP13': <Codelist ABS:CL_C21_AGEINGP13(1.0.0) (102 items): Age, excludes overseas vistitors 13>, 'CL_C21_SEXP01': <Codelist ABS:CL_C21_SEXP01(1.0.0) (3 items): Sex 01>, 'CL_REGION_TYPE': <Codelist ABS:CL_REGION_TYPE(1.0.0) (43 items): Region Type>, 'CL_STATE': <Codelist ABS:CL_STATE(1.0.0) (10 items): State>}
--- <class 'pandasdmx.model.ConceptScheme'> ---
{52106: <ConceptScheme ABS:CS_C21_PERSON(1.0.0) (120 items): Census 2021 Person Concepts>, 52107: <ConceptScheme ABS:CS_C21_PERSON(1.0.0) (1 items)>, 'CS_C21_PERSON': <ConceptScheme ABS:CS_C21_PERSON(1.0.0) (1 items)>, 52118: <ConceptScheme ABS:CS_GEOGRAPHY(1.0.0) (25 items): Geography Concepts>, 52119: <ConceptScheme ABS:CS_GEOGRAPHY(1.0.0) (1 items)>, 'CS_GEOGRAPHY': <ConceptScheme ABS:CS_GEOGRAPHY(1.0.0) (2 items)>, 52134: <ConceptScheme ABS:CS_COMMON(1.0.0) (5 items): Common Concepts>, 52135: <ConceptScheme ABS:CS_COMMON(1.0.0) (1 items)>, 'CS_COMMON': <ConceptScheme ABS:CS_COMMON(1.0.0) (1 items)>}
--- <class 'pandasdmx.model.Annotation'> ---
{'obs_count': Annotation(id='obs_count', title='912186', type='sdmx_metrics', url=None, text=), 52148: Annotation(id=None, title='A', type='ReleaseVersion', url=None, text=)}
--- Name ---
{52149: ('en', 'Availability (A) for C21_G04_SA2')}
--- <class 'pandasdmx.reader.sdmxml.Reference'> ---
{'C21_G04_SA2': <pandasdmx.reader.sdmxml.Reference object at 0x7fdd3198b970>}
--- <class 'pandasdmx.model.MemberSelection'> ---
{52253: <MemberSelection AGEINGP in {'_T', '0', '0_4', '1', '10', '10_14', '11', '12', '13', '14', '15', '15_19', '16', '17', '18', '19', '2', '20', '20_24', '21', '22', '23', '24', '25', '25_29', '26', '27', '28', '29', '3', '30', '30_34', '31', '32', '33', '34', '35', '35_39', '36', '37', '38', '39', '4', '40', '40_44', '41', '42', '43', '44', '45', '45_49', '46', '47', '48', '49', '5', '5_9', '50', '50_54', '51', '52', '53', '54', '55', '55_59', '56', '57', '58', '59', '6', '60', '60_64', '61', '62', '63', '64', '65', '65_69', '66', '67', '68', '69', '7', '70', '70_74', '71', '72', '73', '74', '75', '75_79', '76', '77', '78', '79', '8', '80_84', '85_89', '9', '90_94', '95_99', 'GE100'}>, 52257: <MemberSelection SEXP in {'1', '2', '3'}>, 55239: <MemberSelection REGION in {'1', '101', ...<truncated>..., '9OTER', 'AUS'}>, 55246: <MemberSelection REGION_TYPE in {'AUS', 'GCCSA', 'SA2', 'SA3', 'SA4', 'STE'}>, 55257: <MemberSelection STATE in {'1', '2', '3', '4', '5', '6', '7', '8', '9', 'AUS'}>}
--- <class 'pandasdmx.model.RangePeriod'> ---
{55260: RangePeriod(start=Period(is_inclusive=True, period=datetime.datetime(2021, 1, 1, 0, 0)), end=Period(is_inclusive=True, period=datetime.datetime(2021, 12, 31, 0, 0)))}
<common:KeyValue xmlns:common="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common" xmlns:message="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message" xmlns:structure="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure" id="TIME_PERIOD">
<common:TimeRange/></common:KeyValue>
Traceback (most recent call last):
File "../venv/lib/python3.10/site-packages/pandasdmx/reader/sdmxml.py", line 299, in read_message
result = func(self, element)
File "../venv/lib/python3.10/site-packages/pandasdmx/reader/sdmxml.py", line 1189, in _ms
raise RuntimeError
RuntimeError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "../main.py", line 17, in <module>
example_msg = abs_xml.dataflow(resource=flow_msg.dataflow.C21_G04_SA2) # get dataflow for C21_G04_SA2
File "..r/venv/lib/python3.10/site-packages/pandasdmx/api.py", line 514, in get
msg = reader.read_message(response_content, dsd=kwargs.get("dsd", None))
File "../venv/lib/python3.10/site-packages/pandasdmx/reader/sdmxml.py", line 316, in read_message
raise XMLParseError from exc
pandasdmx.exceptions.XMLParseError: RuntimeError
Using Python 3.11.7 and pandasdmx 1.10.0.
I am getting an XMLParseError while attempting to get data using a dictionary key from "ABS_XML".
Traceback
```
c:\Users\timot\anaconda3\envs\SDMX\Lib\site-packages\pandasdmx\remote.py:11: RuntimeWarning: optional dependency requests_cache is not installed; cache options to Session() have no effect
  warn(
--- SS without DSD ---
{1: False}
---
```
The error appears to occur when trying to get the DSD structure information.
Specifying references=descendants, and then using the information returned, allows the data request to complete successfully.
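For anyone wanting to try this, here is a minimal sketch of what the request might look like. The URL layout is copied from the log output earlier in this thread; the helper names `dataflow_url` and `fetch_structure` are made up for illustration, and passing `params={"references": "descendants"}` to `Request.dataflow()` is my assumption about how to override the default `references=all` in pandasdmx 1.x, so please check it against your installed version.

```python
from urllib.parse import urlencode

# Base URL as it appears in the pandasdmx log lines above.
ABS_BASE = "https://api.data.abs.gov.au"


def dataflow_url(flow_id, references="descendants", agency="ABS", version="latest"):
    """Build the structure query URL for one dataflow (hypothetical helper)."""
    query = urlencode({"references": references})
    return f"{ABS_BASE}/dataflow/{agency}/{flow_id}/{version}?{query}"


def fetch_structure(flow_id):
    """Request the dataflow with references=descendants via pandasdmx.

    Assumption: pandasdmx forwards `params` as query parameters and only
    adds references=all when no 'references' key is supplied.
    """
    import pandasdmx  # imported lazily so the URL helper works without it

    abs_xml = pandasdmx.Request("ABS_XML")
    return abs_xml.dataflow(flow_id, params={"references": "descendants"})


# The URL that should then show up in the INFO log instead of ?references=all
print(dataflow_url("C21_G04_SA2"))
```

Comparing the printed URL with the one in the INFO log is a quick way to confirm that the override actually reached the server.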
My main suspicion is the parsing of the two content constraints returned from https://api.data.abs.gov.au/dataflow/ABS/ABS_ANNUAL_ERP_LGA2022/latest?references=all (the same kind of references=all request that fails above). These constraints are generated automatically during a point-in-time release; see https://sis-cc.gitlab.io/dotstatsuite-documentation/using-api/embargo-management/#point-in-time-release-feature
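To check that suspicion without going through pandasdmx, something like the sketch below could be run over the raw response text. It lists content constraints containing a `KeyValue` whose `TimeRange` is empty, which is the element printed just before the RuntimeError in the first traceback. The namespaces are taken from that printed element; the sample XML and the constraint ids in it are hand-written for illustration and are not the actual ABS response, and `empty_time_ranges` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespaces as printed in the <common:KeyValue> element in the traceback above.
NS = {
    "common": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common",
    "structure": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure",
}


def empty_time_ranges(xml_text):
    """Return (constraint id, KeyValue id) pairs whose TimeRange has no children."""
    root = ET.fromstring(xml_text)
    found = []
    for constraint in root.findall(".//structure:ContentConstraint", NS):
        for key_value in constraint.findall(".//common:KeyValue", NS):
            time_range = key_value.find("common:TimeRange", NS)
            if time_range is not None and len(time_range) == 0:
                found.append((constraint.get("id"), key_value.get("id")))
    return found


# Hand-written sample, NOT the real ABS response: one constraint with an
# empty TimeRange and one with explicit start and end periods.
SAMPLE = """
<structure:Constraints
    xmlns:common="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common"
    xmlns:structure="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure">
  <structure:ContentConstraint id="EXAMPLE_A">
    <structure:CubeRegion include="true">
      <common:KeyValue id="TIME_PERIOD"><common:TimeRange/></common:KeyValue>
    </structure:CubeRegion>
  </structure:ContentConstraint>
  <structure:ContentConstraint id="EXAMPLE_B">
    <structure:CubeRegion include="true">
      <common:KeyValue id="TIME_PERIOD">
        <common:TimeRange>
          <common:StartPeriod isInclusive="true">2021-01-01</common:StartPeriod>
          <common:EndPeriod isInclusive="true">2021-12-31</common:EndPeriod>
        </common:TimeRange>
      </common:KeyValue>
    </structure:CubeRegion>
  </structure:ContentConstraint>
</structure:Constraints>
"""

print(empty_time_ranges(SAMPLE))
```

If the real response shows one constraint with an empty `TimeRange` and one without, that would support the idea that the reader is tripping over the automatically generated constraint rather than the dataflow itself.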