Closed geoffwright240 closed 9 years ago
For those that are looking at using filters in request, the following code shows an example.
from pandasdmx import client estat = client("Eurostat") estat.get_data("namq_10_gdp",'.CLV10_MEUR.SWDA.B1GQ.DE', concat = False)
First argument is the flowref, second argument is the "key". This is where you add filters to your request.
A useful website for those looking to learn more about Eurostat's web services can be found here:
http://ec.europa.eu/eurostat/web/sdmx-web-services/data-struct-def
and
http://ec.europa.eu/eurostat/web/sdmx-web-services/a-few-useful-points
Thanks. You are right: This error is not caught although it should be. And it will be in v0.2 (see below).
I have embarked on a complete rewrite. v0.1.2 was mainly a proof of concept. I have since liaised with the creators of SDMX who gave me some advice. End November I decided to rewrite the whole thing. I have looked into configuration and type validation through IPython.utils.traitlets and IPython.config.configurable, application-wide logging etc.
I have dropped the SQLite stuff. It will be replaced by xpath queries on the XML document. At the heart of the new architecture there will be a pythonic representation of the SDMX information model (see Part 2 of the SDMX2.1 Standard and the user guide on www.sdmx.org). The information model is separate from the code relating to XML, JSON etc. in the reader package, and pandas, xray, Excel, SQLAlchemy, smoke signals etc. in the writer package. The information model classes call reader methods which execute xpath queries on the xml document. These method calls are wrapped in properties.
The code is evolving but far from stable. There is an incredible amount of work remaining. But I believe the goal is worth the effort. Some one has to do it, not necessarily myself.
On v0.2dev: The information model in model.py is heavily under construction. There is a Message class representing an SDMX message. It will allow error detection after downloading.
Please let me know any thoughts you might have.
Leo
Am 02.01.2015 um 02:09 schrieb geoffwright240:
With some queries, the sdmx server responds with the following message ...
footer:FooterResponse too large due to client request/common:TextQuery size exceeds maximum limit (1000000 entries)./common:Text/footer:Message/footer:Footer
Generic setup
estat = client('Eurostat', 'eurodb.db') db = estat.get_dataflows()
gdp_table = db.execute('SELECT * FROM ESTAT_dataflows WHERE title LIKE "%GDP%"') gdp_list = gdp_table.fetchall()
bop = gdp_list[1] df, md = estat.get_data(bop, '', concat = True) # Request is too large and this call fails silently.
In theory, the filters parameter in the get_data function should help reduce the size of the request.
— Reply to this email directly or view it on GitHub https://github.com/dr-leo/pandaSDMX/issues/4.
Thanks. Very useful.
The forthcoming v0.2 will support filter creation pythoniccally rather than through a bare function argument. Any help is much appreciated. To get a first impression check out the dev versions regularly. But don't expect anything to work as expected. Currently I am working on the pythonic representation of structures such as codelists and conceptschemes. Datastructure-Definitions will follow, only thereafter datasets.
Am 02.01.2015 um 14:27 schrieb geoffwright240:
For those that are looking at using filters in request, the following code shows an example.
from pandasdmx import client estat = client("Eurostat") estat.get_data("namq_10_gdp",'.CLV10_MEUR.SWDA.B1GQ.DE', concat = False)
First argument is the flowref, second argument is the "key". This is where you add filters to your request.
A useful website for those looking to learn more about Eurostat's web services can be found here:
http://ec.europa.eu/eurostat/web/sdmx-web-services/data-struct-def
and
http://ec.europa.eu/eurostat/web/sdmx-web-services/a-few-useful-points
— Reply to this email directly or view it on GitHub https://github.com/dr-leo/pandaSDMX/issues/4#issuecomment-68526378.
0.2dev looks very promising … although I don’t see model.py in the repo yet.
I was extremely happy to see this though as I was just in the process of trying to figure out how to solve this issue. I almost wonder if this bit of code should be in pandas as opposed to pandaSDMX. I also wonder what the best way to handle this is … the time period we are talking about is actually a quarter so not totally correct to convert to first month. (I wonder how other languages treat this. I recall from many years ago that FAME had a quarter time period defined.)
time_spans = { 'Q1' : '01-01', 'Q2' : '04-01', 'Q3' : '07-01', 'Q4' : '10-01', 'S1' : '01-01', 'S2' : '07-01' }
Geoff
On Jan 2, 2015, at 4:22 PM, dr-leo notifications@github.com wrote:
Thanks. Very useful.
The forthcoming v0.2 will support filter creation pythoniccally rather than through a bare function argument. Any help is much appreciated. To get a first impression check out the dev versions regularly. But don't expect anything to work as expected. Currently I am working on the pythonic representation of structures such as codelists and conceptschemes. Datastructure-Definitions will follow, only thereafter datasets.
Am 02.01.2015 um 14:27 schrieb geoffwright240:
For those that are looking at using filters in request, the following code shows an example.
from pandasdmx import client estat = client("Eurostat") estat.get_data("namq_10_gdp",'.CLV10_MEUR.SWDA.B1GQ.DE', concat = False)
First argument is the flowref, second argument is the "key". This is where you add filters to your request.
A useful website for those looking to learn more about Eurostat's web services can be found here:
http://ec.europa.eu/eurostat/web/sdmx-web-services/data-struct-def
and
http://ec.europa.eu/eurostat/web/sdmx-web-services/a-few-useful-points
— Reply to this email directly or view it on GitHub https://github.com/dr-leo/pandaSDMX/issues/4#issuecomment-68526378.
— Reply to this email directly or view it on GitHub https://github.com/dr-leo/pandaSDMX/issues/4#issuecomment-68563571.
I double-checked that model.py is tracked. It should be on github for quite a while. Have you updated to tip? the v0.2dev branch?
On pandas time spans: Agreed. I wanted to open an issue on pydata/pandas but I wasn't entirely sure that I hadn't overlooked anything. There is so much debate on time spans on the pydata mailing list... I take your comment as a nudge to raise this question on the list. Leo Am 03.01.2015 um 01:22 schrieb geoffwright240:
0.2dev looks very promising … although I don’t see model.py in the repo yet.
I was extremely happy to see this though as I was just in the process of trying to figure out how to solve this issue. I almost wonder if this bit of code should be in pandas as opposed to pandaSDMX. I also wonder what the best way to handle this is … the time period we are talking about is actually a quarter so not totally correct to convert to first month. (I wonder how other languages treat this. I recall from many years ago that FAME had a quarter time period defined.)
Time span conversions not recognised by pandas:
time_spans = { 'Q1' : '01-01', 'Q2' : '04-01', 'Q3' : '07-01', 'Q4' : '10-01', 'S1' : '01-01', 'S2' : '07-01' }
Geoff
On Jan 2, 2015, at 4:22 PM, dr-leo notifications@github.com wrote:
Thanks. Very useful.
The forthcoming v0.2 will support filter creation pythoniccally rather than through a bare function argument. Any help is much appreciated. To get a first impression check out the dev versions regularly. But don't expect anything to work as expected. Currently I am working on the pythonic representation of structures such as codelists and conceptschemes. Datastructure-Definitions will follow, only thereafter datasets.
Am 02.01.2015 um 14:27 schrieb geoffwright240:
For those that are looking at using filters in request, the following code shows an example.
from pandasdmx import client estat = client("Eurostat") estat.get_data("namq_10_gdp",'.CLV10_MEUR.SWDA.B1GQ.DE', concat = False)
First argument is the flowref, second argument is the "key". This is where you add filters to your request.
A useful website for those looking to learn more about Eurostat's web services can be found here:
http://ec.europa.eu/eurostat/web/sdmx-web-services/data-struct-def
and
http://ec.europa.eu/eurostat/web/sdmx-web-services/a-few-useful-points
— Reply to this email directly or view it on GitHub https://github.com/dr-leo/pandaSDMX/issues/4#issuecomment-68526378.
— Reply to this email directly or view it on GitHub https://github.com/dr-leo/pandaSDMX/issues/4#issuecomment-68563571.
— Reply to this email directly or view it on GitHub https://github.com/dr-leo/pandaSDMX/issues/4#issuecomment-68575639.
Leo,
Sorry. My mistake — yes, model.py is in v0.2dev branch. Also, don’t let me slow you down as there is a ton of learning I need to do before being able to contribute in a meaningful way.
I read http://pandas.pydata.org/pandas-docs/stable/timeseries.html more carefully and think that using a Period is the way to go for handling quarterly time series that are returned by Eurostat. (There could be something more but this looks like a decent enough solution for now.) p = Period('2012Q1', freq='Q-DEC') What would be the best way for me to help contribute ? Once v0.2dev is close, I could help with the documentation. Also, would it make sense for me to write a couple of unit tests ? Not too sure exactly what this entails but I could dig into it if that makes the most sense.
If you are interested in my motivation for the pandaSDMX project, I am looking at this is to integrate links to SDMX web services into a little tool that I am building for Excel. I am not a huge fan of Excel but it is so ubiquitous (and what I use at my job that pays the mortgage) so there is a huge user community there. The tool is open source and I would like to eventually port to Openoffice and gnumeric. My project is very rudimentary at this stage but it generally works as hoped. I have not put it up on Github yet as the code is way too premature to have in the public.
Once complete, a user will only have to put the following function into a MS Excel cell and it will pull a time series into your spreadsheet.
=DataReader(series:‘GDP’,db:’fred’,startdate:’2000-01-01’,enddate:’2014-01-01’,showdates:’True’,...)
This is similar to functionality you can get through Bloomberg’s BDH, SunGard’s FAMEPopulator and S&P’s CapitalIQ but you are not restricted to their proprietary databases (which are in large part just aggregators for public data anyways.)
My approach only works on Mac currently as the implementation of xlwings is not async and therefore gets trapped by limitations on Microsofts com restrictions. The guys behind xlwings have a fix for this that they are currently implementing.
Anyways, not to distract you from your projects at hand, but if you are interested here is a little bundle of code and a video to give you an idea of what I am trying to do. (It’s probably just easiest to watch the video but happy to take any comments you have on the code.)
http://wrightville.org/rel/xlwings_dl.zip
All the best Leo !
Geoff
Le 2015-01-03 à 03:11, dr-leo notifications@github.com a écrit :
I double-checked that model.py is tracked. It should be on github for quite a while. Have you updated to tip? the v0.2dev branch?
On pandas time spans: Agreed. I wanted to open an issue on pydata/pandas but I wasn't entirely sure that I hadn't overlooked anything. There is so much debate on time spans on the pydata mailing list... I take your comment as a nudge to raise this question on the list. Leo Am 03.01.2015 um 01:22 schrieb geoffwright240:
0.2dev looks very promising … although I don’t see model.py in the repo yet.
I was extremely happy to see this though as I was just in the process of trying to figure out how to solve this issue. I almost wonder if this bit of code should be in pandas as opposed to pandaSDMX. I also wonder what the best way to handle this is … the time period we are talking about is actually a quarter so not totally correct to convert to first month. (I wonder how other languages treat this. I recall from many years ago that FAME had a quarter time period defined.)
Time span conversions not recognised by pandas:
time_spans = { 'Q1' : '01-01', 'Q2' : '04-01', 'Q3' : '07-01', 'Q4' : '10-01', 'S1' : '01-01', 'S2' : '07-01' }
Geoff
On Jan 2, 2015, at 4:22 PM, dr-leo notifications@github.com wrote:
Thanks. Very useful.
The forthcoming v0.2 will support filter creation pythoniccally rather than through a bare function argument. Any help is much appreciated. To get a first impression check out the dev versions regularly. But don't expect anything to work as expected. Currently I am working on the pythonic representation of structures such as codelists and conceptschemes. Datastructure-Definitions will follow, only thereafter datasets.
Am 02.01.2015 um 14:27 schrieb geoffwright240:
For those that are looking at using filters in request, the following code shows an example.
from pandasdmx import client estat = client("Eurostat") estat.get_data("namq_10_gdp",'.CLV10_MEUR.SWDA.B1GQ.DE', concat = False)
First argument is the flowref, second argument is the "key". This is where you add filters to your request.
A useful website for those looking to learn more about Eurostat's web services can be found here:
http://ec.europa.eu/eurostat/web/sdmx-web-services/data-struct-def
and
http://ec.europa.eu/eurostat/web/sdmx-web-services/a-few-useful-points
— Reply to this email directly or view it on GitHub https://github.com/dr-leo/pandaSDMX/issues/4#issuecomment-68526378.
— Reply to this email directly or view it on GitHub https://github.com/dr-leo/pandaSDMX/issues/4#issuecomment-68563571.
— Reply to this email directly or view it on GitHub https://github.com/dr-leo/pandaSDMX/issues/4#issuecomment-68575639.
— Reply to this email directly or view it on GitHub.
Dear Jeff,
I am very happy about your interest in pandaSDMX. Apparently Excel users including myself could greatly benefit from your connector. Once pandaSDMX has matured, you may want to focus on a writer for Excel if you feel the detour through pandas has too many pitfalls.
In order to get a good understanding of XML within the SDMX context, you want to make sure to be able to read XML documents. Eclipse seems like a good environment for this, but there are many IDE's out there.
Anyway, time spans are details for later. the next goal is to get the Message class in the model right. I'll have to read through the SDMX-ML specification (= the XML representation for SDMX) to find out about the different message types, attributes and children and create a pythonic API for that. This is no rocket science. But it requires good intuition for usability and the most concise and elegant way of expressing the structures. I recommend looking at the pandasdmx.agency.Agency class as well as the utils.HasItems class (the latter being under construction). I am working on it. I hope to allow indexing and dict-like attribute access on XML attributes and nodes, all through properties. Don't know if this is too fancy... I'll commit the latest ideas to github now. I think I have a decent implementation for ItemSchemes such as Codelists and ConceptSchemes. Extending it to CategorySchemes and Dataflows will be very easy. But the DataStructureDefinition itself with all those datatypes (facets) will be a bit of a challenge I think. And then constraints...
So much for now. Please ignore from the above whatever bores you or looks too messy.
Again, I am very glad you are interested in my little pet project and I am hopeful we can both benefit from each other's skills and efforts.
Leo
Am 03.01.2015 um 15:11 schrieb geoffwright240:
Leo,
Sorry. My mistake — yes, model.py is in v0.2dev branch. Also, don’t let me slow you down as there is a ton of learning I need to do before being able to contribute in a meaningful way.
I read http://pandas.pydata.org/pandas-docs/stable/timeseries.html more carefully and think that using a Period is the way to go for handling quarterly time series that are returned by Eurostat. (There could be something more but this looks like a decent enough solution for now.) p = Period('2012Q1', freq='Q-DEC') What would be the best way for me to help contribute ? Once v0.2dev is close, I could help with the documentation. Also, would it make sense for me to write a couple of unit tests ? Not too sure exactly what this entails but I could dig into it if that makes the most sense.
If you are interested in my motivation for the pandaSDMX project, I am looking at this is to integrate links to SDMX web services into a little tool that I am building for Excel. I am not a huge fan of Excel but it is so ubiquitous (and what I use at my job that pays the mortgage) so there is a huge user community there. The tool is open source and I would like to eventually port to Openoffice and gnumeric. My project is very rudimentary at this stage but it generally works as hoped. I have not put it up on Github yet as the code is way too premature to have in the public.
Once complete, a user will only have to put the following function into a MS Excel cell and it will pull a time series into your spreadsheet.
=DataReader(series:‘GDP’,db:’fred’,startdate:’2000-01-01’,enddate:’2014-01-01’,showdates:’True’,...)
This is similar to functionality you can get through Bloomberg’s BDH, SunGard’s FAMEPopulator and S&P’s CapitalIQ but you are not restricted to their proprietary databases (which are in large part just aggregators for public data anyways.)
My approach only works on Mac currently as the implementation of xlwings is not async and therefore gets trapped by limitations on Microsofts com restrictions. The guys behind xlwings have a fix for this that they are currently implementing.
Anyways, not to distract you from your projects at hand, but if you are interested here is a little bundle of code and a video to give you an idea of what I am trying to do. (It’s probably just easiest to watch the video but happy to take any comments you have on the code.)
http://wrightville.org/rel/xlwings_dl.zip
All the best Leo !
Geoff
Le 2015-01-03 à 03:11, dr-leo notifications@github.com a écrit :
I double-checked that model.py is tracked. It should be on github for quite a while. Have you updated to tip? the v0.2dev branch?
On pandas time spans: Agreed. I wanted to open an issue on pydata/pandas but I wasn't entirely sure that I hadn't overlooked anything. There is so much debate on time spans on the pydata mailing list... I take your comment as a nudge to raise this question on the list. Leo Am 03.01.2015 um 01:22 schrieb geoffwright240:
0.2dev looks very promising … although I don’t see model.py in the repo yet.
I was extremely happy to see this though as I was just in the process of trying to figure out how to solve this issue. I almost wonder if this bit of code should be in pandas as opposed to pandaSDMX. I also wonder what the best way to handle this is … the time period we are talking about is actually a quarter so not totally correct to convert to first month. (I wonder how other languages treat this. I recall from many years ago that FAME had a quarter time period defined.)
Time span conversions not recognised by pandas:
time_spans = { 'Q1' : '01-01', 'Q2' : '04-01', 'Q3' : '07-01', 'Q4' : '10-01', 'S1' : '01-01', 'S2' : '07-01' }
Geoff
On Jan 2, 2015, at 4:22 PM, dr-leo notifications@github.com wrote:
Thanks. Very useful.
The forthcoming v0.2 will support filter creation pythoniccally rather than through a bare function argument. Any help is much appreciated. To get a first impression check out the dev versions regularly. But don't expect anything to work as expected. Currently I am working on the pythonic representation of structures such as codelists and conceptschemes. Datastructure-Definitions will follow, only thereafter datasets.
Am 02.01.2015 um 14:27 schrieb geoffwright240:
For those that are looking at using filters in request, the following code shows an example.
from pandasdmx import client estat = client("Eurostat") estat.get_data("namq_10_gdp",'.CLV10_MEUR.SWDA.B1GQ.DE', concat = False)
First argument is the flowref, second argument is the "key". This is where you add filters to your request.
A useful website for those looking to learn more about Eurostat's web services can be found here:
http://ec.europa.eu/eurostat/web/sdmx-web-services/data-struct-def
and
http://ec.europa.eu/eurostat/web/sdmx-web-services/a-few-useful-points
— Reply to this email directly or view it on GitHub
https://github.com/dr-leo/pandaSDMX/issues/4#issuecomment-68526378.
— Reply to this email directly or view it on GitHub https://github.com/dr-leo/pandaSDMX/issues/4#issuecomment-68563571.
— Reply to this email directly or view it on GitHub https://github.com/dr-leo/pandaSDMX/issues/4#issuecomment-68575639.
— Reply to this email directly or view it on GitHub.
— Reply to this email directly or view it on GitHub https://github.com/dr-leo/pandaSDMX/issues/4#issuecomment-68596035.
Fixed in v0.2.0. v0.2.1 will add native support to handle the zip files returned by Eurostat.
With some queries, the sdmx server responds with the following message ...
Message code="413" Response too large due to client request Query size exceeds maximum limit (1000000 entries).
-- Code to replicate error -- estat = client('Eurostat', 'eurodb.db') db = estat.get_dataflows()
gdp_table = db.execute('SELECT * FROM ESTAT_dataflows WHERE title LIKE "%GDP%"') gdp_list = gdp_table.fetchall()
bop = gdp_list[1] df, md = estat.get_data(bop, '', concat = True) # Request is too large and this call fails silently.
In theory, the filters parameter in the get_data function should help reduce the size of the request.