LenkaV / CIF

Composite Indicators Framework for Business Cycle Analysis
GNU General Public License v3.0
I'm a little confused #21

Closed Yigan321 closed 3 years ago

Yigan321 commented 3 years ago

Hello, I'm hoping you can help me. I am trying to get inflation (CPI) data from the OECD API, and I know the data source is MEI. When I use your code to create the link, the output does not match the CSV file I download from the actual website. Also, is it possible to add more than one country, or is it only possible to create a link separately for each desired country?

LenkaV commented 3 years ago

Hi,

1) Is it possible to add more than 1 country? Yes, definitely. Just add the list of desired countries into the createDataFrameFromOECD function:

from cif import cif
import pandas as pd

countries = ['CZE', 'AUT', 'DEU', 'POL', 'SVK'] # Select input data countries
data_all, subjects_all, measures_all = cif.createDataFrameFromOECD(countries = countries, dsname = 'MEI', subject = ['CPALTT01'], measure = ['GP'], frequency = 'A')

# optionally get rid of multiindex in pandas DataFrame and get index values formatted as datetime
if data_all.shape[0] > 0:
    data = cif.getRidOfMultiindex(df = data_all)
    data = cif.getIndexAsDate(data)

2) The output is not matching the csv file you download from the actual website. I'm not really sure what you mean by "not matching". Could you please add an example: the website URI, the csv file, and the code snippet using cif? Otherwise, I can give you only some general advice:

a) Make sure that you are requesting the same database ('MEI' in the example above).
b) Make sure that you are requesting the same subject ('CPALTT01' in the example above stands for 'Consumer Price Index > All items > Total > Total', but OECD offers many other "versions" of CPI, so this part may be tricky).
c) Make sure that you are requesting the same measure ('GP' in the example above stands for 'Growth rate previous period'. Quite tricky, again).
d) Make sure that you are requesting the same frequency ('A' in the example above stands for 'Annual').
e) Don't be surprised that annual data are available with a different set of measures than quarterly or monthly data.

If you are not sure which codes to use, you can easily check the structure of the data set:

cif.getOECDJSONStructure(dsname = 'MEI', showValues = [0]) # available locations
cif.getOECDJSONStructure(dsname = 'MEI', showValues = [3]) # available frequencies

3) Let me just kindly remind you that, when naming an issue, it is better to choose something that describes its content. Something as generic as 'I'm a little confused' is not really informative. Probably any issue in any GitHub repo could be named like this, but it wouldn't be very convenient for developers and future users with similar questions.

Please, let me know whether this helped. If not, please provide me with additional information.
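
To pin down a mismatch like the one in point 2, it can help to line the two sources up side by side. The sketch below is only an illustration in plain pandas. The helper name `compare_series` is hypothetical (it is not part of cif), and it assumes both sources have already been reduced to a Series indexed by the same kind of period label, which may not match how your CSV export is laid out.

```python
import pandas as pd


def compare_series(api_values: pd.Series, csv_values: pd.Series, tol: float = 1e-6) -> pd.DataFrame:
    """Align two series on their index and flag periods where they disagree."""
    merged = pd.concat({"api": api_values, "csv": csv_values}, axis=1)
    merged["abs_diff"] = (merged["api"] - merged["csv"]).abs()
    # A period is a mismatch if it is missing on one side or differs by more than tol.
    missing = merged[["api", "csv"]].isna().any(axis=1)
    merged["mismatch"] = missing | (merged["abs_diff"] > tol)
    return merged


# Made-up numbers, only to show the shape of the report.
api = pd.Series({"2017": 2.5, "2018": 2.1, "2019": 2.8})
csv = pd.Series({"2017": 2.5, "2018": 2.2, "2020": 3.2})
report = compare_series(api, csv)
print(report[report["mismatch"]])
```

Periods that appear in only one source are flagged as well, which often points to a different frequency or date range rather than a different subject or measure.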

Yigan321 commented 3 years ago

Thank you for your response. My only other question is: where are the definitions for the subjects and measures? For example, how would I know that CPALTT01 is the Consumer Price Index? Is there a list that shows all these definitions?

LenkaV commented 3 years ago

You can use the cif function cif.getOECDJSONStructure mentioned above. For example, to get the list of all available subjects in MEI database:

cif.getOECDJSONStructure(dsname = 'MEI', showValues = [1])

0 for LOCATION
1 for SUBJECT
2 for MEASURE
3 for FREQUENCY
4 for TIME_PERIOD

Or slightly adjust the original code:

data_all, subjects_all, measures_all = cif.createDataFrameFromOECD(countries = countries, dsname = 'MEI', frequency = 'A')

The subject = ['CPALTT01'], measure = ['GP'] part is missing now and the DataFrame subjects_all contains codes and labels of all available subjects.
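
Once a table of codes and labels is at hand, a keyword search is usually quicker than reading it top to bottom. The following is a rough sketch only: `find_codes` is a hypothetical helper (not a cif function), and the `id` and `name` columns in the sample frame are an assumption, since the exact layout of `subjects_all` is not shown in this thread. The helper searches every column, so it should not depend on the real column names.

```python
import pandas as pd


def find_codes(labels: pd.DataFrame, keyword: str) -> pd.DataFrame:
    """Return rows where any column contains the keyword (case-insensitive)."""
    as_text = labels.astype(str)
    hits = as_text.apply(
        lambda col: col.str.contains(keyword, case=False, regex=False)
    ).any(axis=1)
    return labels[hits]


# Stand-in for subjects_all; the real column names may differ.
sample = pd.DataFrame(
    {
        "id": ["CPALTT01", "EXAMPLE01"],  # EXAMPLE01 is a placeholder, not a real code
        "name": [
            "Consumer Price Index > All items > Total > Total",
            "Placeholder subject label",
        ],
    }
)
result = find_codes(sample, "consumer price")
print(result)
```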

For more details see also this issue (https://github.com/LenkaV/CIF/issues/19#issuecomment-624486195), which is very similar to your question.

I would also recommend checking the OECD websites, as the codes and labels are provided by them.

Yigan321 commented 3 years ago

OK, perfect, I understand. Sorry, one more question: on your GitHub I don't see any option to get all countries in a single API request. Is that not possible?

LenkaV commented 3 years ago

No, this is not possible directly. I don't even intend to include this option, because there is a response limit in the OECD API (maximum 1 000 000 observations per request, if I remember correctly). Therefore you probably wouldn't be able to download the whole database at once anyway.

However, it is quite easy to get the complete list of countries:

structure = cif.getOECDJSONStructure(dsname = 'MEI', returnValues = True)
countries_all = [i['id'] for i in structure[0]['values']]

Simply use the new countries_all list instead of manually created countries list in function cif.createDataFrameFromOECD in the original code snippet.
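
If the full country list turns out to be too large for one request, one option is to split it into smaller batches. This is a minimal sketch under assumptions: `chunked` and `fetch_in_batches` are hypothetical helpers, the batch size is arbitrary, and `fetch` stands for whatever call you make per batch.

```python
from typing import Any, Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def fetch_in_batches(countries: Sequence[str], fetch: Callable[[List[str]], Any], size: int = 10) -> List[Any]:
    """Call fetch(batch) once per batch and collect the results in order."""
    return [fetch(batch) for batch in chunked(countries, size)]


# Stand-in fetch that only reports the batch size, so the sketch runs offline.
results = fetch_in_batches(["CZE", "AUT", "DEU", "POL", "SVK"], fetch=len, size=2)
print(results)  # [2, 2, 1]

# With cif, fetch could wrap the call from the original snippet, for example:
# fetch = lambda batch: cif.createDataFrameFromOECD(
#     countries=batch, dsname='MEI', subject=['CPALTT01'],
#     measure=['GP'], frequency='A')[0]
```

How the per-batch DataFrames should be combined afterwards depends on their layout, so that step is left out here.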

Yigan321 commented 3 years ago

Perfect. To get a specific time period, I just add time_period to the function and input the dates, correct?

LenkaV commented 3 years ago

Exactly. You may check the description of the parameters and their expected values in the docstrings:

help(cif.createDataFrameFromOECD)

To adjust time period use parameters:

startDate: str or None
    date in YYYY-MM (2000-01) or YYYY-QQ (2000-Q1) format, None for all observations
endDate: str or None
    date in YYYY-MM (2000-01) or YYYY-QQ (2000-Q1) format, None for all observations
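
Because both parameters take plain strings, a quick format check before sending the request can catch typos early. The sketch below is an assumption-laden illustration: `is_valid_period` is a hypothetical helper, and it only encodes the two formats quoted in the docstring above.

```python
import re

# YYYY-MM (e.g. 2000-01) or YYYY-QQ (e.g. 2000-Q1), as quoted from the docstring.
_PERIOD_PATTERN = re.compile(r"^\d{4}-(0[1-9]|1[0-2]|Q[1-4])$")


def is_valid_period(value):
    """Return True for None or a string in YYYY-MM / YYYY-QQ form."""
    return value is None or bool(_PERIOD_PATTERN.match(value))


for candidate in ["2000-01", "2000-Q1", None, "2000-13", "2000/01"]:
    print(candidate, is_valid_period(candidate))

# A call with a date range might then look like this (frequency 'M' for
# monthly data is an assumption here; check the available codes first):
# cif.createDataFrameFromOECD(countries=countries, dsname='MEI',
#                             frequency='M', startDate='2010-01', endDate='2019-12')
```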

Yigan321 commented 3 years ago

Hello, I'm trying to parse the API links into HDFS, transforming the JSON into a dataframe, but it's coming out like this: [image: Screen Shot 2020-12-02 at 11.36.35 AM.png]

LenkaV commented 3 years ago

Hi, could you please try to attach the image again?