OliverSherouse / wbdata

A Python library for accessing World Bank data
GNU General Public License v2.0
182 stars 55 forks

Monthly data_date #5

Closed geoffwright240 closed 4 years ago

geoffwright240 commented 9 years ago

Some indicators allow monthly data to be queried. For example, the call below is valid.

http://api.worldbank.org/country/1W/indicator/IRON_ORE_SPOT?date=2009M01:2013M13

wbdata.get_data() only allows annual data to be queried.

if data_date:
    if type(data_date) is tuple:
        data_date_str = ":".join((i.strftime("%Y") for i in data_date))
        args.append(("date", data_date_str))
    else:
        args.append(("date", data_date.strftime("%Y")))

It is not clear to me how to differentiate between a user wanting an annual series and one wanting a monthly series when they pass a datetime object. Should an additional parameter be passed to get_data? Or should you scrap the datetime requirement for the data_date parameter and just have the user pass a 2-tuple of strings, such as ("2009", "2011") or ("2009M01", "2010M02")?
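To make the "additional parameter" idea concrete, here is a minimal sketch of a frequency switch. The helper name `format_date_param` and the `freq` argument are hypothetical and are not part of wbdata. The monthly form follows the `2009M01` pattern in the URL above; the quarterly `2009Q1` form is an assumption.

```python
from datetime import datetime


def _fmt_year(d):
    return d.strftime("%Y")


def _fmt_month(d):
    # Literal "M" between the year and the zero-padded month, e.g. "2009M01".
    return d.strftime("%YM%m")


def _fmt_quarter(d):
    # Assumed quarterly form, e.g. "2010Q2" for any date in April through June.
    return "{}Q{}".format(d.year, (d.month - 1) // 3 + 1)


# Hypothetical frequency codes; wbdata does not define these.
_FORMATTERS = {"Y": _fmt_year, "M": _fmt_month, "Q": _fmt_quarter}


def format_date_param(data_date, freq="Y"):
    """Return the value for the API's `date` query parameter."""
    try:
        fmt = _FORMATTERS[freq]
    except KeyError:
        raise ValueError("freq must be one of 'Y', 'M' or 'Q'")
    if isinstance(data_date, tuple):
        # A tuple is treated as a (start, end) range joined with ":".
        return ":".join(fmt(d) for d in data_date)
    return fmt(data_date)
```

With this shape, the default `freq="Y"` keeps the current annual behaviour, and a monthly range would be requested with something like `format_date_param((start, end), freq="M")`.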

Not exactly sure where this may fit into your thinking but if you look at:

http://databank.worldbank.org/data/views/variableselection/selectvariables.aspx?source=global-economic-monitor-(gem)-commodities

and click on the time dimension, there is a filter for "monthly" and "annual" in the leftmost pane. Thanks for this great library! Happy to discuss.

OliverSherouse commented 9 years ago

Thanks for the bug report! The history here is: the library used to handle dates differently for things like this, but it was buggy and the vast majority of series didn't seem to use it. I've been thinking about re-doing the guts of a few different parts of the library, and this should really be first among them.

For a while now I've questioned whether I made the right call in having the data_date arguments be datetime objects at all, but at the very least there should be a switch for what frequency people want. I'll try to add that in this week or next.

geoffwright240 commented 9 years ago

Thanks for the quick response, Oliver. This is a tough one, and it speaks to the difficulty of choosing the best way to represent this parameter. You are not alone in struggling with it. I took a quick look at the R library WDI, and Vincent’s approach looks similar to yours in that it ignores dates that are not years. I believe the same is true of Vincent’s pandas io implementation. Matt Duck’s wbpy also ignores dates that are not years.

The Ruby gem from Justin Stoller seems to have taken your original approach and lets the user provide different strings. I am not a Ruby guy, so I am not sure how robust this is. I am pasting the code block below and hope that it shows up OK.

# the date param is expressed as a range 'StartDate:EndDate'
# Date may be year & month, year & quarter, or just year
# Date will convert any Date/Time object to an appropriate year & month
#
def dates(date_range)
  if date_range.is_a? String
    # assume the user knows what she is doing if passed a string
    @query[:params][:date] = date_range
  elsif date_range.is_a? Range
    start = date_range.first
    last = date_range.last
    @query[:params][:date] = "#{start.strftime("%YM%m")}:#{last.strftime("%YM%m")}"
  end
  self
end
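For comparison, a rough Python rendering of the same idea might look like the sketch below. The function name `date_param` is hypothetical. It assumes that a non-string argument is a 2-tuple of `date` or `datetime` objects, standing in for the Ruby `Range`.

```python
from datetime import date, datetime


def date_param(date_range):
    """Mirror the Ruby `dates` method: strings pass through, date pairs are formatted."""
    if isinstance(date_range, str):
        # Assume the caller already wrote a valid API string, e.g. "2009M01:2010M02".
        return date_range
    # Otherwise expect a (start, last) pair of date or datetime objects.
    start, last = date_range
    return "{}:{}".format(start.strftime("%YM%m"), last.strftime("%YM%m"))
```

One consequence of this design is that date objects always produce a monthly range, so a caller who wants annual data has to pass a string such as `"2009:2011"`.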

I think this is a similar problem to how dates are represented on Eurostat so I will dig around to see if there are any novel ways of dealing with it.

Kind regards,

Geoff

On 2015-02-09 at 15:22, Oliver Sherouse notifications@github.com wrote:

Thanks for the bug report! The history here is: the library used to handle dates differently for things like this, but it was buggy and the vast majority of series didn't seem to use it. I've been thinking about re-doing the guts of a few different parts of the library, and this should really be first among them.

For a while now I've questioned whether I made the right call in having the data_date arguments be datetime objects at all, but at the very least there should be a switch for what frequency people want. I'll try to add that in this week or next.


OliverSherouse commented 7 years ago

My current thinking on this is:

  1. I can go back to using integers for the years and let the user handle slicing the datetimes as they so choose.
  2. I can pull all the data for those years and then do the date comparison myself. This would require parsing the dates even if the user doesn't want parsed dates in the output, which would be annoying.

I'm leaning towards option 1, but could be convinced.
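The two options can be sketched side by side. Everything here is hypothetical rather than wbdata code: the function names, the assumption that each returned row is a dict with a `"date"` string, and the assumption that date strings look like `"2009"`, `"2009M01"` or `"2009Q1"`.

```python
import re
from datetime import datetime

# Assumed date-string shapes: "2009" (annual), "2009M01" (monthly), "2009Q1" (quarterly).
_DATE_RE = re.compile(r"^(\d{4})(?:M(\d{2})|Q([1-4]))?$")


def year_range_param(start_year, end_year):
    """Option 1: the request only ever carries whole years, given as integers."""
    return "{}:{}".format(start_year, end_year)


def parse_wb_date(text):
    """Convert a date string to the datetime at the start of that period."""
    match = _DATE_RE.match(text)
    if match is None:
        raise ValueError("unrecognized date string: {!r}".format(text))
    year, month, quarter = match.groups()
    if month is not None:
        return datetime(int(year), int(month), 1)
    if quarter is not None:
        # Q1 starts in month 1, Q2 in month 4, and so on.
        return datetime(int(year), 3 * (int(quarter) - 1) + 1, 1)
    return datetime(int(year), 1, 1)


def filter_rows(rows, start, end):
    """Option 2: keep rows whose period start falls within [start, end]."""
    return [row for row in rows if start <= parse_wb_date(row["date"]) <= end]
```

Under option 1 only `year_range_param` is needed and the caller slices the result. Under option 2 the library would call something like `filter_rows` on every response, which is where the unavoidable date parsing comes in.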