bquast / wiod

Data sets from the World Input Output database, for the years 1995-2011

New wiod tables #2

Open monteforest opened 7 years ago

monteforest commented 7 years ago

There seem to be updated tables up to 2014: http://www.wiod.org/database/wiots16

Any chance this package will be updated with the new data?

bquast commented 7 years ago

@zauster is working on this

zauster commented 7 years ago

Hey, sorry for the long wait, January was full of work. I have my stuff ready. The WIOTs will be loaded as lists of matrices/vectors.

Do you think we should adopt this format for the old WIOTs too? That would make the global environment less cluttered with a lot of matrices and vectors flying around.

Something like this:

wiod <- list(year          = 2005,
             countries     = countries,      # character vector of countries
             industries    = industries,     # character vector of industries
             final         = final,          # final demand matrix
             intermediates = intermediates)  # intermediates matrix
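
A minimal sketch of this format with toy data might look as follows. It assumes a made-up world of 2 countries and 2 industries; the country codes, industry names and all numbers are invented for illustration and are not taken from the WIOD.

```r
# Toy dimensions: 2 countries x 2 industries (all values are made up)
countries  <- c("AUT", "DEU")
industries <- c("Agriculture", "Manufacturing")

# One label per country-industry pair, e.g. "AUT.Agriculture"
sectors <- paste(rep(countries, each = length(industries)), industries, sep = ".")

# Intermediates matrix: 4 x 4, rows supply and columns use
intermediates <- matrix(1:16, nrow = 4, dimnames = list(sectors, sectors))

# Final demand matrix: 4 x 2, one column per destination country (an assumption)
final <- matrix(c(10, 20, 30, 40, 5, 15, 25, 35), nrow = 4,
                dimnames = list(sectors, countries))

# Everything for one year lives in a single list
wiod <- list(year          = 2005,
             countries     = countries,
             industries    = industries,
             final         = final,
             intermediates = intermediates)

# Show the top-level structure of the object
str(wiod, max.level = 1)
```

Only one object ends up in the workspace per year, and its parts are reached with `wiod$final`, `wiod$countries` and so on.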

Additionally, we could then allow the decompr-function to handle these lists, for a tighter integration. But that is just a minor thing, we can easily keep it the other way.

bquast commented 7 years ago

Hi,

That might be a good idea. I have been thinking about a new format, but have not been able to settle on anything.

I have a few things this morning. Let me take a look in the afternoon.

Bastiaan

bquast commented 7 years ago

In general, I agree with this.

I suggest we make a list of the objects as they are passed to the decomp() function now.

  1. countries
  2. industries
  3. intermediate demand
  4. final demand
  5. output

I think different years can be separate objects. There is no need to combine them, and keeping them apart seems clearer to me.

zauster commented 7 years ago

I agree that different years should be separate objects, but I would add the year as a one-element vector, so that the name of the object does not need to contain the year. The wiot-object is meant to represent a WIOT of just one year. Then you can have a list of wiot-objects and easily operate on those, for example

lapply(wiot.list, decomp)

then computes the decomposition for all years contained in the wiot.list list.
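
To make that concrete without depending on decompr, here is a small sketch. `total_final_demand()` is a hypothetical stand-in for decomp(), and `make_toy_iot()` is a made-up helper that fabricates a one-year object; neither exists in any package.

```r
# Hypothetical stand-in for decomp(): sums the final demand of one year
total_final_demand <- function(iot) sum(iot$final)

# Made-up helper that builds a toy one-year object (values are not real data)
make_toy_iot <- function(year) {
  list(year          = year,
       countries     = c("AUT", "DEU"),
       industries    = c("Agriculture", "Manufacturing"),
       final         = matrix(year - 2000, nrow = 4, ncol = 2),
       intermediates = diag(4))
}

# One list element per year, named by the year
years <- 2005:2007
wiot.list <- lapply(years, make_toy_iot)
names(wiot.list) <- years

# The same function is applied to every year in one call
results <- lapply(wiot.list, total_final_demand)

# A single year is picked out by its name
results[["2006"]]
```

Because lapply() keeps the names of its input, the results stay labelled by year.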

That has the additional side effect that you don't need to name an object according to the respective year, such as wiot2005. Such named objects are always a little tricky to work with. You would then have to use constructs such as

year <- 2005
eval(expr = parse(text = paste0("wiot", year)))

if you want to access a certain element.
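
For comparison, base R's get() and mget() can look up year-named objects without eval(parse()), and mget() can gather them into a list in one go. The sketch below uses two toy objects with made-up contents; only the naming pattern "wiot" plus year comes from the discussion above.

```r
# Year-named objects as they would sit in the workspace (toy contents)
wiot2005 <- list(year = 2005, final = matrix(1, 2, 2))
wiot2006 <- list(year = 2006, final = matrix(2, 2, 2))

# get() fetches one object by its constructed name
year <- 2005
one_year <- get(paste0("wiot", year))

# mget() collects several named objects into a single list
years <- 2005:2006
wiot.list <- mget(paste0("wiot", years), envir = environment())
names(wiot.list) <- years

# From here on, access goes through the list instead of pasted names
wiot.list[["2006"]]$year
```

This still relies on building object names from strings, so it is a workaround for existing objects rather than a replacement for shipping the data as a list.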

zauster commented 7 years ago

That gives me an idea for the data of the 2012 WIOD:

Why not go one step further and put all years 1995 - 2011 into one list, where each year is a wiot-object just as above?

That would make working with the whole data even more convenient. Concerning the size, it will still be around 8 MB, just as the sum of all the separate files...

bquast commented 7 years ago

Ok, I am not sure I follow entirely. In general I am fine with adding a one-element vector that contains the year. However, I am not sure I understand your point about eliminating the need to include it in the name. There is no technical reason to include it, right? If we are talking about identification, most people will go by the object name anyway. You cannot have several objects with the same name in the .GlobalEnv at the same time, so you need to distinguish between the objects somehow, and since the only difference is the year, that seems like the logical choice.

Do you mean that, if the year is dropped from the name, the appropriate way would be to add an attribute year = '2005' or something like that?

I don't understand your last post; it seems to be in opposition to the earlier comments.

Also, I would rather not call them WIOT objects, it should apply equally to e.g. TiVA.

zauster commented 7 years ago

Yeah, now that I read it, I am not sure that I would understand it.

What I meant was: We have 17 similar objects (let's call them IOTs = "Input-Output Tables"). Each of those contains the needed matrices and some additional information (such as a vector with the names of the countries).

Suppose we want to work with them iteratively. That is cumbersome if they are named "wiot2000", "wiot2001" and so forth (that was what I was trying to say in my first post). It would be much easier if all those 17 objects were already collected in a list, because then we can easily iterate over the elements in the list (that's the essence of my second post).

I enclose a .zip which contains all files (the 17 separate IOT objects and a list of those 17 objects). I hope the files make clear what I mean. I propose to use the "wiod_all.RData" which contains the list. This file is 7.1 MB, which is the same as the other 17 files together, so we don't win or lose space.

wiot_2012version.zip https://github.com/bquast/wiod/files/742775/wiot_2012version.zip

monteforest commented 7 years ago

I like this new setup a lot. How large would the file be with the new 2016 version?

Thanks again for all the hard work you’re putting into this!

zauster commented 7 years ago

Waaay too large. A single year's file is around 40 MB now...

bquast commented 7 years ago

I see the value in combining into a list; indeed, it can be cumbersome to have to deal with changing object names.

At the same time, I want to be careful with adding complexity to this, despite its code elegance.

As it is, most people find the R interface too complex already; they learn how to use decompr from the YouTube videos.

Another consideration here would be that we are dealing with a relatively small number of years, so the need to automate is not that high.

zauster commented 7 years ago

Hmmm, I take your point about complexity. But I think it applies only to the very beginners. And I am not sure if there are a lot of beginners who are already decomposing GVCs.

And I find 17 years already quite a number. With 17 years you should definitely be able to iterate over those years in an easy fashion. Assume you do some transformation of the data that takes ten lines of code. If you can't iterate, you would have to write down 170 lines. That's cumbersome and error-prone.

bquast commented 7 years ago

I'm sorry but I don't agree. I receive so many emails every week from people that struggle with the basics.

Like I said, I don't think it makes sense to add complexity programmatically when it is not necessary. I do think most users are not experts and that our target audience should not have to be. This is also why I am working on an RStudio addin that makes it click and play.

I think a better solution would be to show people how they can do this by writing a function themselves.

zauster commented 7 years ago

Okay, what's your preferred solution (concerning the format of the data) then?

bquast commented 7 years ago

I think we should go with a list that contains the 5 objects.

We could include a vignette for more advanced users showing how to automate things.
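
A rough sketch of what such a five-object list and a user-written helper could look like is below. The constructor `make_iot()`, the element names and all numbers are assumptions made for illustration; they are not the format the package actually ships.

```r
# Hypothetical constructor: bundles the five objects and checks that they fit together
make_iot <- function(countries, industries, intermediate, final, output) {
  n <- length(countries) * length(industries)   # number of country-industry pairs
  stopifnot(nrow(intermediate) == n, ncol(intermediate) == n,
            nrow(final) == n, length(output) == n)
  list(countries    = countries,
       industries   = industries,
       intermediate = intermediate,
       final        = final,
       output       = output)
}

# Toy data for two years (values are made up)
countries    <- c("AUT", "DEU")
industries   <- c("Agriculture", "Manufacturing")
intermediate <- matrix(1, nrow = 4, ncol = 4)
final        <- matrix(2, nrow = 4, ncol = 2)
output       <- rowSums(intermediate) + rowSums(final)  # intermediate sales plus final demand

iot2005 <- make_iot(countries, industries, intermediate, final, output)
iot2006 <- make_iot(countries, industries, intermediate, final * 2,
                    output + rowSums(final))

# What a vignette could show: collect the years and apply one function to each
iots <- list("2005" = iot2005, "2006" = iot2006)
total_output <- vapply(iots, function(iot) sum(iot$output), numeric(1))
total_output
```

Users who only need one year can ignore the last three lines and work with `iot2005` directly, so the list format does not force anyone to iterate.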