Open jonathandmoyer opened 4 months ago
Hi Jon, @jonathandmoyer
Regarding this specific data issue, I just want to make sure- the graph you are looking at is based on Energy Demand and GDP PPP, right? GDP PPP has a good coverage, so I don't think that's the problem. I believe energy demand in IFs is calculated by Production + Import - Export. These tables are from IEA. I picked two conventional energy types (coal and crude oil) and went to the lastest database to check for their coverage. For China, I do see data values from 1971 and onward for these tables (EnProdOilIEA & EnProdCoalIEA). For import and export, EnImOilIEA & EnExOilIEA for example, I also see values in the 70s. Unless energy demand is initialized through other data tables in IFs, we should have values for China all the way back to the 70s (same for Russia). This is probably not a data issue. But I could be wrong.
@jonathandmoyer and on your questions regarding the data pulling process, my quick answer is that we have some general rules that I will describe below but situations vary by data source/data table. Mistakes and issues happen but we try our best to document things to prevent that from happening again.
During the data vetting process, RAs check for several things 1) Countries that do not have values at all. This might be caused by country concordance or a change made by the data source. This is fairly easy to resolve.
2) Historical years that used to have data values in the old table but not anymore in the new table (like the WDI situation you described in the ticket). If no specific reasons are provided by the data source, RAs then check the consistency between values from the old table and values from the new table. If values are generally consistent across years, then values from the old table are merged into the new table. If not, we normally overwrite the old table. An exception is when the data source changes the methodology. Then we preserve the old table (normally by adding Old as the suffix in the table name) and import the new table. Note that we did not do this for the FAO food production table even though they changed their methodology in 2013.
3) Big year to year jump or big discrepancies in the same country-year between old and new data. This one is even harder to solve because some might just be data errors (percentages exceeding 100%) or some countries just have bad data. Our general rule is to take whatever are provided by the data source unless a developer is highly against it and want to have some values manually put in. For detecting impacts on the model, we used to put the data into IFs and rebuild the base to see changes. But I only have two people in the team now and simply don't have that capacity anymore.
As for the IHME upate, I failed to communicate the changes we made to the death cause mapping with the modeling pod, that's my fault. For tables used by historical analog, I was not aware of those tables and will update them. Thanks!
Howdy:
Before you update them, I don't think we have an agreement on what should be included in the mental health category. I think the last email was from me to you about neurological disorders.
Hopefully Jose can look at the energy intensity question because those values had shown up previously.
Jonathan D Moyer, PhD Associate Professor Director Frederick S. Pardee Institute for International Futures Josef Korbel School of International Studies University of Denver
On Fri, Jul 5, 2024 at 12:55 PM Yutang @.***> wrote:
@jonathandmoyer https://github.com/jonathandmoyer and on your questions regarding the data pulling process, my quick answer is that we have some general rules that I will describe below but situations vary by data source/data table. Mistakes and issues happen but we try our best to document things to prevent that from happening again.
During the data vetting process, RAs check for several things
1.
Countries that do not have values at all. This might be caused by country concordance or a change made by the data source. This is fairly easy to resolve. 2.
Historical years that used to have data values in the old table but not anymore in the new table (like the WDI situation you described in the ticket). If no specific reasons are provided by the data source, RAs then check the consistency between values from the old table and values from the new table. If values are generally consistent across years, then values from the old table are merged into the new table. If not, we normally overwrite the old table. An exception is when the data source changes the methodology. Then we preserve the old table (normally by adding Old as the suffix in the table name) and import the new table. Note that we did not do this for the FAO food production table even though they changed their methodology in 2013. 3.
Big year to year jump or big discrepancies in the same country-year between old and new data. This one is even harder to solve because some might just be data errors (percentages exceeding 100%) or some countries just have bad data. Our general rule is to take whatever are provided by the data source unless a developer is highly against it and want to have some values manually put in. For detecting impacts on the model, we used to put the data into IFs and rebuild the base to see changes. But I only have two people in the team now and simply don't have that capacity anymore.
As for the IHME upate, I failed to communicate the changes we made to the death cause mapping with the modeling pod, that's my fault. For tables used by historical analog, I was not aware of those tables and will update them. Thanks!
— Reply to this email directly, view it on GitHub https://github.com/PardeeCenterDU/IFs-Issues-Tracking/issues/325#issuecomment-2211291310, or unsubscribe https://github.com/notifications/unsubscribe-auth/AUNOSOQPY4YUIPVSG5PDC5LZK3T3TAVCNFSM6AAAAABKNQTC6SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJRGI4TCMZRGA . You are receiving this because you were mentioned.Message ID: @.***>
On the energy intensity, it's a large formula, and one element can null out the entire year for a specific country. In the case of China that seems to be EnProdBiodieselIEA, which starts its data in 2005, so everything before that is Null because Biodiesel is null. This is the formula for ENDEM:
I'm not sure when is it that we had data for this, but at least in the web version (8.06) we have the same situation where data starts to show in 2005.
On the energy intensity, it's a large formula, and one element can null out the entire year for a specific country. In the case of China that seems to be EnProdBiodieselIEA, which starts its data in 2005, so everything before that is Null because Biodiesel is null. This is the formula for ENDEM:
I'm not sure when is it that we had data for this, but at least in the web version (8.06) we have the same situation where data starts to show in 2005.
Thanks José! Do you think we can update the function to treat nulls as 0s in this energy demand formula?
This is how it looks using nulls as 0s for that formula, firtst ENDEM:
second using flex displays as in the first image:
This is how it looks using nulls as 0s for that formula, firtst ENDEM:
second using flex displays as in the first image:
I'm not seeing imgs.
Having trouble uploading them, still trying.
Now you should see them, for some reason it wouldn't allow me to upload them while I was on VPN.
Jonathan, could you please confirm you're ok with this change so that I make it permanent for the next installation.
Who is the energy intensity blue dot in 2020 at the top
That is Turkmenistan at 7% of GDP.
OK--I guess the only way to resolve some of these issues (the transients) is to make a broad estimate to fill in nulls? Right? it looks like perhaps the soviet union in the blue line for energy demand in the 80s? Is that actual energy being removed or a groupings issue (more countries in the soviet union than russia)
If you mean the option that we have in the menu to fill holes, that only works for groups, which is not the case of the graph we're showing here. If you mean replacing nulls with estimations in the data, I guess that would do it, but we probably don't want to do that.
Yes, that's the Soviet Union/Russia in the blue line, I guess that the question about energy being removed or groupings is a data question for Yutang.
Hi @PardeeCenterIFs @jonathandmoyer The table below shows the total energy production for all USSR countries (the columm on the right represents the sum of 15 countries). As you can see, from 1989 to 1990, the grouping issue happened- total energy production from USSR was distributed to 15 countries during the dissolution while the sum remained almost same. Then Russia's production went down in the following years as well.
Following the formula of energy demand, I then checked energy trade. As you can expect, Russia's net energy export had a surge from 1989 to 1990, and gradually went down.
Are you saying you just made this change? Or something else?
Are you saying you just made this change? Or something else?
Is this for me or José? I was just checking the data for the Russia druing 1985-1995 in case you want to know what are underlining data behind the blue line. I did not make any change to the model or data.
I see I totally misread this.
I'm still on the fence about including this because of the complexity of reconciling these different series with nulls and geographic transitions (Soviet to Russia). But I think if it's used appropriately, it's beneficial. Perhaps we just include it and revisit the complicated bits at a later stage.
I see I totally misread this.
I'm still on the fence about including this because of the complexity of reconciling these different series with nulls and geographic transitions (Soviet to Russia). But I think if it's used appropriately, it's beneficial. Perhaps we just include it and revisit the complicated bits at a later stage.
Understood. One thing I can think of is that, if included, we need to make sure countries that have no values at all should yield nulls instead of 0s. But of course, there could be other complications (hopefully not for this formula....)
That's currently not possible, it's either nulls as 0s, or nulls that null out the entire formula for that year, I don't have an easy way to check that all members of the formula are nulls before doing the change.
@solox1 Hi José, is this implemented yet? During today's meeting with Jonathan, he mentioned this to me.
I think another potential solution is to bring in a total energy production data table so that the aggregation is done outside IFs. When that table has null values for some countries, we can safely say IEA does not provide data for those countries. What do you think?
So 3 routes-
Let me know what you think!
2. Would be very difficult to do, my preference would be to do #3, if #1 is not enough for Jonathan.
Just want to make sure, I don't think #1 is implemented in IFs, right? I'm using 8.30, and the oldest data point I can see is in 1993.
As for #3, there is a table in IFs called, SeriesEnProdTOTALIEA. But it includes more energy types than what IFs is currently reading from.
Option 1 is implemented in general, but it's a configuration issue, you would have to go to ifsVar.db, and open table IFSVARHISTANAFORMULAS, look for the row for ENDEM, the last field is: UseNullsAsZeros, change it from 0 to 1, and every time you do history plus forecast for ENDEM you'll be using this formula where replacing Nulls with 0s.
Option 1 is implemented in general, but it's a configuration issue, you would have to go to ifsVar.db, and open table IFSVARHISTANAFORMULAS, look for the row for ENDEM, the last field is: UseNullsAsZeros, change it from 0 to 1, and every time you do history plus forecast for ENDEM you'll be using this formula where replacing Nulls with 0s.
Thanks José! @jonathandmoyer I'm just putting this instruction on Jonathan's radar.
Currently, #3 seems to be the best solution. But as I said, the total energy production table I brought in includes more energy types than what we currently have in IFs.
@solox1 Hi José, I tried the solution. It seems the code is not doing what it is designed to. Afghanistan has no values across all the tables (production, import, export) but is still showing 0s after that summation and subtraction.
ENDEM
Just to re-emphasize the issue here. During one of the meetings, we discovered the following code with a comment that says We treat nulls as zeros ONLY when doing addition or subtraction between a null and a NON-null value. So, by design, if we change UseNullsAsZeros to 1 for ENDEM, a country with all nulls should still give "Null" as the result, and null values for countries with partial nulls will be converted to 0s.
I tried it with Afghnistan and it seems the code is not working as desgined. Am I missing something?
@solox1 Hi José, I tried the solution. It seems the code is not doing what it is designed to. Afghanistan has no values across all the tables (production, import, export) but is still showing 0s after that summation and subtraction.
If there are no values across all tables, that's null, if you turn on the option, those will be converted to 0, as I said before, it's either nulls to 0, or nulls that null the entire formula.
I see the issue now, you're right, this should return Null, , the problem was in this line: If (Not IsDBNull(alj)) Or (Not IsDBNull(arj)) Then This was not enough, some times values are empty, instead of Null, I think that's the case when there's not even a column with the particular year, so anyway I had to change it to: If (Not IsDBNull(alj) And Not IsNothing(alj)) Or (Not IsDBNull(arj) And Not IsNothing(arj)) Then
This change will be included in version 8.34.
Is this enough to close the issue?
Hi Yutang: I have a question about the process for updating historical data in light of the mental health data and also a series that looks like some historical data went missing. Energy intensity--two variables in the plot below--used to have data that went back into the 80s. You could see, for example, how soviet and chinese production were really energy inefficient. Do you know where those data might have gone? Can you do some digging to see if there's a previous version that had the data and then somehow it was erased? I'm sure it was erased because WDI doesn't keep it in their database because they think the data are poor. That's fine, but I don't think we should simply erasing data if WDI does because we might want it for a different purpose. So the question is: when we update data and there are changes to the historical record from the source, what do you do? And in light of the mental health question: when you see big changes to a data series that's important for the model, what do you do?