Respect our electricity calibration when creating the baseline

babakkhavari commented 2 years ago

Let's assume we say that 2% of the population in rural areas cooks with electricity and, at the same time, that the electrification rate in rural areas is 10%.

Should we then make sure that the 2% who cook with electricity in our baseline are among the 10% that our tool assumes to be electrified?
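A minimal sketch of one way to guarantee this, assuming the electrified population per cell is available as an array (the helper `allocate_elec_cooking` is hypothetical, not OnStove API). Every cell's electrified population is scaled by the same factor, so electric cooking can never exceed electricity access in any cell:

```python
import numpy as np


def allocate_elec_cooking(elec_pop, national_elec_cooking):
    """Hypothetical helper, not OnStove API.

    Spread the national population cooking with electricity over cells in
    proportion to each cell's electrified population.
    """
    elec_pop = np.asarray(elec_pop, dtype=float)
    total_elec = elec_pop.sum()
    if national_elec_cooking > total_elec:
        raise ValueError("more people cook with electricity than have access")
    # The same fraction (<= 1) of electrified people cooks with electricity
    # in every cell, so no cell exceeds its electrified population.
    return elec_pop * (national_elec_cooking / total_elec)


# 10% electrification and 2% electric cooking in a rural population of 1,000
elec_pop = np.array([60.0, 40.0, 0.0])  # 100 electrified people
cooking = allocate_elec_cooking(elec_pop, 20.0)
```

This is the same kind of scaling as the `urban_factor`/`rural_factor` applied to `Elec_pop_calib` in `techshare_allocation` further down the thread.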

camiloramirezgo commented 2 years ago

This also happens for biogas: where biogas is not available (i.e. no livestock, no water, or temperature too low), we get inf values in time_of_collection. We will need to exclude biogas from the baseline for those cells and adjust the shares of the other technologies.
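A minimal sketch of that exclusion, assuming per-cell fuel shares are held in a pandas DataFrame with one column per technology (this layout and the function name are assumptions; OnStove itself stores a population series on each technology object):

```python
import numpy as np
import pandas as pd


def drop_biogas_where_unavailable(shares, time_of_collection):
    """Hypothetical helper, not OnStove API.

    shares: DataFrame of per-cell fuel shares (one column per technology)
    time_of_collection: per-cell biogas collection time; inf = no potential
    """
    shares = shares.copy()
    no_biogas = np.isinf(time_of_collection)
    shares.loc[no_biogas, "Biogas"] = 0.0
    # Rescale each row so the remaining technologies sum to 1 again;
    # rows that already summed to 1 are left unchanged
    return shares.div(shares.sum(axis=1), axis=0)


shares = pd.DataFrame({"Biogas": [0.2, 0.2], "LPG": [0.3, 0.3], "Biomass": [0.5, 0.5]})
toc = pd.Series([1.5, np.inf])  # cell 1 has no biogas potential
out = drop_biogas_where_unavailable(shares, toc)
```

A cell where biogas was the only fuel would divide by zero here and would need separate handling.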

babakkhavari commented 2 years ago

The problem here is that we will be changing the shares of the different fuels so that they no longer match the original data. Won't what has been done now fail if you call the functions one after the other? Let's say you adjust electricity in certain cells using the first function and then adjust biogas in certain cells using the second function. How do we make sure that the cells do not coincide (e.g. an unelectrified cell without biogas potential)? And if we cannot make sure of that, won't the second function always render the first one void?

If I am right, we can solve this fairly easily by adding three small functions on top of what @aliciaoberholzer has already done:

1. A function ensuring that the shares of all fuels sum to 1. Alicia already does this in her tech_share function, but after fixing the shares for electricity. In my opinion we should move this step first.
2. After readjusting the values based on function 1, we need to make sure that the electricity access rates in urban and rural settlements are still higher than the shares of electric cooking. I do not see this ever being a problem, but let's just make sure it's not. Where electric cooking is higher than the electricity access rate, replace the excess with biomass (or whatever technology already has the highest share).
3. The same as function 2, but for biogas. Does the total population living in areas suitable for biogas add up to the share calibrated in function 1? If not, replace the excess with biomass (or whatever technology has the highest share).

Related to functions 2 and 3, you probably want to produce a warning for the user, such as "Total population that can cook with biogas based on GIS data is lower than what you have entered in the tech specs. The new share is X%. If this is not OK, please adjust the GIS data." or similar.

Functions 1-3 operate at the national level, not on a cell basis. Then comes what Alicia has already done (I do, however, think that they need to be done together, at the same time).
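A rough sketch of functions 1-3 for one settlement type (urban or rural), including the warning suggested above. The function name, the dict-of-shares input, and the choice to exclude both capped technologies when picking the receiver of the excess are assumptions made for illustration:

```python
import warnings

CAPPED = ("Electricity", "Biogas")


def calibrate_national_shares(shares, elec_rate, biogas_suitable_frac):
    """Hypothetical sketch of functions 1-3 for one settlement type.

    shares: dict mapping technology name -> national cooking share
    elec_rate: electricity access rate (0-1)
    biogas_suitable_frac: fraction of the population living in cells with
        biogas potential (0-1)
    """
    # Function 1: normalise so that the shares sum to 1
    total = sum(shares.values())
    shares = {tech: share / total for tech, share in shares.items()}

    # Functions 2 and 3: cap electricity and biogas, and move any excess to
    # the uncapped technology with the highest share (e.g. biomass)
    for tech, cap in (("Electricity", elec_rate), ("Biogas", biogas_suitable_frac)):
        if shares.get(tech, 0.0) > cap:
            excess = shares[tech] - cap
            shares[tech] = cap
            receiver = max((t for t in shares if t not in CAPPED), key=shares.get)
            shares[receiver] += excess
            warnings.warn(
                f"{tech} share reduced to {cap:.1%} to match the GIS data; "
                f"{excess:.1%} moved to {receiver}. If not OK, please adjust the GIS data."
            )
    return shares


out = calibrate_national_shares(
    {"Electricity": 0.2, "Biogas": 0.3, "Biomass": 0.4, "LPG": 0.1},
    elec_rate=0.1,
    biogas_suitable_frac=0.25,
)
```

Excluding electricity and biogas as receivers keeps the biogas step from pushing electricity back above its cap when electricity happens to hold the largest share.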

babakkhavari commented 1 year ago

Instead of adjusting everything to biomass, adjust to a weighted average of the dirty ones :)
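A minimal sketch of the weighted-average variant: the excess is split over the "dirty" technologies in proportion to their current shares. Which technologies count as dirty is an input assumption here, and the helper name is hypothetical:

```python
def redistribute_to_dirty(shares, excess, dirty):
    """Hypothetical helper: split `excess` over the 'dirty' technologies in
    proportion to their current shares, instead of giving it all to biomass."""
    dirty_total = sum(shares[t] for t in dirty)
    if dirty_total <= 0:
        raise ValueError("no dirty technology has a positive share")
    out = dict(shares)
    for t in dirty:
        out[t] += excess * shares[t] / dirty_total
    return out


# Shares after electricity was already capped; 0.2 still has to be placed
shares = {"Electricity": 0.1, "LPG": 0.3, "Biomass": 0.3, "Charcoal": 0.1}
new = redistribute_to_dirty(shares, 0.2, dirty=["Biomass", "Charcoal"])
```

Biomass receives three quarters of the excess and charcoal one quarter, matching their 0.3 : 0.1 ratio.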

camiloramirezgo commented 1 year ago

This Excel file implements a basic example of this problem and a solution to it; it is the basis of the methods we are using in the package functions:

Fuel shares calibration.xlsx

aliciaoberholzer commented 1 year ago

For rural areas (we assume that there is no biogas in urban areas):

1. We allocate population to electricity using the ratio in each cell (see Excel).
2. We allocate the remaining population to biogas in cells where there is biogas potential (see Excel).
3. We allocate the remaining population proportionally to technologies other than electricity and biogas in each cell (see Excel).
4. We then allocate the excess population cooking with technologies other than biogas and electricity to biogas (see Excel).

However, the population cooking with a given technology at the national level ends up larger than the calculated sum of people cooking with that technology across cells, which makes the cell-wise shares negative for technologies other than electricity and biogas. In other words, there is no excess population that needs to be moved, but we move people anyway because we have no check in place to see whether there is an excess population to move to biogas. These methods work for the Excel example, but when we implement them for the Nepal data we get negative numbers, and we haven't been able to replicate the issue in Excel. I think the issue is that we are adjusting cells that don't need to be adjusted and/or that actually require more people (i.e. people should be moved from biogas and electricity TO these technologies instead of away from them).
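A minimal sketch of step 4 with the missing check, for one technology, using plain NumPy arrays instead of the OnStove technology objects (function and argument names are hypothetical). Population is only moved when there is a positive excess, and never more than the adjustable cells hold:

```python
import numpy as np


def move_excess_to_biogas(tech_pop, biogas_pop, national_pop, adjust_cells):
    """Hypothetical sketch of step 4 with an excess check.

    tech_pop: per-cell rural population of one technology other than
        electricity and biogas
    biogas_pop: per-cell rural population cooking with biogas
    national_pop: national rural population cooking with that technology
    adjust_cells: boolean mask of cells where people may be moved to biogas
    """
    tech_pop = np.asarray(tech_pop, dtype=float).copy()
    biogas_pop = np.asarray(biogas_pop, dtype=float).copy()
    adjust_cells = np.asarray(adjust_cells, dtype=bool)

    excess = tech_pop.sum() - national_pop
    movable = tech_pop[adjust_cells].sum()
    if excess <= 0 or movable <= 0:
        # No excess: the cells do not hold more people than the national total
        return tech_pop, biogas_pop

    # Never move more people than the adjustable cells contain
    fraction = min(excess / movable, 1.0)
    moved = tech_pop[adjust_cells] * fraction
    biogas_pop[adjust_cells] += moved
    tech_pop[adjust_cells] -= moved
    return tech_pop, biogas_pop


tech, biogas = move_excess_to_biogas(
    [100.0, 50.0, 50.0], [10.0, 0.0, 20.0], national_pop=160.0,
    adjust_cells=[True, False, True],
)
```

Subtracting `moved` is the same as multiplying the adjusted cells by `(1 - fraction)`. Without the cap, a fraction above 1 would make `(1 - fraction)` negative, which is one way negative populations can appear; a negative excess would instead move people out of biogas. With the cap, the national target may not be reached exactly when the adjustable cells hold too few people.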

aliciaoberholzer commented 1 year ago

We are likely doing something wrong in the step where we allocate the remaining population to technologies other than electricity and biogas:

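    # Sum the rural shares of all technologies other than biogas and electricity
    remaining_share = 0
    for name, tech in tech_dict.items():
        if (name != "Biogas") & (name != "Electricity"):
            remaining_share += tech.current_share_rural
    # Population left in each cell (urban cells included) after biogas and electricity
    remaining_pop = self.gdf["Calibrated_pop"] - (tech_dict["Biogas"].pop_sqkm + tech_dict["Electricity"].pop_sqkm)
    # Split the remaining population using the renormalised rural shares
    for name, tech in tech_dict.items():
        if (name != "Biogas") & (name != "Electricity"):
            tech.pop_sqkm = remaining_pop * tech.current_share_rural / remaining_share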
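For comparison, a standalone sketch of the same step restricted to rural cells, using pandas Series on a shared cell index (the function and argument names are hypothetical, and `other_shares` is assumed to hold only the technologies other than electricity and biogas). Each result is a full-length Series with zeros in urban cells, so later urban assignments and divisions stay aligned:

```python
import pandas as pd


def allocate_remaining_rural(calibrated_pop, is_urban, elec_pop, biogas_pop, other_shares):
    """Hypothetical sketch: split what is left after electricity and biogas
    over the other technologies, in rural cells only.

    other_shares: national rural shares of the technologies other than
        electricity and biogas (renormalised here so they sum to 1)
    """
    rural = ~is_urban
    remaining_pop = calibrated_pop[rural] - (elec_pop[rural] + biogas_pop[rural])
    remaining_share = sum(other_shares.values())
    allocation = {}
    for name, share in other_shares.items():
        # Full-length series (zeros in urban cells) on the same index as the cells
        pop = pd.Series(0.0, index=calibrated_pop.index)
        pop.loc[rural] = remaining_pop * share / remaining_share
        allocation[name] = pop
    return allocation


calibrated_pop = pd.Series([100.0, 200.0, 300.0])
is_urban = pd.Series([False, True, False])
alloc = allocate_remaining_rural(
    calibrated_pop, is_urban,
    elec_pop=pd.Series([10.0, 50.0, 30.0]),
    biogas_pop=pd.Series([20.0, 0.0, 0.0]),
    other_shares={"LPG": 0.1, "Biomass": 0.3},
)
```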

babakkhavari commented 1 year ago

I do not think this alone will solve the issue, but I think one of our problems is the calibration of the urban and rural population. I ran some tests, and the rural population does not seem to add up.

This in turn creates a mismatch of 3 million between

    nepal.techs["Collected_Traditional_Biomass"].pop_sqkm.loc[~isurban].sum() + nepal.techs["LPG"].pop_sqkm.loc[~isurban].sum() + nepal.techs["Biogas"].pop_sqkm.loc[~isurban].sum()

and the sum of population_cooking_rural.
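A small check along these lines could narrow down which technology causes the gap by comparing each one separately. This is a hypothetical helper on toy data; with OnStove the inputs would presumably be each technology's `pop_sqkm` and `population_cooking_rural`:

```python
import pandas as pd


def rural_mismatch(pop_sqkm_by_tech, national_rural_by_tech, is_urban):
    """Hypothetical check: per technology, population allocated to rural cells
    minus the national rural total (0 means the two agree)."""
    rural = ~is_urban
    return {
        name: pop.loc[rural].sum() - national_rural_by_tech[name]
        for name, pop in pop_sqkm_by_tech.items()
    }


is_urban = pd.Series([False, True, False])
pops = {"LPG": pd.Series([10.0, 5.0, 20.0]), "Biogas": pd.Series([0.0, 0.0, 8.0])}
diff = rural_mismatch(pops, {"LPG": 25.0, "Biogas": 8.0}, is_urban)
```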

babakkhavari commented 1 year ago

So, updating the urban/rural population calibration in onstove.py seems to fix the issues in rural cells. The new population calibration now looks like:

    def calibrate_current_pop(self):

        isurban = self.gdf["IsUrban"] > 20
        total_rural_pop = self.gdf.loc[~isurban, "Pop"].sum()
        total_urban_pop = self.gdf["Pop"].sum() - total_rural_pop

        calibration_factor_u = (self.specs["Population_start_year"] * self.specs["Urban_start"])/total_urban_pop
        calibration_factor_r = (self.specs["Population_start_year"] * (1-self.specs["Urban_start"]))/total_rural_pop

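        self.gdf["Calibrated_pop"] = 0.0
        # Assign through a single .loc call on the GeoDataFrame to avoid chained assignment
        self.gdf.loc[~isurban, "Calibrated_pop"] = self.gdf.loc[~isurban, "Pop"] * calibration_factor_r
        self.gdf.loc[isurban, "Calibrated_pop"] = self.gdf.loc[isurban, "Pop"] * calibration_factor_u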

and the urban/rural calibration looks like:

    def calibrate_urban_current_and_future_GHS(self, GHS_path):
        self.raster_to_dataframe(GHS_path, name="IsUrban", method='sample')

        self.calibrate_current_pop()
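To sanity-check the calibration logic outside the class, here is a standalone sketch that uses plain pandas Series instead of `self.gdf` and `self.specs` (function and argument names are hypothetical). After calibration, the urban and rural cells should sum to the national urban and rural populations:

```python
import pandas as pd


def calibrate_pop(pop, is_urban, population_start_year, urban_start):
    """Hypothetical standalone version of calibrate_current_pop: scale urban
    and rural cells separately to match the national urban/rural totals."""
    total_rural = pop[~is_urban].sum()
    total_urban = pop[is_urban].sum()
    factor_u = population_start_year * urban_start / total_urban
    factor_r = population_start_year * (1 - urban_start) / total_rural
    calibrated = pd.Series(0.0, index=pop.index)
    calibrated.loc[is_urban] = pop[is_urban] * factor_u
    calibrated.loc[~is_urban] = pop[~is_urban] * factor_r
    return calibrated


pop = pd.Series([100.0, 300.0, 600.0])
is_urban = pd.Series([True, False, False])
cal = calibrate_pop(pop, is_urban, population_start_year=2000.0, urban_start=0.2)
```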

We still have problems with urban areas, though: they all have NaN values. I think the reason is the last for-loop:

        for name, tech in tech_dict.items():
            if (name != "Biogas") & (name != "Electricity"):
                tech.pop_sqkm.loc[isurban] = remaining_urbpop * tech.current_share_urban / remaining_urbshare
            tech.pop_sqkm = tech.pop_sqkm / self.gdf["Calibrated_pop"]

I think .loc does not update the series values in place, but only returns them. So the line after the if creates a series of ~3k rows (the same number as there are urban settlements) but leaves the original series untouched. The last line then divides the rural values correctly, but for urban areas it only divides NaN values by the calibrated pop.
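A toy pandas example (not OnStove objects) of two behaviours that may be at play here. It suggests the NaN values could also come from index alignment: a Series built from rural cells only has no urban labels, so dividing it by a full-length Series yields NaN for the urban cells, while `.loc` assignment on a full-length Series does update it in place. Creating a full-length zero Series before the `.loc` assignment, as the `#ADDED THIS` line in the next comment does, avoids the first case:

```python
import pandas as pd

# Toy data: cells 0 and 2 are rural, cell 1 is urban
calibrated_pop = pd.Series([100.0, 200.0, 300.0])
is_urban = pd.Series([False, True, False])

# (a) A series built from rural cells only has no label for the urban cell,
# so division aligns on the union of labels and gives NaN for cell 1
rural_only = calibrated_pop[~is_urban] * 0.5
shares_a = rural_only / calibrated_pop

# (b) .loc assignment on a full-length series updates it in place
full = pd.Series(0.0, index=calibrated_pop.index)
full.loc[~is_urban] = calibrated_pop[~is_urban] * 0.5
full.loc[is_urban] = calibrated_pop[is_urban] * 0.25
shares_b = full / calibrated_pop
```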

babakkhavari commented 1 year ago

This last issue can be fixed with small updates to techshare_allocation:

    def techshare_allocation(self, tech_dict):
        """
        Calculates the baseline population cooking with each technology in each urban and rural square kilometer.
        The function takes a stepwise approach to allocating population to each cooking technology:

        1. Allocates the population cooking with electricity in each cell based upon the population with access
        to electricity. 
        2. Allocates the population cooking with biogas in each rural cell based upon whether or not there is
        biogas potential. 
        3. Allocates the remaining population proportionally to other cooking technologies in rural & urban cells.

        The number of people cooking with each technology in each urban and rural square km is added as an attribute to 
        each technology class.

        Parameters
        ----------
        tech_dict : dict
            The dictionary of technology classes.

        Notes
        -----
        The function uses the dictionary of technology classes (including biogas collection time) and the main
        GeoDataFrame to do this.
        """
        #allocate population in each urban cell to electricity
        isurban = self.gdf["IsUrban"] > 20
        urban_factor = tech_dict["Electricity"].population_cooking_urban / sum(isurban * self.gdf["Elec_pop_calib"])
        tech_dict["Electricity"].pop_sqkm = (isurban) * (self.gdf["Elec_pop_calib"] * urban_factor)
        #allocate population in each rural cell to electricity 
        rural_factor = tech_dict["Electricity"].population_cooking_rural / sum(~isurban * self.gdf["Elec_pop_calib"])
        tech_dict["Electricity"].pop_sqkm.loc[~isurban] = (self.gdf["Elec_pop_calib"] * rural_factor)
        #create series for biogas with the same index as the dataframe, filled with zeros
        tech_dict["Biogas"].pop_sqkm = pd.Series(0.0, index=self.gdf.index)
        #allocate remaining population to biogas in rural areas where there's potential
        biogas_factor = tech_dict["Biogas"].population_cooking_rural / (self.gdf["Calibrated_pop"].loc[(tech_dict["Biogas"].time_of_collection!=float('inf')) & ~isurban].sum())
        tech_dict["Biogas"].pop_sqkm.loc[(~isurban) & (tech_dict["Biogas"].time_of_collection!=float('inf'))] = self.gdf["Calibrated_pop"] * biogas_factor
        pop_diff = (tech_dict["Biogas"].pop_sqkm + tech_dict["Electricity"].pop_sqkm) > self.gdf["Calibrated_pop"]
        tech_dict["Biogas"].pop_sqkm.loc[pop_diff] = self.gdf["Calibrated_pop"] - tech_dict["Electricity"].pop_sqkm
        #allocate remaining population proportionally to techs other than biogas and electricity 
        remaining_share = 0
        for name, tech in tech_dict.items():
            if (name != "Biogas") & (name != "Electricity"):
                remaining_share += tech.current_share_rural
        remaining_pop = self.gdf.loc[~isurban, "Calibrated_pop"] - (tech_dict["Biogas"].pop_sqkm.loc[~isurban] + tech_dict["Electricity"].pop_sqkm.loc[~isurban])
        for name, tech in tech_dict.items():
            if (name != "Biogas") & (name != "Electricity"):
                tech.pop_sqkm = pd.Series(0.0, index=self.gdf.index)  # ADDED THIS
                tech.pop_sqkm.loc[~isurban] = remaining_pop * tech.current_share_rural / remaining_share
        #move excess population cooking with technologies other than electricity and biogas to biogas 
        # cells where every technology other than electricity has people that could be moved
        adjust_cells = pd.Series(True, index=self.gdf.index)
        for name, tech in tech_dict.items():
            if name != "Electricity":
                adjust_cells &= (tech.pop_sqkm > 0)
        for name, tech in tech_dict.items():
            if (name != "Electricity") & (name != "Biogas"):
                tech_remainingpop = tech.pop_sqkm.loc[~isurban].sum() - tech.population_cooking_rural
                tech.tech_remainingpop = tech_remainingpop
                remove_pop = tech.pop_sqkm.loc[(~isurban) & (adjust_cells)].sum()
                share_allocate = tech_remainingpop / remove_pop
                self.share_allocate = share_allocate
                tech_dict["Biogas"].pop_sqkm.loc[(~isurban) & (adjust_cells)] += tech.pop_sqkm * share_allocate
                tech.pop_sqkm.loc[(~isurban) & (adjust_cells)] *= (1 - share_allocate)  # What does this line do? This confuses me.
        #allocate urban population to technologies
        for name, tech in tech_dict.items():
            if (name != "Biogas") & (name != "Electricity"):
                tech.pop_sqkm.loc[isurban] = 0.0
        remaining_urbshare = 0.0
        for name, tech in tech_dict.items():
            if (name != "Biogas") & (name != "Electricity"):
                remaining_urbshare += tech.current_share_urban
        remaining_urbpop = self.gdf.loc[isurban, "Calibrated_pop"] - tech_dict["Electricity"].pop_sqkm.loc[isurban]
        for name, tech in tech_dict.items():
            if (name != "Biogas") & (name != "Electricity"):
                tech.pop_sqkm.loc[isurban] = remaining_urbpop * tech.current_share_urban / remaining_urbshare
            tech.pop_sqkm = tech.pop_sqkm / self.gdf["Calibrated_pop"]

And one line in set_base_fuel:

    base_fuel.total_time_yr += (tech.total_time_yr * tech.pop_sqkm).fillna(0)
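The `.fillna(0)` keeps NaN shares from propagating into `total_time_yr`, but it also hides them. A hypothetical check like the one below (not OnStove API; it assumes the per-cell shares from `techshare_allocation` can be collected into a dict of per-technology arrays or Series) would surface the two symptoms reported in this thread, NaN and negative cells, before that step:

```python
import numpy as np
import pandas as pd


def check_baseline_shares(shares_by_tech, tol=1e-6):
    """Hypothetical sanity check for per-cell shares (one entry per technology):
    flags cells with NaN shares, negative shares, or shares not summing to 1."""
    shares = pd.DataFrame(shares_by_tech)
    nan_cells = shares.isna().any(axis=1)
    negative = (shares < -tol).any(axis=1)
    # sum(axis=1) skips NaN, so NaN cells are reported separately
    bad_total = (shares.sum(axis=1) - 1).abs() > tol
    return {
        "nan": shares.index[nan_cells].tolist(),
        "negative": shares.index[negative].tolist(),
        "not_summing_to_1": shares.index[bad_total & ~nan_cells].tolist(),
    }


report = check_baseline_shares({
    "Electricity": [0.2, 0.5, 0.1],
    "LPG": [0.8, 0.7, np.nan],
    "Biogas": [0.0, -0.2, 0.3],
})
```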

Open-Source-Spatial-Clean-Cooking-Tool / OnStove, issue #207