ProjectDrawdown / solutions

The mission of Project Drawdown is to help the world reach “Drawdown”— the point in the future when levels of greenhouse gases in the atmosphere stop climbing and start to steadily decline, thereby stopping catastrophic climate change — as quickly, safely, and equitably as possible.
https://www.drawdown.org/

Make solution_xls_extract update existing solutions a bit better #477

Closed johnpaulalex closed 2 years ago

johnpaulalex commented 2 years ago

Don't re-export AC scenarios that are older than the ones already in the ac/ directory, and don't overwrite __init__.py (write to an alternate file instead).

Example output for commercialglass:

    Generating new code at /Users/jpalex/dd/solutions/solution/commercialglass/__init__.py.UPDATED - please merge by hand with __init__.py
    ...
    Skipping scenario PDS1-94p2050-2.75%retrofit rate-40%initial adoption-Plausible, earlier than existing scenarios.
    Skipping scenario PDS2-97p2050-5.0-%retrofit rate-40%initial adoption-Drawdown, earlier than existing scenarios.
    Skipping scenario PDS3-99p2050-8.0-%retrofit rate-40%initial adoption-Optimum, earlier than existing scenarios.
    Skipping scenario PDS1-57p2050-based on a 2.75% retrofit rate, earlier than existing scenarios.
    Skipping scenario PDS2-59p2050-based on a 5% retrofit rate, earlier than existing scenarios.
    Skipping scenario PDS3-60p2050-based on an 8% retrofit rate, earlier than existing scenarios.
    Skipping scenario PDS1-57p2050-based on a 2.75% retrofit rate (integrated), earlier than existing scenarios.
    Skipping scenario PDS2-59p2050-based on a 5% retrofit rate (Integrated), earlier than existing scenarios.
    Skipping scenario PDS3-60p2050-based on an 8% retrofit rate (Integrated), earlier than existing scenarios.
    ...
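The date-comparison behavior described here could be sketched as follows (hypothetical function and parameter names; the actual logic lives in tools/solution_xls_extract.py):

```python
from datetime import datetime

def filter_new_scenarios(scenarios, min_creation_date):
    """Keep only scenarios at least as new as the oldest existing AC file.

    `scenarios` maps scenario name -> creation datetime; `min_creation_date`
    is the oldest creation date found in the ac/ directory (None if that
    directory held no scenarios). Older scenarios are assumed to already be
    represented in ac/ and are skipped with a log line like the ones above.
    """
    kept = {}
    for name, created in scenarios.items():
        if min_creation_date and created < min_creation_date:
            print(f"Skipping scenario {name}, earlier than existing scenarios.")
            continue
        kept[name] = created
    return kept
```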

denised commented 2 years ago

@johnpaulalex I've sent you some more models to work with. I haven't tried it much, but it threw an error with the Insulation model.

johnpaulalex commented 2 years ago

When I ran output_solution_python_file() on all the new Excel sheets, I got 22 errors, but none of them have to do with the updating part - they look like the usual export issues (unexpected values in the Excel). So I think this PR is ok.

Maybe you want to make a new GitHub issue for this batch of model updates and I can append the errors there? But I'm wondering whether we need to get tests working for a given solution before doing an update. I just ran all the solution tests on a clean client and it looks like roughly half of them are already failing. For those solutions I'm not sure if we should try to get the tests working first against the existing Excel, or just proceed straight to getting the updated Excel working.

johnpaulalex commented 2 years ago

Oh - and the new insulation exported fine for me. What error did you get?

denised commented 2 years ago

I think it makes more sense to update the models before investing a lot into fixing the failures, because in some cases the failures have to do with the poor state of the original Excel model, which has been improved since then. (Also, I don't have the original Excel models, except in the expected.zip form, so it is hard to debug them.)


denised commented 2 years ago

This is what it looked like --- I didn't investigate, but now I will:

    Generating new code at ..\insulation\__init__.py.UPDATED - please merge by hand with __init__.py

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    C:\Temp/ipykernel_28540/2693765676.py in <module>
    ----> 1 sxe.output_solution_python_file(outputdir=str(outpath), xl_filename=str(inpath))

    C:\Working\solutions\tools\solution_xls_extract.py in output_solution_python_file(outputdir, xl_filename)
       1746
       1747     p.mkdir(parents=False, exist_ok=True)
    -> 1748     prev_scenarios, min_creation_date = _scenarios_from_ac_dir(p)
       1749     for name, s in scenarios.items():
       1750         if min_creation_date:

    C:\Working\solutions\tools\solution_xls_extract.py in _scenarios_from_ac_dir(ac_path)
       1616
       1617     if names:
    -> 1618         return names, min(creation_dates)
       1619     else:
       1620         return [], None

    ValueError: min() arg is an empty sequence
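The immediate cause is Python's min() raising on an empty sequence: scenario names were collected but no creation dates were. A defensive guard along these lines (a sketch, not the repo's actual fix) avoids the crash:

```python
def safe_min_date(names, creation_dates):
    # min() raises ValueError on an empty sequence -- exactly the traceback
    # above. Fall back to None when no creation dates were collected, even
    # if some scenario names were.
    if names and creation_dates:
        return names, min(creation_dates)
    return names or [], None
```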


denised commented 2 years ago

Totally my bad ... it was looking in the wrong directory. :-(


denised commented 2 years ago

OK, so now that I ran it correctly and it worked, some thoughts:

We do need to update the new json files (the ones in the ad, tam, pds_ca and/or ref_ca directories).

...And that leads me to the updated __init__ file. On the one hand it would be great to update an __init__ file to the new format (which moves all that directory data out of the .py file into those json files); on the other hand, there are quite a few land models that I wasn't able to do that conversion on (see nutrientmanagement for an example) --- and in these cases I think it is essential that we do have the updated __init__, because some of those calcs might have changed. (I'm not entirely clear, though, how much of those models was auto-generated vs. generated by hand.)

...In any case, keeping the updated __init__ in the "old form", the way you have it now, is probably the safest thing to do, because it gives us the best starting state for merging some of the more complex / hand-modified models. And the simpler models probably won't need much merging anyway.

So good work! Add in the updated json files and we're probably ready to do this thing!


johnpaulalex commented 2 years ago

I added some very simple code to export the *sources.json files (in a second commit; I had actually amended the first commit, and apparently that wasn't the right thing to do, but it seems GitHub is accepting the delta anyway).

I'm guessing they might be too simple. Two concerns below.

  1. I tried it on bioplastic, and for both ad/ and tam/ it added one entry with an empty key, e.g. for ad/:

         "": {
             "USDA Market Projection": "ad_USDA_Market_Projection.csv",
             "IfBB 2018": "ad_IfBB_2018.csv"
         },

     Should I just filter those out?
  2. I'm not totally clear how custom adoption gets exported. Looking at the exporting logic, it seems like there are 'ref' adoptions which sometimes say 'include' = false, whereas 'pds' always says 'include' = true. For bioplastic it appears to be called only with a 'pds' prefix, yet its existing ca_pds_sources.json file has a mix of true and false for 'include', which makes me think it's actually a 'ref' file. Also, in all the solutions I only see ca_pds_sources.json, no ca_ref_sources.

And I'm guessing I also need to update the codegen to use the new files, right? :)
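For concern 1, the filtering proposed above could look like this (hypothetical sketch; `sources` stands in for the dict written to ad_sources.json or tam_ref_sources.json):

```python
def drop_empty_case_keys(sources, log=print):
    """Remove top-level entries whose case name is empty, logging each one.

    Hypothetical sketch: `sources` is keyed by case name ("Baseline Cases",
    "Ambitious Cases", ...); an empty key means the extractor could not
    attach those sources to a case.
    """
    cleaned = {}
    for case, entries in sources.items():
        if not case.strip():
            log(f"dropping {len(entries)} source(s) with no case name: {sorted(entries)}")
            continue
        cleaned[case] = entries
    return cleaned
```

(As the discussion later in the thread notes, leaving the empty keys visible may be preferable to filtering them.)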

denised commented 2 years ago

I'll take a look later tonight.

If you can take a look at updating the codegen that would be nice, but orient yourself to a solution like nutrientmanagement --- I don't even know if codegen actually produced that in the first place, or if it was manually hacked. In any case, in any of the more complex situations where __init__.py is actually computing adoptions instead of just reading them, those need to stay the way they are. If there's a way to decide that automatically, that would be fantastic!


DentonGentry commented 2 years ago

orient yourself to a solution like nutrientmanagement --- I don't even know if codegen actually produced that in the first place, or if it was manually hacked

codegen produced a version of nutrientmanagement which put the numerical values from the Custom PDS adoption tabs into CSV files. The code generator was never smart enough to extract cases where the human researcher had added formulae to procedurally generate the Custom PDS adoption.

All of the code like this was manually added, replacing those sections the code generator had come up with.

        ca_pds_columns = ['Year'] + dd.REGIONS
        adoption_2014 = self.ac.ref_base_adoption['World']
        tla_2050 = self.tla_per_region.loc[2050, 'World']
        area_N_eff_1990_1995 = 225.049083333333
        area_N_eff_1996_2000 = 89.7290333333333
        area_N_eff_2001_2005 = 42.7382
        area_N_eff_2006_2010 = 98.7301833333333

johnpaulalex commented 2 years ago

I didn't think we needed to read the original and worry about custom code - people can merge that in, right? I thought we were just going to generate a brand-new standard __init__ and then it's up to them. I was simply thinking of replacing the codegen loop that writes the various dicts into the __init__ with something like:

    self._ref_tam_sources = scenario.load_sources(THISDIR/'tam'/'tam_ref_sources.json','*')
    self._pds_tam_sources = self._ref_tam_sources  # I imagine this is conditional?
    self.set_tam()

    self._pds_ad_sources = scenario.load_sources(THISDIR/'ad'/'ad_sources.json', '*') 
    self._pds_ca_sources = scenario.load_sources(THISDIR/'ca_pds_data'/'ca_pds_sources.json', 'filename') 
    self.initialize_adoption_bases()

...but there's some other code being written out by the existing write_tam/write_ad/write_custom_ad methods, beyond defining those source dicts... e.g. if I look at write_ad(), it also defines a big adconfig_list variable, and this:

adconfig = pd.DataFrame(adconfig_list[1:], columns=adconfig_list[0]).set_index('param')

and this:

f.write("        self.ad = adoptiondata.AdoptionData(ac=self.ac, data_sources=ad_data_sources,\n")
regional = convert_bool(xls(a, 'B30')) and convert_bool(xls(a, 'B31'))
if regional:
    f.write("            main_includes_regional=True,\n")
f.write("            adconfig=adconfig)\n")

...is it all superseded by the boilerplate above?

denised commented 2 years ago
  1. I tried it on bioplastic and for both ad/ and tam/, it added one entry with an empty key, e.g. for ad/: "": { "USDA Market Projection": "ad_USDA_Market_Projection.csv", "IfBB 2018": "ad_IfBB_2018.csv" }, Should I just filter those out?

I took a look at the bioplastics spreadsheet, and those sources should be listed under the Baseline and Ambitious cases respectively. Not sure why it isn't doing that, but it is a bug.

  2. I'm not totally clear how custom adoption gets exported. Looking at the exporting logic, it seems like there's 'ref' adoptions which sometimes say 'include' = false, whereas 'pds' always says 'include' = True. For bioplastic it appears to be called only with a 'pds' prefix, yet its existing ca_pds_sources.json file has a mix of true and false for 'include', which makes me think it's actually a 'ref' file. Also in all the solutions I only see ca_pds_sources.json, no ca_ref_sources.

Individual solutions may or may not have either custom ref or custom pds adoption scenarios --- any combination is possible. Bioplastics has a Custom REF Adoption tab, but it is empty, so there are no custom ref adoptions in this case. It also has a Custom PDS Adoption tab, and there are a number of adoptions there.

You can drop the "include in scenario" flag entirely: this is a PDS Adoption specific feature that actually can be altered per-scenario, so the settings written in the json files are overridden anyway. (It didn't use to be that way; that's a functionality change I made in the last few months). I suspect I will have to change a line of code or two somewhere to handle the flag being missing, but really, it should not be there.
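Tolerating a missing 'include' flag on load could be as simple as defaulting it when reading the file (a sketch; the real load_sources helper in the repo may look different):

```python
import json

def load_ca_sources(path):
    """Read a ca_*_sources.json file, defaulting a missing 'include' to True.

    Per the discussion above, 'include' is overridden per-scenario anyway,
    so readers should not require it to be present in the file.
    """
    with open(path) as f:
        sources = json.load(f)
    for entry in sources:
        entry.setdefault("include", True)
    return sources
```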

denised commented 2 years ago

I didn't think we needed to read the original and worry about custom code - people can merge that in, right - I thought we're just going to generate a brand-new standard init and then it's up to them. I was simply thinking of replacing the codegen loop that writes the various dicts inside the init loop with something like: ...

Yes, this is the right approach.

...but there's some other code being written out in the existing write_tam/write_ad/write_custom_ad methods, beyond defining those source dicts....e.g. if I look at write_ad() -- it also defines a big adconfig_list variable, and this:

... ...is it all superseded by the boilerplate above?

Yup!

The one thing this glosses over, though, is that sometimes there were variations in the generated code, and those variations became customizations that still need to be accounted for. Automatically detecting and generating those variations is the hard work I didn't want to get you sucked into. The thing is, though, that if you aren't detecting those variations, then generating the code doesn't really do anything useful. Take a look at improvedcookstoves for an example of overrides, where the generated output would have been just slightly different from the standard boilerplate. The logic for detecting this case doesn't exist in the generator code right now, because it just always output the full data structure.
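One low-tech way to detect "boilerplate vs. customization" would be a whitespace-insensitive comparison between what the generator would emit and the standard template (a sketch of the idea, not existing generator code):

```python
import re

def matches_boilerplate(generated, boilerplate):
    """True when generated code equals the template up to whitespace.

    Any difference means the solution carries a customization (like the
    improvedcookstoves overrides) that must be preserved rather than
    replaced by the standard boilerplate.
    """
    def norm(s):
        return re.sub(r"\s+", " ", s).strip()
    return norm(generated) == norm(boilerplate)
```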

I'll let you think about whether you want to dig into this or not. (My vote would be "not" --- I'd much rather get the updated data files.)

denised commented 2 years ago

The thing is, though, that if you aren't detecting those variations, then generating the code doesn't really do anything useful.

Actually that's an overstatement --- the new code does have the updated VMA table, for example. And eventually we will probably be using this code to generate new solutions, so being able to do that will be a good thing.

johnpaulalex commented 2 years ago

About ca_pds vs ca_ref: my mistake - I was looking in the ca_pds_data/ directory for ca_ref files. I see them now in ca_ref_data/.

Empty-key entries: I'll filter them out (and make a log of which solutions have them) for now.

Code gen: I'm kind of inclined to punt altogether. It sounds like there's no reliable structure in the existing inits. For brand-new solutions, is there some reference for what that should be? e.g. I recently did commercialglass and that appears to have the VMAs still in a dict in the init file. If not, maybe you could pick an existing solution to massage into shape and then I could make the codegen match that one.

denised commented 2 years ago

The VMAs still are in a dict in the init file --- though that is something that may change soon.

One reason I didn't want to worry about updating the generation of init right now is precisely because there's a lot of code churn, and keeping the generator up to date would be challenging. I'd rather wait until the changes settle out, and then be in a better position to decide what the right approach is.

Also, I hope we don't have to optimize code design for doing the extraction --- I would rather aim for the goal of making the python system the system of choice, and not have to worry about updating from Excel at all...


johnpaulalex commented 2 years ago

Got it. But I'm not sure how to proceed: should I try to update the codegen (to match what solution? even commercialglass has some overrides in HelperTable construction and the initial year being 2018 not 2014) or leave it be?


denised commented 2 years ago

I would create an option to just generate the new data files (including the json files) --- and come to think of it, just go ahead and make a json directory for the VMAs as well (make it function just like the other json directories), and we'll assume that we can switch to that sooner rather than later.

Leave the init.py out of the equation for now --- we'll come back to that later.

BTW, there is one new required feature for VMAs and one optional one that I would like to have:

  1. If you look in a VMA table, you will see a box labeled "Correction" or "Low Correction" in column X. This is not currently being extracted, but it should be, and it should be assigned to a boolean field called bound_correction.
  2. I've also added a field called "description" to the VMA structure as well. If we could populate that with the data from the big explanation box, that would be nice.

I was going to write these up as separate issues, but as you are in the code right there anyway, perhaps you can add them now.


denised commented 2 years ago

Empty-key entries: I'll filter them out (and make a log of which solutions have them) for now.

Reviewing the errors of past imports, I came across this in at least one other case. The fix there was to manually move the entries from the empty category to where they belonged. So I think it's preferable not to filter them out, both because the error is easier to see and the manual fixes are easier to do that way.

johnpaulalex commented 2 years ago

I'm exporting vma_sources.json now - but note that there's only 1 of 3 code paths where that happens - please confirm.

Still allowing empty keys. It's easier to write to __init__.py.UPDATED than to nothing, because there are all these f.write() calls. I could write to a tempdir, but then...I'd need a tempdir. So I left it for now, but lmk if there's something else you'd like me to do.

I added extraction of 'Low Correction', which works for commercialglass - should I also check for 'Correction'? (I wasn't sure if you were unsure of the name or listing all variations.) Also note that I don't write it out anywhere - it's now available in extract_vmas, as a return value of VMAReader.read_xls. I wasn't sure which description you wanted - I didn't see any big one in the VMA tab - so I left that as-is for now.

denised commented 2 years ago

I'm exporting vma_sources.json now - but note that there's only 1 of 3 code paths where that happens - please confirm.

I remember the calling pattern for VMAReader is a bit confusing --- as long as it is what is coming out of solution_xls_extract.py, that is fine for now.

Still allowing empty keys. It's easier to write to __init__.py.UPDATED than nothing because there's all these f.write() calls. I could write to a tempdir but then...I'd need a tempdir. So I left it for now, but lmk if there's something else you'd like me to do.

I think that is fine.

I added extraction of 'Low Correction' which works for commercialglass - should I also check for 'Correction' (wasn't sure if you were unsure or listing all variations). Also note that I don't write it out anywhere - it's now available in extract_vmas, as a return value of VMAReader.read_xls.

I thought I had seen both names, but now that I look it seems to be consistently named 'Low Correction'. But I'm glad you asked because further investigation showed me that it isn't always in column X: in some models it is in column Y. It is always two columns to the right of the SD numbers.
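Anchoring the lookup to the SD column rather than to a fixed letter would tolerate that drift. A sketch against a plain sequence of cell values (the accepted truthy spellings are an assumption; the real extractor reads the xlsx directly):

```python
def read_bound_correction(row, sd_col):
    """Read the 'Low Correction' flag two columns right of the SD numbers.

    `row` is a sequence of cell values for one VMA table row and `sd_col`
    the 0-based index of the SD column; per the observation above, the flag
    sits at sd_col + 2, whether that lands in column X or column Y.
    """
    idx = sd_col + 2
    cell = row[idx] if idx < len(row) else None
    # Assumed set of spellings meaning "yes" in the spreadsheet cell.
    return str(cell).strip().lower() in ("yes", "y", "true", "1")
```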

BTW, there's a tool that I use to do this kind of investigation that makes it much easier and more reliable: tools/multi_excel_sample.py will grab the same square of cells in all the spreadsheets and return that in a single spreadsheet. It can grab the values, the formulas, or both. This makes it easy to see what the variations across the models are. I attached the spreadsheet I generated --- you can see it isn't perfect, because there is too much variation in the row numbers, so I didn't get them all, but random sampling a few of the missing ones convinces me that this is a feature that all models have.

vma_low_bound.xlsx

As for writing out: yes, please, can you add it to the json file you are generating. Description too.

I wasn't sure which description you wanted - I didn't see any big one in the VMA tab - left that as-is for now.

Look at cell D126 in Geothermal for an example.

johnpaulalex commented 2 years ago

I added one more commit. Now vma_sources.json has a dict per VMA - filename, bound_correction, and description fields. I included the non-filename VMAs and I'm not sure what I'm seeing. Here's a sample from commercialglass:

    // This is as expected.
    "Current Adoption": {
        "filename": "Current_Adoption.csv",
        "bound_correction": false,
        "description": "Little data outside of the EU and USA exist for the percentage of building stock with high-performance static (solution technology) and single-pane windows (conventional technology). An adoption for the current year was estimated."
    },
    // This one has a filename yet somehow extracted a description of null. That's surprising because we initialize
    // its value to the empty string. afaict it means that "sheet.cell(r, tools.util.co("D")).value" can return null?
    "CONVENTIONAL First Cost per Implementation Unit": {
        "filename": "CONVENTIONAL_First_Cost_per_Implementation_Unit.csv",
        "bound_correction": false,
        "description": null
    },
    // Here's one with no filename (there are several).
    "CONVENTIONAL Average Annual Use": {
        "bound_correction": false,
        "description": ""
    },

denised commented 2 years ago

Let's delete the VMAs that have no filename --- they aren't real. I'm switching the advanced controls code around to only use VMAs that exist, as we speak, so this should come together nicely.


johnpaulalex commented 2 years ago

Done. I also turned the 'null' descriptions back into empty strings, so at least they're all the same type (strings) now.
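The two cleanups agreed on here (drop filename-less VMAs, coerce null descriptions to "") might look like this sketch over the vma_sources.json dict:

```python
def normalize_vma_entries(vmas):
    """Drop VMAs without a filename; coerce null descriptions to "".

    Filename-less VMAs "aren't real", and a None description (an empty
    Excel cell) should serialize as an empty string so every description
    has the same type.
    """
    cleaned = {}
    for name, info in vmas.items():
        if not info.get("filename"):
            continue
        info = dict(info)  # don't mutate the caller's dict
        info["description"] = info.get("description") or ""
        cleaned[name] = info
    return cleaned
```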

denised commented 2 years ago

Awesome, thank you!
