USEPA / flowsa

Library that attributes resource use, waste, emissions, and loss to economic sectors
MIT License
24 stars 19 forks source link

revise generateflowbysector() to enable access to already attributed FBA sources #344

Closed catherinebirney closed 1 year ago

catherinebirney commented 1 year ago

Enable ability in FBS methods to access already attributed FBA source data.

Necessary for Land FBS method for urban activity set, need to access the already attributed data CBECS and MECS

catherinebirney commented 1 year ago

@matthewlchambers Do you have any comments on these changes? Or a better method for accessing the already attributed source data?

matthewlchambers commented 1 year ago

I think a better way to access the already attributed data would just be to add the dataset that will need to be accessed to sources_to_cache. Then it can be accessed from config['sources_to_cache']['...'].

catherinebirney commented 1 year ago

That is a better idea, thanks!

catherinebirney commented 1 year ago

@matthewlchambers generateFlowBySector() still requires revision because the MECS and CBECS data are used as primary sources, not attribution sources

matthewlchambers commented 1 year ago

@matthewlchambers generateFlowBySector() still requires revision because the MECS and CBECS data are used as primary sources, not attribution sources

Is there a reason they can't just be listed both under sources_to_cache and as a primary sources (possibly using yaml anchors/aliases to reduce the amount of duplicate text? A concern I have is that with the code as written in the latest commit for this PR, if a source is cached, it's then impossible to use the same source differently (e.g. with different selection fields, etc.) as a primary source. Basically, the cache is overriding anything else with the same name.

catherinebirney commented 1 year ago

If the sources are listed in sources_to_cache and as primary sources then the code is run twice, which I was trying to avoid. But I do understand your point of the cache overriding all other calls to the same data source, which is more problematic than code being run twice, especially as in our current version of flowsa, we consistently run the same code multiple times in an FBS method. So - I agree, I'll revert those commits and use anchors in the land FBS

catherinebirney commented 1 year ago

Closing/not merging this PR because opted to not change flowby.py. Instead updated the Land FBS yaml in recursive-refac branch.