Implement input/output groups in data model

henhuy commented 4 months ago

Input and output groups can now be defined as lists in the output results. See `tests/_files/industry_example.csv` for an example upload file.
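A minimal sketch of reading list-valued group cells from an upload. It assumes the CSV stores each group cell as a JSON-style list such as `["primary", "fossil"]`. The column names, the `parse_groups`/`read_rows` helpers and this encoding are hypothetical, and the actual format of `tests/_files/industry_example.csv` may differ.

```python
import csv
import io
import json

# Hypothetical upload snippet: column names and the JSON-style list encoding
# of the group cells are assumptions for illustration only.
EXAMPLE_CSV = """process,input_commodity,input_groups,value
import_coal,coal,"[""primary"", ""fossil""]",10
import_gas,gas,"[""primary""]",5
"""


def parse_groups(cell):
    """Turn a group cell into a list of tags (empty cell -> empty list)."""
    cell = cell.strip()
    if not cell:
        return []
    if cell.startswith("["):
        # Assumed encoding: a JSON list of tag strings.
        return [str(tag) for tag in json.loads(cell)]
    # Fall back to treating a plain string as a single tag.
    return [cell]


def read_rows(text):
    """Parse the CSV text into dicts with list-valued input_groups."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["input_groups"] = parse_groups(row["input_groups"])
        row["value"] = float(row["value"])
        rows.append(row)
    return rows


rows = read_rows(EXAMPLE_CSV)
for row in rows:
    print(row["process"], row["input_groups"])
# import_coal ['primary', 'fossil']
# import_gas ['primary']
```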

bereba commented 4 months ago

I tested the new functionality with another input file. It's nice, but I think writing multiple group tags into one group entry leads to a single group consisting of both tags. This can be problematic, as only those rows that always have the same two tags are grouped together. See the uploaded screenshot for "primary/fossil" and "primary" coming from the imports. Therefore, I think that with the current functionality there should be only one grouping for the Sankey diagram, or am I missing something?

bereba commented 4 months ago

Another aspect to consider: at the moment, all rows have to be tagged with groups if they are to be plotted in "group mode". I think for our application it might be better to be able to group only selected processes/commodities. A pragmatic way to do so could be to first set e.g. input_group = input_commodity and afterwards overwrite input_group for the rows that have entries?

See the change from the upper screenshot to this one (apart from harmonizing the group tags as described in the previous comment):

henhuy commented 4 months ago

> I tested the new functionality with another input file. It's nice, but I think writing multiple group tags into one group entry leads to a single group consisting of both tags. This can be problematic, as only those rows that always have the same two tags are grouped together. See the uploaded screenshot for "primary/fossil" and "primary" coming from the imports. Therefore, I think that with the current functionality there should be only one grouping for the Sankey diagram, or am I missing something?

You have to filter the input/output groups first, so that you have either only the primary tag or both... It's a feature, not a bug ;)
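To illustrate the filtering step, here is a minimal sketch. It assumes Sankey nodes are labelled by joining a row's group tags with "/", which would explain the separate "primary/fossil" node. The rows, the `group_label`/`aggregate` helpers and the `selected` tag set are hypothetical, not the dashboard's actual code.

```python
from collections import defaultdict

# Hypothetical rows: each carries a list of input group tags and a value.
ROWS = [
    {"input_commodity": "coal", "input_groups": ["primary", "fossil"], "value": 10.0},
    {"input_commodity": "gas", "input_groups": ["primary", "fossil"], "value": 5.0},
    {"input_commodity": "import", "input_groups": ["primary"], "value": 3.0},
]


def group_label(tags, selected=None):
    """Join the (optionally filtered) tags into one node label."""
    if selected is not None:
        tags = [tag for tag in tags if tag in selected]
    return "/".join(tags)


def aggregate(rows, selected=None):
    """Sum values per group label, as one Sankey node per label."""
    totals = defaultdict(float)
    for row in rows:
        totals[group_label(row["input_groups"], selected)] += row["value"]
    return dict(totals)


# Without filtering, rows carrying both tags form their own node.
print(aggregate(ROWS))  # {'primary/fossil': 15.0, 'primary': 3.0}
# Filtering to the "primary" tag first collapses them into one node.
print(aggregate(ROWS, selected={"primary"}))  # {'primary': 18.0}
```

A row whose tags are all filtered out would end up under an empty label, so a real implementation would need to decide how to handle that case.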

henhuy commented 4 months ago

> Another aspect to consider: at the moment, all rows have to be tagged with groups if they are to be plotted in "group mode". I think for our application it might be better to be able to group only selected processes/commodities. A pragmatic way to do so could be to first set e.g. input_group = input_commodity and afterwards overwrite input_group for the rows that have entries?

> See the change from the upper screenshot to this one (apart from harmonizing the group tags as described in the previous comment):

Maybe we should only use process, input and output groups instead of process/input_commodities/output_commodities; then we can get rid of those second columns. But there is no way to group some commodities and leave others untouched, though this is the case for both approaches.

bereba commented 4 months ago

> > I tested the new functionality with another input file. It's nice, but I think writing multiple group tags into one group entry leads to a single group consisting of both tags. This can be problematic, as only those rows that always have the same two tags are grouped together. See the uploaded screenshot for "primary/fossil" and "primary" coming from the imports. Therefore, I think that with the current functionality there should be only one grouping for the Sankey diagram, or am I missing something?

> You have to filter the input/output groups first, so that you have either only the primary tag or both... It's a feature, not a bug ;)

Ah okay, I see :) Then it requires no adjustments. For our analysis, though, I'll just start with one group tag per column for the Sankey diagrams to make usage easier. I think multiple group tags might normally be more relevant for distinct plots.

bereba commented 4 months ago

> > Another aspect to consider: at the moment, all rows have to be tagged with groups if they are to be plotted in "group mode". I think for our application it might be better to be able to group only selected processes/commodities. A pragmatic way to do so could be to first set e.g. input_group = input_commodity and afterwards overwrite input_group for the rows that have entries?
> >
> > See the change from the upper screenshot to this one (apart from harmonizing the group tags as described in the previous comment):
> > ![image](https://private-user-images.githubusercontent.com/15122097/337275604-72f15cd5-df8c-4e5b-831a-142aae616840.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTc2ODExODUsIm5iZiI6MTcxNzY4MDg4NSwicGF0aCI6Ii8xNTEyMjA5Ny8zMzcyNzU2MDQtNzJmMTVjZDUtZGY4Yy00ZTViLTgzMWEtMTQyYWFlNjE2ODQwLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MDYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjA2VDEzMzQ0NVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTBjNDk3MTBiMzdjMDZlZmJlYjhkMDJhN2RjZmM5NzcxNzhmY2YwZTljYmI2MGUxNDc0ZWRkOGE3ZGM4ZTQ1ZDgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.WGvH-HArYGrUi8esmmnqVpS7emliKgi4sY4A3EtNOI)

> Maybe we should only use process, input and output groups instead of process/input_commodities/output_commodities; then we can get rid of those second columns. But there is no way to group some commodities and leave others untouched, though this is the case for both approaches.

I don't understand the first suggestion. Then we would lose some information, wouldn't we?

Couldn't the approach I suggested be implemented in the output data adapter? Something like "if process_group.value is empty, copy process.value", and then plot process_groups. Then some groups would be applied and others left untouched, I think?
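The fallback described above could look roughly like this. The sketch assumes each row is a dict with a single string group value. `fill_missing_groups` is a hypothetical helper, not an existing adapter function. The same call would work for the input and output columns, e.g. `input_commodity`/`input_group`.

```python
# Hypothetical rows as an output data adapter might see them; only some
# processes carry a process_group tag.
ROWS = [
    {"process": "coal_plant", "process_group": "power"},
    {"process": "gas_plant", "process_group": "power"},
    {"process": "heat_pump", "process_group": ""},
]


def fill_missing_groups(rows, value_column, group_column):
    """Copy row[value_column] into row[group_column] where the group is empty.

    Tagged rows keep their group; untagged rows keep their own name, so
    plotting group_column groups some entries and leaves others untouched.
    Returns new dicts and does not modify the input rows.
    """
    filled = []
    for row in rows:
        new_row = dict(row)
        if not new_row.get(group_column):  # None, "" or missing count as empty
            new_row[group_column] = new_row[value_column]
        filled.append(new_row)
    return filled


filled = fill_missing_groups(ROWS, "process", "process_group")
print([row["process_group"] for row in filled])  # ['power', 'power', 'heat_pump']
```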

henhuy commented 4 months ago

> > > Another aspect to consider: at the moment, all rows have to be tagged with groups if they are to be plotted in "group mode". I think for our application it might be better to be able to group only selected processes/commodities. A pragmatic way to do so could be to first set e.g. input_group = input_commodity and afterwards overwrite input_group for the rows that have entries?
> > >
> > > See the change from the upper screenshot to this one (apart from harmonizing the group tags as described in the previous comment):
> > > ![image](https://private-user-images.githubusercontent.com/15122097/337275604-72f15cd5-df8c-4e5b-831a-142aae616840.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTc2ODExODUsIm5iZiI6MTcxNzY4MDg4NSwicGF0aCI6Ii8xNTEyMjA5Ny8zMzcyNzU2MDQtNzJmMTVjZDUtZGY4Yy00ZTViLTgzMWEtMTQyYWFlNjE2ODQwLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MDYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjA2VDEzMzQ0NVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTBjNDk3MTBiMzdjMDZlZmJlYjhkMDJhN2RjZmM5NzcxNzhmY2YwZTljYmI2MGUxNDc0ZWRkOGE3ZGM4ZTQ1ZDgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.WGvH-HArYGrUi8esmmnqVpS7emliKgi4sY4A3EtNOI)

> > Maybe we should only use process, input and output groups instead of process/input_commodities/output_commodities; then we can get rid of those second columns. But there is no way to group some commodities and leave others untouched, though this is the case for both approaches.

> I don't understand the first suggestion. Then we would lose some information, wouldn't we?

> Couldn't the approach I suggested be implemented in the output data adapter? Something like "if process_group.value is empty, copy process.value", and then plot process_groups. Then some groups would be applied and others left untouched, I think?

I was thinking of input_group holding input_commodity as the first tag, like: `input_groups = ["coal", "primary", "fossile"]`. Then, in the Sankey diagram, you could decide which level of detail you want to use. The input_commodity column could then be deleted (same procedure for the other columns).
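Here is a sketch of picking the detail level from such a list. It assumes the first tag is always the commodity and later tags are progressively coarser groups. `node_at_level` and `aggregate` are hypothetical helpers. Letting rows with fewer tags fall back to their coarsest tag is an extra assumption, not part of the proposal. With that fallback, ungrouped commodities stay untouched at coarser levels.

```python
# Hypothetical rows following the proposal: the first tag of input_groups is
# the commodity itself, later tags are increasingly coarse groups.
ROWS = [
    {"input_groups": ["coal", "primary", "fossile"], "value": 10.0},
    {"input_groups": ["gas", "primary", "fossile"], "value": 5.0},
    {"input_groups": ["wind"], "value": 2.0},
]


def node_at_level(tags, level):
    """Return the tag at the requested detail level (0 = commodity).

    Assumes tags is non-empty. Rows with fewer tags fall back to their
    coarsest available tag (an assumption made for this sketch).
    """
    return tags[min(level, len(tags) - 1)]


def aggregate(rows, level):
    """Sum values per Sankey node at the chosen detail level."""
    totals = {}
    for row in rows:
        node = node_at_level(row["input_groups"], level)
        totals[node] = totals.get(node, 0.0) + row["value"]
    return totals


print(aggregate(ROWS, 0))  # {'coal': 10.0, 'gas': 5.0, 'wind': 2.0}
print(aggregate(ROWS, 1))  # {'primary': 15.0, 'wind': 2.0}
print(aggregate(ROWS, 2))  # {'fossile': 15.0, 'wind': 2.0}
```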

OpenEnergyPlatform / django-comparison-dashboard

Implement input/output groups in data model #17