architecture-building-systems / CityEnergyAnalyst

The City Energy Analyst (CEA)
https://www.cityenergyanalyst.com/
MIT License

Document input and output variables for all scripts #1069

Closed: daren-thomas closed this issue 4 years ago

daren-thomas commented 6 years ago

This is something Jack could do. For each of the scripts in the CEA, document all input and output variables (e.g. columns in the files: .shp, .dbf, .csv).

Standardize nomenclature of physical quantities (use of Q, T, etc., and standard naming of subscripts).

Jack-Hawthorne commented 5 years ago

hey @daren-thomas, can the input locator tracer be used to automate this? i remember discussing it with you and seeing some really nice flow diagrams you were working on.

daren-thomas commented 5 years ago

@Jack-Hawthorne sure. come to my office to learn how to do this.

daren-thomas commented 5 years ago

@Jack-Hawthorne the file bin\trace-inputlocator.bat has examples of how to use the trace-inputlocator tool.
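The idea behind the tracer is to watch which `InputLocator` methods a script calls and which paths they return. One possible way to do that in Python is `sys.settrace`. The sketch below is an assumption about the approach, not the code in the repository: `InputLocator` is a hypothetical two-method stand-in, and `trace_locator_calls` and `data_helper` are made-up names.

```python
import sys


class InputLocator:
    """Hypothetical stand-in for the real CEA InputLocator."""

    def __init__(self, scenario):
        self.scenario = scenario

    def get_building_age(self):
        return self.scenario + "/inputs/building-properties/age.dbf"

    def get_building_occupancy(self):
        return self.scenario + "/inputs/building-properties/occupancy.dbf"


def trace_locator_calls(script, *args):
    """Run script(*args) and record (method_name, returned_path) for every get_* locator call."""
    calls = []

    def on_return(frame, event, arg):
        # For 'return' events, `arg` is the value the method returned.
        if event == "return":
            calls.append((frame.f_code.co_name, arg))
        return on_return

    def on_call(frame, event, arg):
        # Only attach the local tracer to get_* methods bound to an InputLocator.
        is_locator = isinstance(frame.f_locals.get("self"), InputLocator)
        if event == "call" and is_locator and frame.f_code.co_name.startswith("get_"):
            return on_return
        return None

    sys.settrace(on_call)
    try:
        script(*args)
    finally:
        sys.settrace(None)
    return calls


def data_helper(locator):
    """Toy 'script' that asks the locator for two files."""
    locator.get_building_age()
    locator.get_building_occupancy()


print(trace_locator_calls(data_helper, InputLocator("/scenario")))
```

Tracing at the locator level means the scripts themselves need no changes; the cost is that every Python call is routed through `on_call` while tracing is active, which slows long runs.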

daren-thomas commented 5 years ago

example:

C:\Users\darthoma\Documents\GitHub\CityEnergyAnalyst (master)
λ cea-config data-helper
City Energy Analyst version 2.9.0
Configuring `cea data-helper` with the following parameters:
- general:scenario = c:\reference-case-open\baseline
- general:region = CH
- data-helper:archetypes = ['comfort', 'architecture', 'HVAC', 'internal-loads', 'supply', 'restrictions']

C:\Users\darthoma\Documents\GitHub\CityEnergyAnalyst (master)
λ cea --help trace-inputlocator

Trace the InputLocator calls in a selection of scripts.

OPTIONS for trace-inputlocator:
--scenario: c:\reference-case-open\baseline
    Select the path to the scenario to run
--scripts: ['data-helper', 'demand', 'emissions']
    sequential list of scripts to run
--graphviz-output-file: c:\reference-case-open\baseline/outputs/trace_inputlocator.output.gv
    Path to the filename of the GraphViz output file
--yaml-output-file: c:\reference-case-open\baseline/outputs/trace_inputlocator.output.yml
    Path to the filename of the YAML output file

C:\Users\darthoma\Documents\GitHub\CityEnergyAnalyst (master)
λ cea trace-inputlocator --scripts data-helper
City Energy Analyst version 2.9.0
Running `cea trace-inputlocator` with the following parameters:
- general:scenario = c:\reference-case-open\baseline
- trace-inputlocator:scripts = ['data-helper']
- trace-inputlocator:graphviz-output-file = c:\reference-case-open\baseline/outputs/trace_inputlocator.output.gv
- trace-inputlocator:yaml-output-file = c:\reference-case-open\baseline/outputs/trace_inputlocator.output.yml
City Energy Analyst version 2.9.0
Running `cea data-helper` with the following parameters:
- general:scenario = c:\reference-case-open\baseline
- general:region = CH
- data-helper:archetypes = ['comfort', 'architecture', 'HVAC', 'internal-loads', 'supply', 'restrictions']
c:\users\darthoma\appdata\local\conda\conda\envs\cea\lib\site-packages\pysal\__init__.py:65: VisibleDeprecationWarning: PySAL's API will be changed on 2018-12-31. The last release made with this API is version 1.14.4. A preview of the next API version is provided in the `pysal` 2.0 prelease candidate. The API changes and a guide on how to change imports is provided at https://pysal.org/about
  ), VisibleDeprecationWarning)
Running data-helper with scenario = c:\reference-case-open\baseline
Running data-helper with archetypes = ['comfort', 'architecture', 'HVAC', 'internal-loads', 'supply', 'restrictions']
c:\users\darthoma\documents\github\cityenergyanalyst\cea\datamanagement\data_helper.py:164: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  names_df[field] = 0
get_building_restrictions, c:\reference-case-open\baseline\inputs\building-properties\restrictions.dbf
get_building_hvac, c:\reference-case-open\baseline\inputs\building-properties\technical_systems.dbf
get_building_comfort, c:\reference-case-open\baseline\inputs\building-properties\indoor_comfort.dbf
get_building_internal, c:\reference-case-open\baseline\inputs\building-properties\internal_loads.dbf
get_building_occupancy, c:\reference-case-open\baseline\inputs\building-properties\occupancy.dbf
get_building_supply, c:\reference-case-open\baseline\inputs\building-properties\supply_systems.dbf
get_archetypes_properties, c:\reference-case-open\baseline\databases\CH\archetypes\construction_properties.xlsx
get_building_age, c:\reference-case-open\baseline\inputs\building-properties\age.dbf
get_archetypes_schedules, c:\reference-case-open\baseline\databases\CH\archetypes\occupancy_schedules.xlsx
get_building_architecture, c:\reference-case-open\baseline\inputs\building-properties\architecture.dbf
digraph trace_inputlocator {
    rankdir="LR";
    node [shape=box];
    "data-helper"[style=filled, fillcolor=darkorange];
    "data-helper" -> "inputs/building-properties/indoor_comfort.dbf";
    "inputs/building-properties/occupancy.dbf" -> "data-helper";
    "data-helper" -> "inputs/building-properties/internal_loads.dbf";
    "data-helper" -> "inputs/building-properties/supply_systems.dbf";
    "data-helper" -> "inputs/building-properties/architecture.dbf";
    "data-helper" -> "inputs/building-properties/technical_systems.dbf";
    "inputs/building-properties/age.dbf" -> "data-helper";
    "databases/CH/archetypes/occupancy_schedules.xlsx" -> "data-helper";
    "databases/CH/archetypes/construction_properties.xlsx" -> "data-helper";
    "data-helper" -> "inputs/building-properties/restrictions.dbf";
}
Execution time: 70.16s
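The GraphViz part of the output follows a simple convention: an edge from a file to the script means the script reads it (e.g. `age.dbf -> data-helper`), and an edge from the script to a file means it writes it (e.g. `data-helper -> restrictions.dbf`). A minimal sketch of producing that text, assuming the inputs and outputs have already been classified and that paths are shown relative to the scenario; `to_graphviz` and `relative_to_scenario` are hypothetical helpers, not the tool's own functions.

```python
from pathlib import PureWindowsPath


def relative_to_scenario(scenario, path):
    """Turn an absolute Windows path into the scenario-relative, forward-slash form used in the graph."""
    return PureWindowsPath(path).relative_to(PureWindowsPath(scenario)).as_posix()


def to_graphviz(script, scenario, inputs, outputs):
    """Build a GraphViz digraph: file -> script for inputs, script -> file for outputs."""
    lines = [
        "digraph trace_inputlocator {",
        '    rankdir="LR";',
        "    node [shape=box];",
        f'    "{script}"[style=filled, fillcolor=darkorange];',
    ]
    for path in inputs:
        lines.append(f'    "{relative_to_scenario(scenario, path)}" -> "{script}";')
    for path in outputs:
        lines.append(f'    "{script}" -> "{relative_to_scenario(scenario, path)}";')
    lines.append("}")
    return "\n".join(lines)


scenario = r"c:\reference-case-open\baseline"
print(to_graphviz(
    "data-helper",
    scenario,
    inputs=[scenario + r"\inputs\building-properties\age.dbf"],
    outputs=[scenario + r"\inputs\building-properties\restrictions.dbf"],
))
```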
daren-thomas commented 5 years ago

Meeting on January 11 to discuss update on this issue.

Jack-Hawthorne commented 5 years ago

@daren-thomas when trying to run cea trace-inputlocator --scripts demand i get the following error at the end

Traceback (most recent call last):
  File "C:\Users\Jack\Miniconda2\envs\cea\Scripts\cea-script.py", line 11, in <module>
    load_entry_point('cityenergyanalyst', 'console_scripts', 'cea')()
  File "c:\users\jack\documents\github\cityenergyanalyst\cea\interfaces\cli\cli.py", line 65, in main
    script_module.main(config)
  File "c:\users\jack\documents\github\cityenergyanalyst\cea\tests\trace_inputlocator.py", line 73, in main
    create_yaml_output(trace_data, config.trace_inputlocator.yaml_output_file)
  File "c:\users\jack\documents\github\cityenergyanalyst\cea\tests\trace_inputlocator.py", line 101, in create_yaml_output
    with open(yaml_output_file, 'r') as f:
IOError: [Errno 13] Permission denied: 'c:\\reference-case-open\\baseline'

I was running the Anaconda Prompt as administrator, so permissions shouldn't be an issue. It's probably a minor issue, since I still get an output file in the reference case.
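In the traceback, `create_yaml_output` calls `open()` on `c:\reference-case-open\baseline`, which is the scenario folder rather than a YAML file; this suggests the configured output path resolved to a directory, and on Windows opening a directory for reading raises `Errno 13` even for administrators. A defensive sketch, assuming the function reads any previous output in order to merge into it: fail with a clear message for directories and treat a missing file as "no previous data". JSON stands in for YAML to keep the sketch standard-library only, and `read_previous_output` is a hypothetical name.

```python
import json
import os


def read_previous_output(output_file):
    """Return previously written trace data, {} if none exists yet, or fail clearly on a directory."""
    if os.path.isdir(output_file):
        # e.g. the scenario folder was passed instead of the output file path
        raise ValueError(f"expected an output file path, got a directory: {output_file}")
    if not os.path.exists(output_file):
        return {}
    with open(output_file, "r") as f:
        return json.load(f)
```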

Jack-Hawthorne commented 5 years ago

currently working on this on branch 1069-document-input-output-variables

daren-thomas commented 5 years ago

@Jack-Hawthorne I'm moving this issue back to "In Development" - as I see it, the first stage (list files, determine if input or output) is mainly done, awaiting the last few scripts. Would you mind listing the missing ones please?

The next stage is actually listing the structure of these files:

I would like a machine-readable format to store this metadata. How about a YAML file or something? Why machine-readable? Because then we can create an input/output checker that checks these files before a script is run, not just for existence but also for integrity. It will also help a lot when we move to another representation of this data (e.g. SQL) for cloud-based computing.
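As an illustration of the input/output checker idea, here is a sketch that checks a CSV file against a metadata entry before a script runs: the file must exist, every documented column must be present, and every value must parse as one of the documented types. The entry keys (`file_path`, `schema`, `name`, `types_found`) are borrowed from the format proposed later in this thread; `check_file` is hypothetical and only CSV is handled.

```python
import csv
import os

# A value "matches" a type if the parser accepts it; note that "str" accepts anything.
PARSERS = {"int": int, "float": float, "str": str}


def matches_any(value, type_names):
    for name in type_names:
        try:
            PARSERS[name](value)
            return True
        except ValueError:
            continue
    return False


def check_file(entry):
    """Return a list of problems found in the CSV file described by a metadata entry."""
    path = entry["file_path"]
    if not os.path.isfile(path):
        return [f"missing file: {path}"]
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames or []
        rows = list(reader)
    problems = []
    for column in entry["schema"]:
        name = column["name"]
        if name not in columns:
            problems.append(f"missing column: {name}")
            continue
        for i, row in enumerate(rows):
            if not matches_any(row[name], column["types_found"]):
                problems.append(f"{name}, row {i}: {row[name]!r} is not one of {column['types_found']}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a checker report everything wrong with a scenario in a single pass.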

Jack-Hawthorne commented 5 years ago

@daren-thomas I'm not sure if you want all scripts done - some are not fully developed.

For the sake of time, please let me know which of the missing ones are the most important (and functional):

Currently trying to create a generalised method for reading all files within the trace_data. It's going OK so far; however, I'm not quite sure how to organise the data for the different data structures, say a shapefile or CSV vs. a JSON or YAML file. Should I just throw in some nulls for conformity?

Some other considerations:

daren-thomas commented 5 years ago

@Jack-Hawthorne oh dear, all the scripts you mention above are exactly the ones whose inputs / outputs i'm most desperate to have clearly defined. so yes, i'd really like them all done. but i can also help you run them if you like? how can i best help?

your generalized method sounds exactly like what i was hoping for. i suggest you have separate methods for each of the different file types and start doing them one by one - i'd love to see some in-progress examples. let's meet up and discuss your current work, ok?
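The suggestion of one reader per file type maps naturally onto a dispatch table keyed by file extension. The sketch below, with hypothetical names, implements only the two formats the Python standard library can read (CSV and JSON); `.dbf`, `.shp` and `.xlsx` would need third-party readers registered in the same table. Every reader returns the same shape, a dict of column name to list of values, so later steps do not need to care about the file type.

```python
import csv
import json
import os


def read_csv_columns(path):
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        columns = {name: [] for name in reader.fieldnames or []}
        for row in reader:
            for name in columns:
                columns[name].append(row[name])
    return columns


def read_json_columns(path):
    # Assumes a JSON list of flat records, e.g. [{"Name": "B01", "Hs": 0.9}, ...].
    # Records missing a key produce a shorter list for that column.
    with open(path) as f:
        records = json.load(f)
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


READERS = {".csv": read_csv_columns, ".json": read_json_columns}


def read_columns(path):
    """Dispatch on the file extension; unsupported types fail loudly instead of guessing."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in READERS:
        raise NotImplementedError(f"no reader registered for {extension} files")
    return READERS[extension](path)
```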

first, i would like a machine readable version of this data. this could be a json file listing the relevant meta-data. since we're probably going to be editing this information by hand (for descriptions) maybe a yaml format would suit better? the next step is to have this machine readable knowledge base of the status quo of the files which can then be transformed into other outputs, like replacing the data in the glossary.
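To illustrate transforming the knowledge base into other outputs, here is a sketch that flattens metadata entries (with the schema keyed by variable name, as in the later trace_dependencies_variables sample) into glossary-style CSV rows. The output columns `LOCATOR_METHOD`, `FILE_TYPE`, `VARIABLE`, `TYPE` and `DESCRIPTION` are made up for illustration and are not the actual glossary layout; `description` stands for a hypothetical hand-edited field in the YAML.

```python
import csv
import io

# Hypothetical glossary columns, for illustration only.
FIELDS = ["LOCATOR_METHOD", "FILE_TYPE", "VARIABLE", "TYPE", "DESCRIPTION"]


def metadata_to_glossary(metadata):
    """Flatten {locator_method: {"file_type": ..., "schema": {variable: details}}} into rows."""
    rows = []
    for method in sorted(metadata):
        entry = metadata[method]
        for variable, details in sorted(entry["schema"].items()):
            rows.append({
                "LOCATOR_METHOD": method,
                "FILE_TYPE": entry["file_type"],
                "VARIABLE": variable,
                # Join the observed types, e.g. "float|str".
                "TYPE": "|".join(details["types_found"]),
                # Hand-written in the YAML; empty until someone fills it in.
                "DESCRIPTION": details.get("description", ""),
            })
    return rows


def write_glossary_csv(rows):
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Because the generated file is derived, the YAML stays the single place where descriptions are edited by hand.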

when looking at your graphs, i think they could eventually be linked to the data descriptions. wouldn't that be awesome?

Jack-Hawthorne commented 5 years ago

@daren-thomas check out branch 1069-db-metadata and run trace-inputlocator. Note: you may have to replace your current config

The result is a JSON file containing locator methods as keys and the details for the files they reference. It could be a good idea to also include the list/array length for each variable, as an easy way to check for consistency (no Null/NaN values, of course).
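The idea of recording each variable's length can be sketched as follows: for every column, store the number of values and how many are missing, so that columns of unequal length or with gaps stand out. What counts as "missing" here (`None`, empty strings and float NaN) is an assumption, and `column_stats` and `inconsistent_columns` are hypothetical names.

```python
import math


def is_missing(value):
    """Treat None, empty/whitespace strings and float NaN as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def column_stats(columns):
    """columns: {name: [values]} -> {name: {"length": n, "missing": k}}"""
    return {
        name: {"length": len(values), "missing": sum(is_missing(v) for v in values)}
        for name, values in columns.items()
    }


def inconsistent_columns(stats):
    """Columns whose length differs from the most common length, or that contain missing values."""
    lengths = [s["length"] for s in stats.values()]
    expected = max(set(lengths), key=lengths.count) if lengths else 0
    return sorted(
        name for name, s in stats.items() if s["length"] != expected or s["missing"] > 0
    )
```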

Next steps are:

Let me know your feedback.

daren-thomas commented 5 years ago

@Jack-Hawthorne could you send me such a json file? maybe create a gist or something?

Jack-Hawthorne commented 5 years ago

data-helper.txt

this is converted to txt but should be helpful

daren-thomas commented 5 years ago

@Jack-Hawthorne thank you for this. I do have some comments. Let's use YAML syntax to talk about this as it is less verbose... (this is just for the discussion. but, writing the yaml file would be just as easy as writing the json file)

Your data format seems to be (i have to do some guessing here):

locator_method_name:
  - locator_method_docstring
  - actual_file_retrieved
  - file_type  # here assuming dbf
  - Sheet1:
      column_name_1:
        - sample_value
        - [type_1, type_2, ..., type_n]
      column_name_2:
        - sample_value
        - [type_1, type_2, ..., type_n]

The Excel format is similar, but actually has real worksheets.

Some improvements I think are necessary:

An example from the file you sent could look like this:

get_building_occupancy:
  file-path: C:\reference-case-open\baseline\inputs\building-properties\occupancy.dbf
  file-type: dbf
  schema:
    - name: Name
      sample_value: B01
      types_found: [str]
    - name: SCHOOL
      sample_value: 0.0
      types_found: [float]
    - ...

I think this makes the data format more self-descriptive.
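The `types_found` list in this format could be computed by trying to interpret each raw value as `int`, then `float`, falling back to `str`, and collecting the distinct results. This is a guess at how such a list might be derived, not the branch's implementation; `parse_value` and `describe_columns` are hypothetical, and raw values are assumed to arrive as strings (as from a CSV reader).

```python
PARSERS = (("int", int), ("float", float))


def parse_value(raw):
    """Return (type_name, parsed_value) for a raw string, trying int, then float, then str."""
    for name, parse in PARSERS:
        try:
            return name, parse(raw)
        except ValueError:
            pass
    return "str", raw


def describe_columns(columns):
    """columns: {name: [raw string values]} -> schema list in the proposed format."""
    schema = []
    for name, values in columns.items():
        schema.append({
            "name": name,
            # The first value, converted to its inferred type, serves as the sample.
            "sample_value": parse_value(values[0])[1] if values else None,
            # Distinct inferred types, sorted alphabetically.
            "types_found": sorted({parse_value(v)[0] for v in values}),
        })
    return schema
```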

Jack-Hawthorne commented 5 years ago

@daren-thomas thanks for the feedback, that shouldn't be too hard.

one thing though, the reason i made the 'fake' sheet is to be able to easily iterate over all variables (if they are on the same level, there's no need for conditionals). if it doesn't matter that much though, it's no problem to do as you've said.
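The trade-off can be seen in a small sketch: if single-table files are wrapped in a placeholder sheet, one nested loop walks the columns of every file type. Both helpers are hypothetical, and the placeholder name `Sheet1` is taken from the earlier example.

```python
PLACEHOLDER_SHEET = "Sheet1"


def as_sheets(file_type, data):
    """Excel data is already {sheet: {column: values}}; wrap single-table data in a placeholder sheet."""
    if file_type == "xlsx":
        return data
    return {PLACEHOLDER_SHEET: data}


def all_columns(file_type, data):
    """Yield (sheet, column) pairs with the same loop, whatever the file type."""
    for sheet, columns in as_sheets(file_type, data).items():
        for column in columns:
            yield sheet, column
```

The conditional does not disappear; it moves into one small function. That means the stored format can follow the flatter layout while code that walks the data still gets a uniform shape.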

apart from that, do you think it would be advantageous to have the array length, script dependencies or other information before i start with connecting the naming.csv?

Jack-Hawthorne commented 5 years ago

demand.txt this is a sample for demand

Jack-Hawthorne commented 5 years ago

@daren-thomas ok i've added a script dependencies method now which updates each time you run trace. you can see which script created the file and which scripts use it. also changed the file type to yml as requested.

sample below: trace_dependencies_variables.txt

get_building_architecture:
    created_by: [data-helper]
    file_path: C:\reference-case-open\baseline\inputs\building-properties\architecture.dbf
    file_type: dbf
    schema:
        !!python/unicode 'Es':
            sample_value: 0.9
            types_found: [float, str]
        !!python/unicode 'Hs':
            sample_value: 0.45
            types_found: [float, str]
        !!python/unicode 'Name':
            sample_value: !!python/unicode 'B09'
            types_found: [str]
        !!python/unicode 'Ns':
            sample_value: 0.45
            types_found: [float, str]
        !!python/unicode 'type_cons':
            sample_value: !!python/unicode 'T3'
            types_found: [float, str]
        !!python/unicode 'type_leak':
            sample_value: !!python/unicode 'T2'
            types_found: [str]
        !!python/unicode 'type_roof':
            sample_value: !!python/unicode 'T4'
            types_found: [float, str]
        !!python/unicode 'type_shade':
            sample_value: !!python/unicode 'T1'
            types_found: [str]
        !!python/unicode 'type_wall':
            sample_value: !!python/unicode 'T5'
            types_found: [float, str]
        !!python/unicode 'type_win':
            sample_value: !!python/unicode 'T2'
            types_found: [str]
        !!python/unicode 'void_deck':
            sample_value: 0
            types_found: [int, float, str]
        !!python/unicode 'wwr_east':
            sample_value: 0.4
            types_found: [float, str]
        !!python/unicode 'wwr_north':
            sample_value: 0.4
            types_found: [float, str]
        !!python/unicode 'wwr_south':
            sample_value: 0.4
            types_found: [float, str]
        !!python/unicode 'wwr_west':
            sample_value: 0.4
            types_found: [float, str]
    used_by: [demand, radiation-daysim]
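The `created_by` and `used_by` fields in this sample can be accumulated across runs by merging each new trace into the existing metadata. A minimal sketch, assuming each trace is a list of `(script, locator_method, direction)` tuples where direction is `"input"` or `"output"`; `merge_dependencies` is a hypothetical name.

```python
def merge_dependencies(metadata, trace):
    """Update created_by/used_by lists in place from one trace run; repeated runs add no duplicates."""
    for script, method, direction in trace:
        entry = metadata.setdefault(method, {})
        key = "created_by" if direction == "output" else "used_by"
        scripts = set(entry.get(key, []))
        scripts.add(script)
        entry[key] = sorted(scripts)
    return metadata


metadata = {}
merge_dependencies(metadata, [("data-helper", "get_building_architecture", "output")])
merge_dependencies(metadata, [
    ("demand", "get_building_architecture", "input"),
    ("radiation-daysim", "get_building_architecture", "input"),
])
print(metadata)
```

As an aside, the `!!python/unicode` tags in the sample appear to come from dumping Python 2 `unicode` strings with PyYAML's `yaml.dump`; `yaml.safe_dump` writes plain strings instead, which keeps the file readable by non-Python tools.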
daren-thomas commented 5 years ago

@JIMENOFONSECA I'm not sure this is done yet.

Jack-Hawthorne commented 5 years ago

@daren-thomas still having trouble with decentralized (see #1825). i also found an issue in #1841, which is also halting progress.

daren-thomas commented 5 years ago

@Jack-Hawthorne what is the current status?

daren-thomas commented 5 years ago

(status update from @Jack-Hawthorne: there is meta-data from the trace-inputlocator script in the branch 1069-db-meta)

Jack-Hawthorne commented 5 years ago

status update: currently running the decentralized script for only one building, as the run time is slowing down progress. @daren-thomas is running optimization with a similar setup.

the trace yml currently contains metadata from the following scripts:

i added the thermal-network viz graph to script-input-outputs.rst, which should contain all of the scripts above.

the next scripts to run are as follows:

hopefully the lead time shouldn't be too bad for the rest of the scripts.

Jack-Hawthorne commented 5 years ago

TODO after all the metadata is recorded:

@daren-thomas @JIMENOFONSECA thoughts on this?

daren-thomas commented 5 years ago

@Jack-Hawthorne I suggest you:

I'm promoting this issue to an epic and will add the new issues to that epic.

Thank you :)

Jack-Hawthorne commented 5 years ago

@daren-thomas are we getting rid of the naming.csv in plots? after the glossary.csv is fleshed out and accurate? how is the schema.yml looking? any eta on this one?

daren-thomas commented 5 years ago
jimenofonseca commented 5 years ago

Bear in mind that we use naming.csv for naming in all the plots.

Or did this change? If so, what is the new way? We also need a reference to the colors. And units.

daren-thomas commented 5 years ago

@JIMENOFONSECA yes, I know. But glossary.csv contains the same information - and more! So it seems sensible to me to only have one such file to reference / maintain. I think cea.plots.variable_naming just needs to change the path to the file it uses and should just work. We'd have to test that though.
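The change amounts to pointing the plot naming code at a different CSV. As a rough sketch of what that lookup needs (a display label, unit and colour per variable, per the point about colors and units above), assuming hypothetical glossary columns `VARIABLE`, `SHORT_DESCRIPTION`, `UNIT` and `COLOR` that may not match the real glossary.csv; `load_variable_naming` is a made-up function, not part of `cea.plots.variable_naming`.

```python
import csv
import io


def load_variable_naming(csv_file):
    """Map each variable to its plot label, unit and colour from a glossary-style CSV (file object)."""
    naming = {}
    for row in csv.DictReader(csv_file):
        # If a variable is listed for several files, the last row wins.
        naming[row["VARIABLE"]] = {
            "label": row["SHORT_DESCRIPTION"],
            "unit": row["UNIT"],
            "color": row["COLOR"],
        }
    return naming


# Illustrative row only; not taken from the real glossary.
glossary = io.StringIO(
    "VARIABLE,SHORT_DESCRIPTION,UNIT,COLOR\n"
    "Q_example_kWh,Example heat demand,kWh,red\n"
)
print(load_variable_naming(glossary))
```

Since the glossary lists variables per file, a variable can appear more than once; the plots only need one label per variable, so a real implementation would have to decide which row takes precedence.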

jimenofonseca commented 5 years ago

cool, let’s do it!

jimenofonseca commented 4 years ago

so the order is clear: we will implement further changes by merging naming.csv with glossary.csv in #2200.