Document input and output variables for all scripts

daren-thomas commented 6 years ago

This is something Jack could do. For each of the scripts in the CEA, document all input and output variables (e.g. columns in the files - .shp, .dbf, .csv).

Standardize nomenclature of physical quantities (use of Q, T etc. standard naming of subscripts)

Jack-Hawthorne commented 5 years ago

hey @daren-thomas can the input locator tracer be used to automate this? i remember discussing with you and seeing some really nice flow diagrams you were working on.

daren-thomas commented 5 years ago

@Jack-Hawthorne sure. come to my office to learn how to do this.

daren-thomas commented 5 years ago

@Jack-Hawthorne the file bin\trace-inputlocator.bat has examples of how to use the trace-inputlocator tool.

daren-thomas commented 5 years ago

example:

C:\Users\darthoma\Documents\GitHub\CityEnergyAnalyst (master)
λ cea-config data-helper
City Energy Analyst version 2.9.0
Configuring `cea data-helper` with the following parameters:
- general:scenario = c:\reference-case-open\baseline
- general:region = CH
- data-helper:archetypes = ['comfort', 'architecture', 'HVAC', 'internal-loads', 'supply', 'restrictions']

C:\Users\darthoma\Documents\GitHub\CityEnergyAnalyst (master)
λ cea --help trace-inputlocator

Trace the InputLocator calls in a selection of scripts.

OPTIONS for trace-inputlocator:
--scenario: c:\reference-case-open\baseline
    Select the path to the scenario to run
--scripts: ['data-helper', 'demand', 'emissions']
    sequential list of scripts to run
--graphviz-output-file: c:\reference-case-open\baseline/outputs/trace_inputlocator.output.gv
    Path to the filename of the GraphViz output file
--yaml-output-file: c:\reference-case-open\baseline/outputs/trace_inputlocator.output.yml
    Path to the filename of the YAML output file

C:\Users\darthoma\Documents\GitHub\CityEnergyAnalyst (master)
λ cea trace-inputlocator --scripts data-helper
City Energy Analyst version 2.9.0
Running `cea trace-inputlocator` with the following parameters:
- general:scenario = c:\reference-case-open\baseline
- trace-inputlocator:scripts = ['data-helper']
- trace-inputlocator:graphviz-output-file = c:\reference-case-open\baseline/outputs/trace_inputlocator.output.gv
- trace-inputlocator:yaml-output-file = c:\reference-case-open\baseline/outputs/trace_inputlocator.output.yml
City Energy Analyst version 2.9.0
Running `cea data-helper` with the following parameters:
- general:scenario = c:\reference-case-open\baseline
- general:region = CH
- data-helper:archetypes = ['comfort', 'architecture', 'HVAC', 'internal-loads', 'supply', 'restrictions']
c:\users\darthoma\appdata\local\conda\conda\envs\cea\lib\site-packages\pysal\__init__.py:65: VisibleDeprecationWarning: PySAL's API will be changed on 2018-12-31. The last release made with this API is version 1.14.4. A preview of the next API version is provided in the `pysal` 2.0 prelease candidate. The API changes and a guide on how to change imports is provided at https://pysal.org/about
  ), VisibleDeprecationWarning)
Running data-helper with scenario = c:\reference-case-open\baseline
Running data-helper with archetypes = ['comfort', 'architecture', 'HVAC', 'internal-loads', 'supply', 'restrictions']
c:\users\darthoma\documents\github\cityenergyanalyst\cea\datamanagement\data_helper.py:164: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  names_df[field] = 0
get_building_restrictions, c:\reference-case-open\baseline\inputs\building-properties\restrictions.dbf
get_building_hvac, c:\reference-case-open\baseline\inputs\building-properties\technical_systems.dbf
get_building_comfort, c:\reference-case-open\baseline\inputs\building-properties\indoor_comfort.dbf
get_building_internal, c:\reference-case-open\baseline\inputs\building-properties\internal_loads.dbf
get_building_occupancy, c:\reference-case-open\baseline\inputs\building-properties\occupancy.dbf
get_building_supply, c:\reference-case-open\baseline\inputs\building-properties\supply_systems.dbf
get_archetypes_properties, c:\reference-case-open\baseline\databases\CH\archetypes\construction_properties.xlsx
get_building_age, c:\reference-case-open\baseline\inputs\building-properties\age.dbf
get_archetypes_schedules, c:\reference-case-open\baseline\databases\CH\archetypes\occupancy_schedules.xlsx
get_building_architecture, c:\reference-case-open\baseline\inputs\building-properties\architecture.dbf
digraph trace_inputlocator {
    rankdir="LR";
    node [shape=box];
    "data-helper"[style=filled, fillcolor=darkorange];
    "data-helper" -> "inputs/building-properties/indoor_comfort.dbf";
    "inputs/building-properties/occupancy.dbf" -> "data-helper";
    "data-helper" -> "inputs/building-properties/internal_loads.dbf";
    "data-helper" -> "inputs/building-properties/supply_systems.dbf";
    "data-helper" -> "inputs/building-properties/architecture.dbf";
    "data-helper" -> "inputs/building-properties/technical_systems.dbf";
    "inputs/building-properties/age.dbf" -> "data-helper";
    "databases/CH/archetypes/occupancy_schedules.xlsx" -> "data-helper";
    "databases/CH/archetypes/construction_properties.xlsx" -> "data-helper";
    "data-helper" -> "inputs/building-properties/restrictions.dbf";
}
Execution time: 70.16s
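
A digraph like the one above can be generated from a list of traced calls. This is a minimal sketch, assuming the trace is available as (script, direction, file) tuples. The tuple layout and the name to_graphviz are illustrative assumptions, not the actual trace-inputlocator code:

```python
def to_graphviz(trace_data):
    """Render (script, direction, path) tuples as a GraphViz digraph string."""
    lines = ['digraph trace_inputlocator {', '    rankdir="LR";', '    node [shape=box];']
    # Highlight each script node once.
    for script in sorted({script for script, _, _ in trace_data}):
        lines.append('    "{}"[style=filled, fillcolor=darkorange];'.format(script))
    # Inputs point into the script, outputs point away from it.
    for script, direction, path in trace_data:
        if direction == "input":
            lines.append('    "{}" -> "{}";'.format(path, script))
        else:
            lines.append('    "{}" -> "{}";'.format(script, path))
    lines.append("}")
    return "\n".join(lines)


trace_data = [
    ("data-helper", "input", "inputs/building-properties/age.dbf"),
    ("data-helper", "output", "inputs/building-properties/architecture.dbf"),
]
print(to_graphviz(trace_data))
```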

daren-thomas commented 5 years ago

Meeting on January 11 to discuss update on this issue.

Jack-Hawthorne commented 5 years ago

@daren-thomas when trying to run cea trace-inputlocator --scripts demand i get the following error at the end

Traceback (most recent call last):
  File "C:\Users\Jack\Miniconda2\envs\cea\Scripts\cea-script.py", line 11, in <module>
    load_entry_point('cityenergyanalyst', 'console_scripts', 'cea')()
  File "c:\users\jack\documents\github\cityenergyanalyst\cea\interfaces\cli\cli.py", line 65, in main
    script_module.main(config)
  File "c:\users\jack\documents\github\cityenergyanalyst\cea\tests\trace_inputlocator.py", line 73, in main
    create_yaml_output(trace_data, config.trace_inputlocator.yaml_output_file)
  File "c:\users\jack\documents\github\cityenergyanalyst\cea\tests\trace_inputlocator.py", line 101, in create_yaml_output
    with open(yaml_output_file, 'r') as f:
IOError: [Errno 13] Permission denied: 'c:\\reference-case-open\\baseline'

I was running anaconda prompt as administrator, so permissions shouldn't be an issue. It's probably a minor issue, I still get an output file in the reference case.
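
Note that the path in the error message is the scenario folder, not a .yml file. On Windows, calling open() on a directory fails with Errno 13 regardless of user privileges. A minimal sketch of a guard, assuming that is what happened here (check_output_file is a hypothetical helper, not part of cea):

```python
import os
import tempfile


def check_output_file(path):
    """Fail early with a clear message if the output path is a directory."""
    if os.path.isdir(path):
        raise ValueError("expected a file path but got a directory: {}".format(path))
    return path


# A temporary directory stands in for the scenario folder.
scenario = tempfile.mkdtemp()
try:
    check_output_file(scenario)
except ValueError as error:
    print(error)
print(check_output_file(os.path.join(scenario, "trace_inputlocator.output.yml")))
```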

Jack-Hawthorne commented 5 years ago

currently working on this on branch 1069-document-input-output-variables

daren-thomas commented 5 years ago

@Jack-Hawthorne I'm moving this issue back to "In Development" - as I see it, the first stage (list files, determine if input or output) is mainly done, awaiting the last few scripts. Would you mind listing the missing ones please?

The next stage is actually listing the structure of these files:

dbf files: column names, data types, maybe sample values? and then a description, which will have to be manually added
excel files: sheet names, column names, data types, maybe sample values? and a description, which will have to be manually added
shp files: see dbf files
what else??

I would like a machine-readable format to store this meta data. How about a yaml file or something? Why machine-readable? Because then we can create an input/output checker that checks these files before a script is run - not just for existence, but also for integrity. It will also help a lot when we move to another representation of this data (e.g. SQL) for cloud based computing.
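
A minimal sketch of such a checker, assuming the metadata has already been loaded into a plain list of expected column names and the file is a csv. The name check_csv and the sample file are illustrative assumptions, not existing CEA code:

```python
import csv
import os
import tempfile


def check_csv(path, expected_columns):
    """Return a list of problems found in a csv file; an empty list means it passed."""
    if not os.path.isfile(path):
        return ["missing file: {}".format(path)]
    with open(path, newline="") as f:
        header = next(csv.reader(f), [])
    return ["missing column: {}".format(c) for c in expected_columns if c not in header]


# Write a small sample file that lacks one of the expected columns.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "occupancy.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows([["Name", "SCHOOL"], ["B01", 0.0]])

print(check_csv(path, ["Name", "SCHOOL", "OFFICE"]))
```

Returning a list of problems rather than raising on the first one lets a script report everything that is wrong with its inputs in one go.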

Jack-Hawthorne commented 5 years ago

@daren-thomas I'm not sure if you want all scripts done - some are not fully developed.

For the sake of time, please let me know which of the missing scripts are the most important (and functional):

multi-criteria-analysis
operation-costs
optimization
decentralized
supply-system-simulation
sensitivity-demand-samples
sensitivity-demand-simulate
network-layout
thermal-network-optimization (if working)
plots
plots-scenario-comparisons
plots-supply-system
plots-optimization

Currently trying to create a generalised method for reading all files within the trace_data. It's going ok so far, however, I'm not quite sure how to organise the data for the different data-structures, say a shape or csv vs a json or yaml. Should i just throw in some nulls for conformity?

Some other considerations:

Would you like this information rendered in the docs or just easily available for later use (the glossary is kind of already doing this but can certainly be improved on)?
Does it have to go through the trace_inputlocator or should it be a separate tool (one where you simply point it at the db path and it loops through each one)?

daren-thomas commented 5 years ago

@Jack-Hawthorne oh dear, all the scripts you mention above are exactly those scripts i'm most desperate to have inputs / outputs clearly defined. so yes, i'd really like them all done. but i can also help you run them if you like? how can i best help?

your generalized method sounds exactly like what i was hoping for. i suggest you have separate methods for each of the different file types and start doing them one by one - i'd love to see some in-progress examples. let's meet up and discuss your current work, ok?
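
One way to organise separate methods per file type is a lookup table keyed by file extension. This is a minimal sketch using only csv and json from the standard library; dbf, shp and excel readers would need third-party packages and are left out. All function names here are hypothetical:

```python
import csv
import io
import json
import os


def read_csv_columns(stream):
    """Column names are the first row of a csv file."""
    return next(csv.reader(stream), [])


def read_json_columns(stream):
    """Treat the top-level keys of a json object as its 'columns'."""
    return sorted(json.load(stream).keys())


# Map file extension to the reader that understands that format.
READERS = {".csv": read_csv_columns, ".json": read_json_columns}


def read_columns(file_name, stream):
    """Pick the reader by extension and return the column names it finds."""
    extension = os.path.splitext(file_name)[1].lower()
    if extension not in READERS:
        raise ValueError("no reader registered for {}".format(extension))
    return READERS[extension](stream)


print(read_columns("sample.csv", io.StringIO("Name,value\nB01,12.5\n")))
print(read_columns("sample.json", io.StringIO('{"b": 1, "a": 2}')))
```

Adding a new file type then means writing one reader and registering it, without touching the others.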

first, i would like a machine readable version of this data. this could be a json file listing the relevant meta-data. since we're probably going to be editing this information by hand (for descriptions) maybe a yaml format would suit better? the next step is to have this machine readable knowledge base of the status quo of the files which can then be transformed into other outputs, like replacing the data in the glossary.

when looking at your graphs, i think they could eventually be linked to the data descriptions. wouldn't that be awesome?

Jack-Hawthorne commented 5 years ago

@daren-thomas check out branch 1069-db-metadata and run trace-inputlocator. Note: you may have to replace your current config

The result is a json containing locator methods as keys and the details for the files they reference. Could be an idea to also include the list/array lengths for each variable -> easy way to check for consistency (no Null/Nan of course).

Next steps are:

to somehow merge the naming.csv in plots with the glossary.csv in docs keeping the keys the same.
we can then cross check variables in locator_meta with those in naming.csv
get descriptions, units, typical values from naming.csv and add them to the locator_meta
write a script to draw up a rst table based on the information in locator_meta.
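
The last step could look roughly like this. It is a minimal sketch that formats rows as a reStructuredText simple table, assuming the rows have already been pulled out of locator_meta; rst_table is a hypothetical name:

```python
def rst_table(rows, headers):
    """Format rows as a reStructuredText simple table."""
    table = [list(headers)] + [[str(cell) for cell in row] for row in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]
    border = "  ".join("=" * width for width in widths)

    def fmt(row):
        return "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()

    body = [fmt(row) for row in table[1:]]
    return "\n".join([border, fmt(table[0]), border] + body + [border])


print(rst_table([["Name", "str"], ["SCHOOL", "float"]], ["column", "types"]))
```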

Let me know your feedback.

daren-thomas commented 5 years ago

@Jack-Hawthorne could you send me such a json file? maybe create a gist or something?

Jack-Hawthorne commented 5 years ago

data-helper.txt

this is converted to txt but should be helpful

daren-thomas commented 5 years ago

@Jack-Hawthorne thank you for this. I do have some comments. Let's use YAML syntax to talk about this as it is less verbose... (this is just for the discussion. but, writing the yaml file would be just as easy as writing the json file)

Your data format seems to be (i have to do some guessing here):

locator_method_name:
  - locator_method_docstring
  - actual_file_retrieved
  - file_type  # here assuming dbf
  - Sheet1:
    column_name_1:
      - sample_value
      - [type_1, type_2, ..., type_n]
    column_name_2:
      - sample_value
      - [type_1, type_2, ..., type_n]

The excel format is similar, but actually has real worksheets.

Some improvements I think are necessary:

do not invent a worksheet name for formats that don't have a worksheet ("Sheet1" for dbf)
- from python zen: flat is better than nested (try: import this in python shell)
you do not need to include the docstring for the locator methods in this meta data file - that information can be retrieved as necessary any time output is being generated
the second level (value of the locator_method_name key) should be a dictionary, not a list, with keys describing the values: file-path, file-type, schema
the schema contents are dependent on the file type
- for excel type, it's a dictionary of worksheets (extra level)
- at level of worksheet (or top schema level for single-table file formats csv, dbf, shp)
- list of columns (not dictionary, ideally, but in practice it doesn't really matter)
- columns specified by dict (not list) of name, sample_value, types_found

An example from the file you sent could look like this:

get_building_occupancy:
  file-path: C:\reference-case-open\baseline\inputs\building-properties\occupancy.dbf
  file-type: dbf
  schema:
    - name: Name
      sample_value: B01
      types_found: [str]
    - name: SCHOOL
      sample_value: 0.0
      types_found: [float]
    - ...

I think this makes the data format more self-descriptive.
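
A minimal sketch of how the schema list for a single-table file could be built, assuming the file has already been read into a list of row dictionaries. The function name infer_schema and the sample rows are illustrative:

```python
def infer_schema(rows):
    """Build a list of column descriptions from a list of row dictionaries."""
    schema = []
    for name in rows[0]:
        values = [row[name] for row in rows]
        schema.append({
            "name": name,
            "sample_value": values[0],
            # Record every Python type seen in the column, by name.
            "types_found": sorted({type(value).__name__ for value in values}),
        })
    return schema


rows = [{"Name": "B01", "SCHOOL": 0.0}, {"Name": "B02", "SCHOOL": 1}]
for column in infer_schema(rows):
    print(column)
```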

Jack-Hawthorne commented 5 years ago

@daren-thomas thanks for the feedback, that shouldn't be too hard.

one thing though, the reason i made the 'fake' sheet is to be able to easily iterate for all variables ( if they are on the same level, no need for conditionals). if it doesn't matter that much though, it's no problem to do as you've said.
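
The conditional can be kept in one place with a small generator, so callers still iterate uniformly without a fake sheet in the stored data. A minimal sketch, assuming the schema layout proposed above (iter_columns is a hypothetical helper):

```python
def iter_columns(file_type, schema):
    """Yield (worksheet, column) pairs; worksheet is None for single-table files."""
    if file_type in ("xls", "xlsx"):
        # Excel schemas have an extra level: worksheet name -> list of columns.
        for worksheet, columns in schema.items():
            for column in columns:
                yield worksheet, column
    else:
        for column in schema:
            yield None, column


dbf_schema = [{"name": "Name"}, {"name": "SCHOOL"}]
excel_schema = {"ARCHITECTURE": [{"name": "Code"}]}
print(list(iter_columns("dbf", dbf_schema)))
print(list(iter_columns("xlsx", excel_schema)))
```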

apart from that, do you think it would be advantageous to have the array length, script dependencies or other information before i start with connecting the naming.csv?

Jack-Hawthorne commented 5 years ago

demand.txt this is a sample for demand

Jack-Hawthorne commented 5 years ago

@daren-thomas ok i've added a script dependencies method now which updates each time you run trace. you can see which script created the file and which scripts use it. also changed the file type to yml as requested.

sample below: trace_dependencies_variables.txt

get_building_architecture:
    created_by: [data-helper]
    file_path: C:\reference-case-open\baseline\inputs\building-properties\architecture.dbf
    file_type: dbf
    schema:
        !!python/unicode 'Es':
            sample_value: 0.9
            types_found: [float, str]
        !!python/unicode 'Hs':
            sample_value: 0.45
            types_found: [float, str]
        !!python/unicode 'Name':
            sample_value: !!python/unicode 'B09'
            types_found: [str]
        !!python/unicode 'Ns':
            sample_value: 0.45
            types_found: [float, str]
        !!python/unicode 'type_cons':
            sample_value: !!python/unicode 'T3'
            types_found: [float, str]
        !!python/unicode 'type_leak':
            sample_value: !!python/unicode 'T2'
            types_found: [str]
        !!python/unicode 'type_roof':
            sample_value: !!python/unicode 'T4'
            types_found: [float, str]
        !!python/unicode 'type_shade':
            sample_value: !!python/unicode 'T1'
            types_found: [str]
        !!python/unicode 'type_wall':
            sample_value: !!python/unicode 'T5'
            types_found: [float, str]
        !!python/unicode 'type_win':
            sample_value: !!python/unicode 'T2'
            types_found: [str]
        !!python/unicode 'void_deck':
            sample_value: 0
            types_found: [int, float, str]
        !!python/unicode 'wwr_east':
            sample_value: 0.4
            types_found: [float, str]
        !!python/unicode 'wwr_north':
            sample_value: 0.4
            types_found: [float, str]
        !!python/unicode 'wwr_south':
            sample_value: 0.4
            types_found: [float, str]
        !!python/unicode 'wwr_west':
            sample_value: 0.4
            types_found: [float, str]
    used_by: [demand, radiation-daysim]
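
The created_by and used_by lists can be derived by grouping traced calls per locator method. A minimal sketch, assuming each traced call is a (script, direction, locator_method) tuple; this layout and the name build_dependencies are assumptions, not the code on the branch:

```python
from collections import defaultdict


def build_dependencies(trace_data):
    """Group traced calls by locator method into created_by / used_by lists."""
    dependencies = defaultdict(lambda: {"created_by": [], "used_by": []})
    for script, direction, locator_method in trace_data:
        key = "created_by" if direction == "output" else "used_by"
        # Keep each script only once per list, in first-seen order.
        if script not in dependencies[locator_method][key]:
            dependencies[locator_method][key].append(script)
    return dict(dependencies)


trace_data = [
    ("data-helper", "output", "get_building_architecture"),
    ("demand", "input", "get_building_architecture"),
    ("radiation-daysim", "input", "get_building_architecture"),
]
print(build_dependencies(trace_data))
```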

daren-thomas commented 5 years ago

@JIMENOFONSECA I'm not sure this is done yet.

Jack-Hawthorne commented 5 years ago

@daren-thomas still having trouble with decentralized (see #1825). also found an issue in #1841 which is halting progress.

daren-thomas commented 5 years ago

@Jack-Hawthorne what is the current status?

daren-thomas commented 5 years ago

(status update from @Jack-Hawthorne: there is meta-data from the trace-inputlocator script in the branch 1069-db-meta)

Jack-Hawthorne commented 5 years ago

status update - currently running decentralized script for only one building as the run time is slowing down progress. @daren-thomas is running optimization with a similar setup

the trace yml currently contains metadata from the following scripts:

data-helper
radiation-daysim
demand
emissions
operation-costs
network-layout
lake-potential
sewage-potential
photovoltaic
solar-collector (FP and ET)
photovoltaic-thermal (FP and ET)
thermal-network

i added thermal-network viz graph to the script-input-outputs.rst which should contain all of the scripts above

the next scripts to run are as follows:

decentralized
optimization
multi-criteria-analysis
plots
plots-supply-system
plots-optimization

hopefully the lead time shouldn't be too bad for the rest of the scripts.

Jack-Hawthorne commented 5 years ago

TODO after all the metadata is recorded:

merging of plots.variables.csv declaration with all the attributes found in the written files (script outputs).
post-processing to raise variables which are not listed in the variables declaration but are found in the meta (new/undocumented variables) OR are listed in the variables declaration but not in the meta (potentially old variables).
post-processing to create an up-to-date glossary which links to the viz graphs
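
The second item boils down to two set differences. A minimal sketch, assuming both the declared variable names and the names found in the metadata are available as plain lists (cross_check and the sample names are hypothetical):

```python
def cross_check(declared, found):
    """Compare declared variable names against those found in the metadata."""
    declared, found = set(declared), set(found)
    return {
        # In the written files but not in the declaration: new or undocumented.
        "undocumented": sorted(found - declared),
        # In the declaration but not in any written file: potentially old.
        "possibly_obsolete": sorted(declared - found),
    }


print(cross_check(declared=["Name", "Es", "old_var"], found=["Name", "Es", "Hs"]))
```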

@daren-thomas @JIMENOFONSECA thoughts on this?

daren-thomas commented 5 years ago

@Jack-Hawthorne I suggest you:

create separate issues for each of the TODOs mentioned above.
also separate issues for the scripts yet to run.
please create a pull request of your work so far

I'm promoting this issue to an epic and will add the new issues to that epic.

Thank you :)

Jack-Hawthorne commented 5 years ago

@daren-thomas are we getting rid of the naming.csv in plots? after the glossary.csv is fleshed out and accurate? how is the schema.yml looking? any eta on this one?

daren-thomas commented 5 years ago

naming.csv: huh. I guess so. let's create an issue for that. (#2200)
schema.yml: no ETA. Future work will probably be manual, though.

jimenofonseca commented 5 years ago

Bear in mind that we use naming.csv for all the plots naming.

Or did this change? If so, what is the new way? We also need a reference to the colors. And units.

daren-thomas commented 5 years ago

@JIMENOFONSECA yes, I know. But glossary.csv contains the same information - and more! So it seems sensible to me to only have one such file to reference / maintain. I think cea.plots.variable_naming just needs to change the path to the file it uses and should just work. We'd have to test that though.

jimenofonseca commented 5 years ago

cool, let's do it!

jimenofonseca commented 4 years ago

so the order is clear, we will implement further changes by merging naming.csv with the glossary.csv in #2200

architecture-building-systems / CityEnergyAnalyst

Document input and output variables for all scripts #1069