WISDEM / WOMBAT

Windfarm Operations & Maintenance cost-Benefit Analysis Tool
https://wisdem.github.io/WOMBAT/
Apache License 2.0

Event log file does not have headers or column names #100

Closed by johnwang0576 4 months ago

johnwang0576 commented 1 year ago

I was running the how_to Jupyter notebook (see the code I ran below) and encountered the following error. I think the issue is that the code expects the first line of the event log file to contain the column names. I can work around the parsing failure by setting read_options to ReadOptions(autogenerate_column_names=True), but then another error pops up because the code expects certain fixed column names, such as datetime.
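For reference, the workaround mentioned above looked roughly like this (a sketch only: the file path is a placeholder, and the read/parse options mirror what environment.py uses per the traceback below):

import pyarrow.csv as pacsv

# Autogenerated names ("f0", "f1", ...) let the header-less file parse,
# but the downstream code still looks for named columns such as
# "datetime", so the failure just moves later.
read_options = pacsv.ReadOptions(autogenerate_column_names=True)
parse_options = pacsv.ParseOptions(delimiter="|")
log_table = pacsv.read_csv(
    "path/to/events.csv",  # placeholder path to the generated events log
    read_options=read_options,
    parse_options=parse_options,
)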

Code

from time import perf_counter
from wombat import Simulation
from wombat.core.library import load_yaml, DINWOODIE

library_path = DINWOODIE
config = load_yaml(library_path / "project/config", "base.yaml")
sim = Simulation.from_config(config)
sim.env.cleanup_log_files()  # delete any existing log files before the run

start = perf_counter()
sim.run()
end = perf_counter()
timing = end - start
print(f"Run time: {timing / 60:,.2f} minutes")

Error Message

ValueError                                Traceback (most recent call last)
Cell In[2], line 2
      1 start = perf_counter()
----> 2 sim.run()
      3 end = perf_counter()
      4 timing = end - start

File ~/miniconda3/envs/wisdem/lib/python3.10/site-packages/wombat/core/simulation_api.py:326, in Simulation.run(self, until, create_metrics, save_metrics_inputs)
    324     self.save_metrics_inputs()
    325 if create_metrics:
--> 326     self.initialize_metrics()

File ~/miniconda3/envs/wisdem/lib/python3.10/site-packages/wombat/core/simulation_api.py:330, in Simulation.initialize_metrics(self)
    328 def initialize_metrics(self) -> None:
    329     """Instantiates the ``metrics`` attribute after the simulation is run."""
--> 330     events = self.env.load_events_log_dataframe()
    331     operations = self.env.load_operations_log_dataframe()
    332     power_potential, power_production = self.env.power_production_potential_to_csv(
    333         windfarm=self.windfarm, operations=operations, return_df=True
    334     )

File ~/miniconda3/envs/wisdem/lib/python3.10/site-packages/wombat/core/environment.py:679, in WombatEnvironment.load_events_log_dataframe(self)
    671 convert_options = pa.csv.ConvertOptions(
    672     timestamp_parsers=["%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S"]
    673 )
    674 parse_options = pa.csv.ParseOptions(delimiter="|")
    675 log_df = pa.csv.read_csv(
    676     self.events_log_fname,
    677     convert_options=convert_options,
    678     parse_options=parse_options,
--> 679 ).to_pandas()
    680 if not pd.api.types.is_datetime64_any_dtype(log_df.datetime):
    681     log_df.datetime = pd.to_datetime(
    682         log_df.datetime, yearfirst=True, format="mixed"
    683     )

File ~/miniconda3/envs/wisdem/lib/python3.10/site-packages/pyarrow/array.pxi:837, in pyarrow.lib._PandasConvertible.to_pandas()

File ~/miniconda3/envs/wisdem/lib/python3.10/site-packages/pyarrow/table.pxi:4114, in pyarrow.lib.Table._to_pandas()

File ~/miniconda3/envs/wisdem/lib/python3.10/site-packages/pyarrow/pandas_compat.py:819, in table_to_blockmanager(options, table, categories, ignore_metadata, types_mapper)
    816     ext_columns_dtypes = _get_extension_dtypes(table, [], types_mapper)
    818 _check_data_column_metadata_consistency(all_columns)
--> 819 columns = _deserialize_column_index(table, all_columns, column_indexes)
    820 blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
    822 axes = [columns, index]

File ~/miniconda3/envs/wisdem/lib/python3.10/site-packages/pyarrow/pandas_compat.py:938, in _deserialize_column_index(block_table, all_columns, column_indexes)
    935     columns = _reconstruct_columns_from_metadata(columns, column_indexes)
    937 # ARROW-1751: flatten a single level column MultiIndex for pandas 0.21.0
--> 938 columns = _flatten_single_level_multiindex(columns)
    940 return columns

File ~/miniconda3/envs/wisdem/lib/python3.10/site-packages/pyarrow/pandas_compat.py:1184, in _flatten_single_level_multiindex(index)
   1182     # Cheaply check that we do not somehow have duplicate column names
   1183     if not index.is_unique:
-> 1184         raise ValueError('Found non-unique column index')
   1186     return pd.Index(
   1187         [levels[_label] if _label != -1 else None for _label in labels],
   1188         dtype=dtype,
   1189         name=index.names[0]
   1190     )
   1191 return index

ValueError: Found non-unique column index

First three lines of the event log file (note that there is no header row with column names):

2023-06-20 21:56:06.625773|2003-01-01 00:00:00|0|S00T1|subassemblies created: ['turbine']|windfarm initialization|initialization|S00T1|S00T1|||1.0|1|0|0|na|na|0|0|0|0|0|0
2023-06-20 21:56:06.657014|2003-01-01 00:00:00|0|S00T2|subassemblies created: ['turbine']|windfarm initialization|initialization|S00T2|S00T2|||1.0|1|0|0|na|na|0|0|0|0|0|0
2023-06-20 21:56:06.685828|2003-01-01 00:00:00|0|S00T3|subassemblies created: ['turbine']|windfarm initialization|initialization|S00T3|S00T3|||1.0|1|0|0|na|na|0|0|0|0|0|0
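For context on the ValueError itself (my own reading of the traceback, not anything from the WOMBAT docs): because the header row is missing, pyarrow treats the first data row as the column names, and that row contains repeated values ("S00T1" and "0" each appear several times), so the resulting pandas column index is non-unique. A minimal sketch that reproduces the same failure with the pyarrow version from the traceback:

import io
import pyarrow.csv as pacsv

# A pipe-delimited "file" whose first row has a duplicated value; pyarrow
# reads that row as the header, producing duplicate column names.
data = io.BytesIO(b"x|0|0\n1|2|3\n")
table = pacsv.read_csv(data, parse_options=pacsv.ParseOptions(delimiter="|"))
table.to_pandas()  # raises ValueError: Found non-unique column index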
RHammond2 commented 1 year ago

Hi @johnwang0576, not sure how I missed this, but in any case thanks for opening this up!

This is something I've seen a couple of times, and it stems from modifying the example workflow. In particular, sim.env.cleanup_log_files() deletes the freshly initialized operations and events log files, and with them the header rows they contain; sim.run() then recreates and writes to the files that were just deleted, but without the headers. I recommend moving the cleanup call to the end of your analysis workflow, or removing it altogether if you plan to inspect the files later.

Below is what your example should look like in order to run correctly. I'll be sure to update the documentation to make it clearer when users should and shouldn't run sim.env.cleanup_log_files().

from time import perf_counter
from wombat import Simulation
from wombat.core.library import load_yaml, DINWOODIE

library_path = DINWOODIE
config = load_yaml(library_path / "project/config", "base.yaml")
sim = Simulation.from_config(config)

start = perf_counter()
sim.run()
end = perf_counter()
timing = end - start
print(f"Run time: {timing / 60:,.2f} minutes")

sim.env.cleanup_log_files()  # delete the events and operations CSV files
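If you want to inspect the logs programmatically before deleting the files, you can load them between sim.run() and the cleanup call. A quick sketch using the same loaders that appear in the traceback above:

sim.run()

# Load the logs while the CSV files still exist on disk.
events = sim.env.load_events_log_dataframe()
operations = sim.env.load_operations_log_dataframe()
print(events.head())

sim.env.cleanup_log_files()  # now safe to delete the files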