catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
MIT License

Verify licensing of GridPath RA Toolkit inputs #3490

Closed: zaneselvans closed this issue 5 months ago

zaneselvans commented 5 months ago

Verify the licensing associated with all known inputs to the GridPath RA Toolkit data, and identify which of those data we can safely archive and redistribute under CC-BY-4.0.

The notes below come from the README file associated with the published GridPath RA Toolkit data. The appendices they cite are those of the GridPath RA Toolkit report.

✅ Hourly Wind Profiles contains hourly simulated wind capacity factor data by project between 2007 and 2014, based on wind speed data from NREL's Wind Toolkit and empirically derived power curves. Each file corresponds to a project from EIA Form 860: [Plant ID]_capfactor.csv. Note that the hour-ending ("HE") timestamp column is missing, but the 24 hours of data for each day represent HE 1 through HE 24 of that day in Pacific Standard Time. For more information about how this data was developed and used in the study, see Appendix A.4.

Note that the wind generation profiles under the Monte Carlo Inputs directory in the GridPath RA Toolkit are derived from these inputs as well.
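Since the wind files have no timestamp column, the hour-ending labels have to be rebuilt from row position. A minimal sketch of that, assuming the rows are in chronological order, that each day has exactly 24 rows, and that the first row belongs to the first day passed in (the function name and the 2007-01-01 start date are illustrative, not taken from the README):

```python
from datetime import date, datetime, timedelta, timezone

# Pacific Standard Time as a fixed UTC-8 offset (no daylight saving shifts).
PST = timezone(timedelta(hours=-8), "PST")


def hour_ending_labels(first_day, n_rows):
    """Return (hour_ending, interval_end) pairs for n_rows hourly rows.

    Hypothetical helper: assumes row 0 is HE 1 of first_day and that rows
    are contiguous, 24 per day.
    """
    if n_rows % 24 != 0:
        raise ValueError("expected a whole number of 24-row days")
    midnight = datetime(first_day.year, first_day.month, first_day.day, tzinfo=PST)
    labels = []
    for i in range(n_rows):
        hour_ending = i % 24 + 1  # HE 1 through HE 24
        interval_end = midnight + timedelta(hours=i + 1)
        labels.append((hour_ending, interval_end))
    return labels


# Two days of labels, starting from an assumed first day of 2007-01-01.
labels = hour_ending_labels(date(2007, 1, 1), 48)
```

Under this convention HE 24 of one day ends at midnight PST at the start of the next day, so the interval-end timestamp for the last row of a day carries the following calendar date.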

✅ Hourly Solar Profiles contains hourly simulated solar capacity factor data by project between 1998 and 2019, based on data from the NSRDB and NREL's SAM model. Each file corresponds to a project from EIA Form 860: [Plant ID]_[Generator ID].csv. Timestamps are in UTC. For more information about how this data was developed and used in the study, see Appendix A.5.

Note that the solar generation profiles under the Monte Carlo Inputs directory in the GridPath RA Toolkit are derived from these inputs as well.

✅ Hourly Thermal Generator Derates contains hourly estimated thermal temperature derates by generator between 1998 and 2019, based on temperature data from the NSRDB and project-specific piecewise linear derate functions. Each file corresponds to a project from EIA Form 860: [Plant ID]_[GeneratorID].csv. Timestamps are contained in timestamps.csv and are listed in hour ending, Pacific Standard Time. For more information about how this data was developed and used in the study, see Appendix A.2.

Appendix A.2 of the GridPath RA Toolkit report describes the method for deriving the hourly thermal derates:

To derive this piecewise linear function, we examined the hourly load shape of the BA to which the unit was assigned to identify the “peak hours” in the winter and summer, the four hours of the day in each season with the highest average load, over which net capacity testing may have occurred.

The hourly load shapes of the BAs are discussed below and depend on the WECC 2026 Common Case data. Here, however, they are used only incidentally, to identify the peak hours that define the piecewise linear derate functions.
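To make the peak-hour step quoted from Appendix A.2 concrete, here is a rough sketch of picking the four hours of the day with the highest average load in each season. The season month sets, the record layout, and the function name are assumptions for illustration only; the report's actual season definitions are not given in these notes.

```python
from collections import defaultdict

# Assumed season definitions; the report may define seasons differently.
SEASON_MONTHS = {"winter": {12, 1, 2}, "summer": {6, 7, 8}}


def peak_hours(records, n_peak=4):
    """Return the n_peak hours of day with the highest mean load per season.

    records: iterable of (month, hour_ending, load_mw) tuples (hypothetical layout).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for month, hour, load in records:
        for season, months in SEASON_MONTHS.items():
            if month in months:
                sums[(season, hour)] += load
                counts[(season, hour)] += 1
    result = {}
    for season in SEASON_MONTHS:
        means = {
            hour: sums[(s, hour)] / counts[(s, hour)]
            for (s, hour) in sums
            if s == season
        }
        # Highest mean load first; ties broken by the earlier hour.
        top = sorted(means, key=lambda h: (-means[h], h))[:n_peak]
        result[season] = sorted(top)
    return result


# Toy load shape: flat 100 MW with a 50 MW bump in the chosen peak hours.
toy_records = []
for month, bump_hours in ((1, {7, 8, 18, 19}), (7, {15, 16, 17, 18})):
    for hour in range(1, 25):
        toy_records.append((month, hour, 100.0 + (50.0 if hour in bump_hours else 0.0)))

toy_peaks = peak_hours(toy_records)
```

The load shape only selects which hours count as peak here; none of the load values themselves end up in the result, which is consistent with the point that the load data is used incidentally.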

✅ Weather Data

DailyWeatherData_cleaned.csv: daily weather data from 16 locations in the West between 1948 and 2021. For more information, see Appendix E of the report.

❌ Hydro Data

MonthlyHydro_byPlant.csv: monthly hydro energy by plant from EIA Form 923/906 between 2001 and 2020, listed by EIA Plant ID and EIA Plant Name. For more information about how this data was used in the study, see Appendix A.3.

❌ Hourly Load Profiles contains hourly load data between 2006 and 2020 from FERC Form 714, which was used to develop the load shapes in the Western RA Case Study. Each file corresponds to a FERC respondent. In each file, the columns are: year, month, day, hour ending (Pacific Standard Time), load (MW). This data has been cleaned for use in this study, including making manual adjustments for missing or bad data. For more information about how this data was used in the study, see Appendix A.1.
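The load files use hour ending in Pacific Standard Time while the solar profiles use UTC, so anything combining the two needs a common convention. A small sketch of converting one load row to a UTC interval-start timestamp, assuming PST means a fixed UTC-8 offset year-round and that interval start is the convention you want (the function name is hypothetical, and whether the solar UTC timestamps mark interval starts or ends is not stated in the README):

```python
from datetime import datetime, timedelta, timezone

# Pacific Standard Time as a fixed UTC-8 offset (assumed: no daylight saving).
PST = timezone(timedelta(hours=-8), "PST")


def hour_ending_to_utc_start(year, month, day, hour_ending):
    """Convert a (year, month, day, hour ending PST) row to the UTC start of that hour."""
    if not 1 <= hour_ending <= 24:
        raise ValueError("hour ending must be between 1 and 24")
    midnight = datetime(year, month, day, tzinfo=PST)
    # HE 1 covers 00:00-01:00, so the interval starts hour_ending - 1 hours after midnight.
    interval_start = midnight + timedelta(hours=hour_ending - 1)
    return interval_start.astimezone(timezone.utc)


first_hour = hour_ending_to_utc_start(2006, 1, 1, 1)
last_hour = hour_ending_to_utc_start(2020, 12, 31, 24)
```

Note that the last PST hour of 2020 starts in 2021 in UTC, so year boundaries do not line up between the two conventions.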

zaneselvans commented 5 months ago

After talking to Ana and Elaine in our check-in today, it sounds like the thermal generator derates can actually be licensed under CC-BY-4.0, because the WECC data does not flow into them directly and they have the same structure as the renewable generator capacity factors.