Hal-Systems-AU / opennem

Energy market data access platform
https://opennem.org.au
MIT License

Rooftop solar doesn't backfill via mms #4

Open kahn-jms opened 4 months ago

kahn-jms commented 4 months ago

It looks like rooftop solar isn't backfilled via mms.py.

In the current files, the MEASUREMENT and SATELLITE data are kept in separate files (see https://nemweb.com.au/Reports/ARCHIVE/ROOFTOP_PV/ACTUAL/). In the MMS archives they're combined into one file, with the TYPE column marking which is which (see https://nemweb.com.au/Data_Archive/Wholesale_Electricity/MMSDM/2023/MMSDM_2023_02/MMSDM_Historical_Data_SQLLoader/DATA/). This means we end up with duplicate rows for every timestep, one per TYPE.

It looks like I can simply add another MMS crawler in mms.py, but I'll need to modify the file handling to somehow ignore the SATELLITE type. As far as I can tell, the parsing is done in process_rooftop_actual (opennem/controllers/nem.py#L467), which passes the MMS records to generate_facility_scada (opennem/controllers/nem.py#L87). This handles duplicates by simply keeping the last.
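A rough sketch of what "ignore satellite type" could look like, assuming the MMS records arrive as a list of dicts with a `TYPE` key. The helper name `drop_satellite_rows` and the record shape are hypothetical and are not taken from the opennem code:

```python
from typing import Dict, Iterable, List


def drop_satellite_rows(
    records: Iterable[Dict], type_field: str = "TYPE"
) -> List[Dict]:
    """Return the records whose TYPE is not SATELLITE.

    Assumes each record is a dict and that the type lives under
    `type_field`; both are assumptions about the parsed MMS rows.
    """
    return [
        record
        for record in records
        if str(record.get(type_field, "")).strip().upper() != "SATELLITE"
    ]


# Made-up rows for illustration only
rows = [
    {"INTERVAL_DATETIME": "2023/02/01 00:30:00", "REGIONID": "NSW1", "TYPE": "MEASUREMENT", "POWER": 0.0},
    {"INTERVAL_DATETIME": "2023/02/01 00:30:00", "REGIONID": "NSW1", "TYPE": "SATELLITE", "POWER": 0.1},
]
filtered = drop_satellite_rows(rows)
```

Something like this would have to run before the records reach the duplicate handling, otherwise "keep the last" still decides which type survives.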

I think what we actually want is to keep the MEASUREMENT samples only. There's also a QI column, which is the Quality Indicator of the estimate (see the MMS model report). So I think ideally we want to keep the duplicate row with the highest QI value and, if there are still duplicates, keep the last.
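A minimal sketch of the "highest QI, then last" rule, assuming records are dicts and that a timestep is identified by `INTERVAL_DATETIME` plus `REGIONID`. The function name `dedupe_by_quality`, the key fields, and the numeric `QI` values are assumptions for illustration, not the existing opennem behaviour:

```python
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple


def dedupe_by_quality(
    records: Iterable[Dict],
    key_fields: Sequence[str] = ("INTERVAL_DATETIME", "REGIONID"),
    qi_field: str = "QI",
) -> List[Dict]:
    """Keep one record per key: highest QI wins, ties go to the later row."""
    best: Dict[Tuple[Hashable, ...], Dict] = {}
    for record in records:
        key = tuple(record[field] for field in key_fields)
        current = best.get(key)
        # ">=" means a later row with an equal QI replaces the earlier one,
        # which gives the "keep the last" tie-break.
        if current is None or float(record[qi_field]) >= float(current[qi_field]):
            best[key] = record
    return list(best.values())


# Made-up rows for illustration only
rows = [
    {"INTERVAL_DATETIME": "2023/02/01 00:30:00", "REGIONID": "NSW1", "TYPE": "MEASUREMENT", "QI": 1.0, "POWER": 10.0},
    {"INTERVAL_DATETIME": "2023/02/01 00:30:00", "REGIONID": "NSW1", "TYPE": "SATELLITE", "QI": 0.6, "POWER": 12.0},
    {"INTERVAL_DATETIME": "2023/02/01 01:00:00", "REGIONID": "NSW1", "TYPE": "MEASUREMENT", "QI": 0.8, "POWER": 20.0},
    {"INTERVAL_DATETIME": "2023/02/01 01:00:00", "REGIONID": "NSW1", "TYPE": "SATELLITE", "QI": 0.8, "POWER": 21.0},
]
deduped = dedupe_by_quality(rows)
```

Note that with this rule a SATELLITE row can still win when its QI is equal or higher, so it is not the same as keeping MEASUREMENT samples only.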

kahn-jms commented 4 months ago

For now I'm going to let it keep the last of the duplicate rows and just accept the error. Cleaning up the duplicate handling will need another PR/issue.