It looks like I can simply add another MMS crawler in mms.py, but I'll need to modify the file handling to somehow ignore the SATELLITE type?
It looks like the parsing is done in process_rooftop_actual in opennem/controllers/nem.py#L467, where it passes the MMS records to generate_facility_scada in opennem/controllers/nem.py#L87.
This handles duplicates by simply keeping the last.
I think what we actually want is to keep the MEASUREMENT samples only.
There's also a QI column which is the Quality Indicator of the estimate (see the MMS model report).
So I think ideally we want to keep the duplicate row with the highest QI value, and then if there are still duplicates keep the last.
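A minimal sketch of that preference order, assuming each record is a plain dict keyed by the MMS column names (INTERVAL_DATETIME, REGIONID, TYPE, QI, POWER). The function name `dedupe_rooftop` and the `allowed_types` parameter are hypothetical and not part of opennem. It reads the proposal as "filter to MEASUREMENT first, then rank by QI"; pass `allowed_types=None` to rank across all types instead.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

Record = Dict[str, Any]


def dedupe_rooftop(
    records: Iterable[Record],
    allowed_types: Optional[Tuple[str, ...]] = ("MEASUREMENT",),
) -> List[Record]:
    """Keep one record per (interval, region).

    Preference order (an assumption based on the discussion above):
    1. drop rows whose TYPE is not in allowed_types,
    2. keep the row with the highest QI,
    3. on a QI tie, keep the row that appears last.
    """
    best: Dict[Tuple[Any, Any], Record] = {}
    for rec in records:
        if allowed_types is not None and rec["TYPE"] not in allowed_types:
            continue
        key = (rec["INTERVAL_DATETIME"], rec["REGIONID"])
        current = best.get(key)
        # ">=" means a later row with an equal QI replaces the earlier one.
        if current is None or float(rec["QI"]) >= float(current["QI"]):
            best[key] = rec
    return list(best.values())


# Hypothetical sample rows for a single timestep and region.
_ts = "2023/02/01 00:30:00"
sample = [
    {"INTERVAL_DATETIME": _ts, "REGIONID": "NSW1", "TYPE": "SATELLITE", "POWER": 10.0, "QI": 1.0},
    {"INTERVAL_DATETIME": _ts, "REGIONID": "NSW1", "TYPE": "MEASUREMENT", "POWER": 11.0, "QI": 0.6},
    {"INTERVAL_DATETIME": _ts, "REGIONID": "NSW1", "TYPE": "MEASUREMENT", "POWER": 12.0, "QI": 1.0},
    {"INTERVAL_DATETIME": _ts, "REGIONID": "NSW1", "TYPE": "MEASUREMENT", "POWER": 13.0, "QI": 1.0},
    {"INTERVAL_DATETIME": _ts, "REGIONID": "NSW1", "TYPE": "MEASUREMENT", "POWER": 9.0, "QI": 0.2},
]
deduped = dedupe_rooftop(sample)
```

With the sample rows, the SATELLITE row is dropped, the two MEASUREMENT rows with QI 1.0 tie, and the later of them (POWER 13.0) is the one kept.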
I'm going to start by letting it keep the last of the duplicate rows for now and just accept the error.
It'll need another PR/issue for cleaning up duplicate handling.
It looks like rooftop solar isn't backfilled via mms.py.
In the current files the MEASUREMENT and SATELLITE data are kept in separate files (see https://nemweb.com.au/Reports/ARCHIVE/ROOFTOP_PV/ACTUAL/). In the MMS archives they're combined, and the TYPE column is used to mark which is which (see https://nemweb.com.au/Data_Archive/Wholesale_Electricity/MMSDM/2023/MMSDM_2023_02/MMSDM_Historical_Data_SQLLoader/DATA/). This means we end up with duplicates for every timestep.
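One way to see the effect of the TYPE column is to split the combined rows back into per-type groups, which roughly mirrors the separate files in the current reports. This is only a sketch: the helper name `split_by_type` is hypothetical, and the sample is a simplified CSV with a plain header row, not the full AEMO file layout.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def split_by_type(csv_text: str) -> Dict[str, List[Dict[str, str]]]:
    """Group rows of a combined rooftop CSV by their TYPE column."""
    groups: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["TYPE"]].append(row)
    return dict(groups)


# Hypothetical, simplified sample: one timestep appears once per TYPE.
combined_csv = (
    "INTERVAL_DATETIME,TYPE,REGIONID,POWER,QI\n"
    "2023/02/01 00:30:00,MEASUREMENT,NSW1,0.0,1\n"
    "2023/02/01 00:30:00,SATELLITE,NSW1,0.0,0.6\n"
)
groups = split_by_type(combined_csv)
measurement_rows = groups.get("MEASUREMENT", [])
```

After splitting, each group holds a single row per timestep and region, so passing only `measurement_rows` downstream would avoid the duplicates without changing how later stages handle them.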