electricitymaps / electricitymaps-contrib

A real-time visualisation of the CO2 emissions of electricity consumption
https://app.electricitymaps.com
GNU Affero General Public License v3.0

Investigate switching data source for Ireland #2339

Closed Lobidu closed 1 year ago

Lobidu commented 4 years ago

Hi, the data for Ireland has been missing for a while now. Since it worked before, is there anything broken that I could help resolve? Thanks, Janis

AyrtonB commented 4 years ago

I believe that Ireland is currently using ENTSOE data. Bizarrely, ENTSOE has no data for Ireland the country and only provides information on the SEM market, which became ISEM back in 2018.

Northern Ireland & the Republic of Ireland are both part of the same electricity market so it also doesn't make sense that the map currently shows Northern Ireland only.

There are two sites we could get Irish data from apart from ENTSOE: the official route through https://www.sem-o.com/ or scraping the dashboard at http://smartgriddashboard.eirgrid.com/. The EirGrid dashboard has 15-minute resolution, whilst SEM-O has 30-minute resolution.
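Because the two sources differ in resolution, comparing them would need the 15-minute dashboard readings brought onto the 30-minute SEM-O grid. A minimal sketch of one way to do that, assuming the readings are average power in MW and that each half-hour bucket is labelled by its start time (the `to_half_hourly` helper is hypothetical, not part of either API):

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def to_half_hourly(readings):
    """Average 15-minute MW readings into 30-minute buckets.

    `readings` is an iterable of (timestamp, mw) pairs. Each bucket is
    keyed by its start time (the :00 or :30 boundary).
    """
    buckets = defaultdict(list)
    for ts, mw in readings:
        # Floor the timestamp to the previous :00 or :30 boundary.
        bucket_start = ts.replace(minute=ts.minute - ts.minute % 30, second=0, microsecond=0)
        buckets[bucket_start].append(mw)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Four consecutive 15-minute NI generation readings (MW).
start = datetime(2019, 5, 1)
quarter_hourly = [(start + timedelta(minutes=15 * i), mw) for i, mw in enumerate([586, 566, 557, 540])]
half_hourly = to_half_hourly(quarter_hourly)
```

Averaging suits power values in MW; energy values in MWh would need to be summed instead.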

AyrtonB commented 4 years ago
import warnings
from datetime import date, datetime, timedelta

import pandas as pd
import requests

""" Caller """
class Caller(object):
    ## Core Query Caller
    def query_API(self, start_date: datetime, end_date: datetime, area: str, region: str):
        ## Formatting & Checking Parameters
        start_date, end_date = self.format_date_inputs(start_date, end_date) # Format datetimes

        ## Checking area and region are supported by the dashboard API
        assert area in self.possible_areas, f"Area must be one of {', '.join(self.possible_areas)}"
        assert region in self.possible_regions, f"Region must be one of {', '.join(self.possible_regions)}"

        ## Creating Query
        query_params_dict = self.create_query_params_dict(area, region, start_date, end_date) # create dictionary of parameters and their values

        if area == 'marketdata':
            data_type = 'marketdata'
        else:
            data_type = 'data'

        query = self.params_dict_2_query(query_params_dict, data_type=data_type) # create the url query string

        ## Getting & Parsing Response
        response = requests.get(query)
        r_json = response.json()

        ## Checking Response is OK
        response_metadata = self.get_metadata(r_json)
        error = response_metadata['error_message']

        assert error is None, f'Error with API call: {error}'
        assert response_metadata['num_rows'] > 0, 'No rows were returned'

        return r_json

    ## Helper Functions
    def format_date_inputs(self, start_date, end_date):
        # Call strftime on the instance so that plain date objects work as well as datetimes
        format_dt = lambda dt: dt.strftime('%d-%b-%Y+%H:%M').replace(':', '%3A') if isinstance(dt, date) else dt

        start_date = format_dt(start_date)
        end_date = format_dt(end_date)

        return start_date, end_date

    def params_dict_2_query(self, query_params_dict, data_type='data'):
        query_root = f'http://smartgriddashboard.eirgrid.com/DashboardService.svc/{data_type}?'
        query_params = ''

        for query_param in query_params_dict.keys():
            query_param_val = query_params_dict[query_param]
            query_params += f'&{query_param}={query_param_val}'

        query = query_root + query_params
        return query

    def create_query_params_dict(self, area, region, start_date, end_date):
        query_params_dict = dict()

        query_params_dict['area'] = area
        query_params_dict['region'] = region
        query_params_dict['datefrom'] = start_date
        query_params_dict['dateto'] = end_date

        return query_params_dict

    def get_metadata(self, r_json):
        response_metadata = dict()

        # An exception response carries keys such as 'ExceptionType' and 'ExceptionDetail'
        exception_occurred = any('Exception' in key for key in r_json)

        if exception_occurred:
            exception_type = r_json['ExceptionType']
            exception = r_json['ExceptionDetail']
            warnings.warn(f'{exception_type}: {exception}')

        response_metadata['error_message'] = r_json['ErrorMessage']
        response_metadata['last_updates'] = r_json['LastUpdated']
        response_metadata['num_rows'] = len(r_json['Rows'])

        return response_metadata

    def call_2_df(self, start_date, end_date, area, region):
        r_json = self.query_API(start_date, end_date, area, region) # Making Call
        df_raw = pd.DataFrame(r_json['Rows']) # Creating DataFrame

        return df_raw

    def query_months_years(self, years, months, area, region):
        monthly_dfs = []

        for year in years:
            for month in months:
                #try:
                start_date = datetime(year, month, 1, 0, 0)

                if month == 12:
                    end_date = datetime(year, month, 31, 23, 45)
                else:
                    end_date = datetime(year, month+1, 1, 0, 0) - timedelta(minutes=15)

                # Use self rather than the global `caller` instance
                monthly_dfs.append(self.call_2_df(start_date, end_date, area, region))
                #except:
                #    warnings.warn(f'It was not possible to scrape {month}-{year}')

        # DataFrame.append was removed in pandas 2.0; concatenate the monthly frames instead
        df_raw = pd.concat(monthly_dfs) if monthly_dfs else pd.DataFrame()

        return df_raw

    ## Initialiser
    def __init__(self):
        self.possible_areas = ['interconnection', 'windforecast', 'windactual', 'generationactual', 'marketdata']
        self.possible_regions = ['NI', 'ALL']

## User Inputs
start_date = datetime(2019, 5, 1)
end_date = datetime(2019, 5, 31, 23, 45)
area = 'generationactual'
region = 'NI'

## Making Call
caller = Caller()
df_raw = caller.call_2_df(start_date, end_date, area, region)

df_raw.head()

          EffectiveTime FieldName Region  Value
0  01-May-2019 00:00:00   GEN_EXP     NI    586
1  01-May-2019 00:15:00   GEN_EXP     NI    566
2  01-May-2019 00:30:00   GEN_EXP     NI    557
3  01-May-2019 00:45:00   GEN_EXP     NI    540
4  01-May-2019 01:00:00   GEN_EXP     NI    529

AyrtonB commented 4 years ago

Looks like the EirGrid dashboard is also down for the moment, though.

AyrtonB commented 4 years ago

It does appear as though SEM-O is continuing to update their data:

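from datetime import date

import pandas as pd
import requests

# Note: date_2_date_input is a helper that is not included in this post;
# it formats a date for the StartTime/EndTime query parameters.
def form_url(data_stream, date_start, date_end):
    date_start = date_2_date_input(date_start)
    date_end = date_2_date_input(date_end, date_type='end')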

    url_root = 'https://reports.sem-o.com/api/v1/dynamic/'
    url = f'{url_root}{data_stream}?StartTime={date_start}&EndTime={date_end}&sort_by=StartTime&order_by=ASC&Jurisdiction=All&ParticipantName=&ResourceName=&ResourceType=&page_size=1000'

    return url

def url_2_df(url):
    response = requests.get(url)
    dict_response = response.json()
    df_response = pd.DataFrame(dict_response['items'])
    return df_response

def convert_dt_cols(df, date_cols=['EndTime', 'StartTime']):
    if not isinstance(date_cols, list):
        date_cols = [date_cols]

    for date_col in date_cols:
        df[date_col] = pd.to_datetime(df[date_col])

    return df

data_stream = 'BM-098'
date_start = date(2020, 3, 1)
date_end = date(2020, 3, 2)

url = form_url(data_stream, date_start, date_end)

df = url_2_df(url)
df = convert_dt_cols(df)

df.head()

    TradeDate DeliveryDate            StartTime              EndTime      MW
0  2020-03-03   2020-03-02  2020-03-02 23:30:00  2020-03-03 00:00:00  3743.8
1  2020-03-03   2020-03-03  2020-03-03 00:00:00  2020-03-03 00:30:00  3568.5
2  2020-03-03   2020-03-03  2020-03-03 00:30:00  2020-03-03 01:00:00  3597.6
3  2020-03-03   2020-03-03  2020-03-03 00:30:00  2020-03-03 01:00:00  3597.6
4  2020-03-03   2020-03-03  2020-03-03 01:00:00  2020-03-03 01:30:00  3628.1

robertahunt commented 4 years ago

Hmmmm, it is unfortunate that ENTSOE is missing Irish data at the moment. It looks like it has been up and down for Ireland for a while (I see a few related closed issues, like https://github.com/tmrowco/electricitymap-contrib/issues/2037).

The parser we use for Northern Ireland uses data from http://www.soni.ltd.uk/, and it looks like they have data for the Republic of Ireland as well. If someone has time to adapt the GB_NIR parser to work with IE, we could switch.

Based on @AyrtonB's comment above, we could also explore getting the data from SEM-O. I wonder where http://www.soni.ltd.uk/ gets their data from then, since they have IE and GB-NIR separated.

AyrtonB commented 4 years ago

The SONI data is the same as the EirGrid dashboard data, even down to the API requests.

The first script I posted can easily collect the full IE data: in the call df_raw = caller.call_2_df(start_date, end_date, area, region), the region specified would be 'ALL' instead of 'NI'.

Whereabouts is the GB_NIR parser? I'm happy to modify it.

robertahunt commented 4 years ago

That's great, thanks! I am okay with switching to either data source (whichever seems more stable in your opinion). I think we should keep Northern Ireland separate from the Republic of Ireland though, so we can show people the highest level of granularity possible.

The NIR parser is here: parsers/GB_NIR.py

pierresegonne commented 4 years ago

Hey @AyrtonB any news on the parser? :)

Kongkille commented 2 years ago

There's been a lot of back and forth. There are some limitations in the Eirgrid data source, but since Entsoe is often down, it might be a suitable alternative.

@pierresegonne & @gwpicard what do you suggest we do? Do we want to try to switch the data source for IE to Eirgrid or keep Entsoe as is?

gwpicard commented 2 years ago

Given that we have estimations, it might be worth switching to Eirgrid data as the breakdown seems good. The only issue is that I don't know if you can fetch historical data.

We would also need to investigate whether the Eirgrid data matches official sources.

It sounds like quite a lot of heavy lifting. I say we stick to ENTSO-E for now, unless someone carries out a bit of an investigation or we need to develop an estimation model for IE.

daviessm commented 2 years ago

Can we look at switching to Eirgrid (or SONI) for Northern Ireland at least, as that data has been missing completely for a while now and isn't available on ENTSO-E?

pierresegonne commented 2 years ago

Hey @daviessm! Thanks for picking this up.

That sounds reasonable now given the time it's been down.

We would need to look into:

  1. Can we refetch historical data? In general we try to keep a single data source for both real-time and historical data, so as not to introduce inconsistencies.
  2. Does the data we see from Eirgrid match other sources? Comparing simple aggregates over e.g. a year or a month against national statistics is enough.

Let me know if you need pointers for this :)
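The second check above could be sketched roughly as follows: sum the 15-minute MW readings into monthly energy totals and compare them against a published figure. This is only an illustration under stated assumptions; the helper names are hypothetical, the readings are assumed to be average power in MW over each interval, and the input below is synthetic rather than real Eirgrid data.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def monthly_energy_gwh(readings, interval_minutes=15):
    """Sum (timestamp, mw) readings into GWh per (year, month)."""
    hours = interval_minutes / 60
    totals = defaultdict(float)
    for ts, mw in readings:
        totals[(ts.year, ts.month)] += mw * hours / 1000  # MWh -> GWh
    return dict(totals)


def relative_difference(ours, reference):
    """Signed difference of our aggregate relative to the reference value."""
    return (ours - reference) / reference


# Synthetic input: a constant 1000 MW over one day (96 readings) is 24 GWh.
day = [(datetime(2022, 8, 22) + timedelta(minutes=15 * i), 1000.0) for i in range(96)]
totals = monthly_energy_gwh(day)
```

A call such as `relative_difference(totals[(2022, 8)], reference_gwh)` would then give the mismatch against a national statistic, where `reference_gwh` has to be taken from the official source.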

gwpicard commented 2 years ago

From what I'm seeing, the data you can get from the EirGrid data portal (which includes both IE and GB-NIR) includes historical data (accessible from a dropdown). However, there is no hourly historical production breakdown data, only hourly total system demand, wind generation, and interconnection.

https://www.eirgridgroup.com/how-the-grid-works/system-information/

daviessm commented 2 years ago

@pierresegonne I wasn't necessarily volunteering to pick this up myself, if anyone wants to jump in (it looked like a few others, like @wizmer, had started), but if I get time in the next few weeks I'll try to figure it all out.

the-red-lily commented 2 years ago

https://www.sem-o.com/market-data/dynamic-reports/#BM-086 only shows total generation data for the last 3 months. I think for historical data we'd need to use the dropdown @gwpicard suggested.

I think once the historical data is downloaded, so long as the historical data matches the realtime data, we can avoid using the dropdown workaround and aggregate historical data ourselves.

Note: https://www.eirgridgroup.com/how-the-grid-works/system-information/ also has their own estimate for CO2 intensity. I think if we combine that metric with their wind generation and total generation data, we could estimate the remaining coal/gas proportions, couldn't we? So long as we have a decent guess on how they calculate CO2 intensity. We could also blindly trust their number 😅 Their CO2 intensity seems to be higher than the number currently shown on electricitymap.
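To make the coal/gas idea concrete, here is a rough sketch of the arithmetic. It assumes, purely for illustration, that generation consists only of wind (zero emissions), coal and gas, that intensity is total emissions divided by total generation, and that the emission factors are known. None of this is confirmed about how the dashboard computes its figure, and `split_coal_gas` and the numbers in the example are made up.

```python
def split_coal_gas(total_mw, wind_mw, intensity, ef_coal, ef_gas):
    """Solve for coal and gas output (MW) from a grid-average CO2 intensity.

    Solves the two equations
        coal + gas                    = total_mw - wind_mw
        coal * ef_coal + gas * ef_gas = intensity * total_mw
    where intensity and both emission factors share one unit (e.g. gCO2/kWh).
    """
    residual = total_mw - wind_mw
    coal = (intensity * total_mw - ef_gas * residual) / (ef_coal - ef_gas)
    gas = residual - coal
    if coal < 0 or gas < 0:
        raise ValueError("intensity is not reachable with these emission factors")
    return coal, gas


# Made-up numbers, chosen only so the arithmetic is easy to follow.
coal_mw, gas_mw = split_coal_gas(
    total_mw=4000, wind_mw=1000, intensity=375, ef_coal=1000, ef_gas=400
)
```

With imports, hydro or other fossil plants in the mix there are more unknowns than equations, so this only works as a two-fuel approximation.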

the-red-lily commented 2 years ago

If no one else is doing it, I could try and compare the historical data from eirgridgroup with other sources. Give me till the end of the week

the-red-lily commented 2 years ago

FYI, the fuel mix returned from this API (used in df1b3fe) does not display historical data. But it does include a breakdown between coal, gas, renewable, import, and for ROI "other fossil". It may display live data, or an aggregate (maybe 24 hrs? not sure):

https://www.smartgriddashboard.com/DashboardService.svc/data?area=fuelmix&region=ROI&datefrom=01-01-2021+00%3a00&dateto=01-01-2021+00%3a15

The graph data can be grabbed from the API here (though since it isn't "public", I'd recommend self-throttling). Replace the dates as needed, and the region with "ROI" or "NI":

Wind: https://smartgriddashboard.com/DashboardService.svc/data?area=windactual&region=ALL&datefrom=22-Aug-2022+00%3A00&dateto=22-Aug-2022+23%3A59
Total Generation: https://smartgriddashboard.com/DashboardService.svc/data?area=generationactual&region=ALL&datefrom=22-Aug-2022+00%3A00&dateto=22-Aug-2022+23%3A59
Total Demand: https://smartgriddashboard.com/DashboardService.svc/data?area=demandactual&region=ALL&datefrom=22-Aug-2022+00%3A00&dateto=22-Aug-2022+23%3A59
Interconnection: https://smartgriddashboard.com/DashboardService.svc/data?area=interconnection&region=ALL&datefrom=22-Aug-2022+00%3A00&dateto=22-Aug-2022+23%3A59
CO2 Intensity: https://smartgriddashboard.com/DashboardService.svc/data?area=co2intensity&region=ALL&datefrom=22-Aug-2022+00%3A00&dateto=22-Aug-2022+23%3A59
Total CO2: https://smartgriddashboard.com/DashboardService.svc/data?area=co2emission&region=ALL&datefrom=22-Aug-2022+00%3A00&dateto=22-Aug-2022+23%3A59
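A small sketch of how these URLs could be built and fetched with a pause between requests. The URL pattern is taken from the links above; `build_url` and `fetch_all` are hypothetical helpers, and the one-second default pause is an arbitrary assumption rather than a documented rate limit.

```python
import time
from datetime import datetime

BASE_URL = "https://smartgriddashboard.com/DashboardService.svc/data"
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_timestamp(ts):
    """Format a datetime as e.g. 22-Aug-2022+00%3A00, independent of locale."""
    return f"{ts.day:02d}-{MONTHS[ts.month - 1]}-{ts.year}+{ts.hour:02d}%3A{ts.minute:02d}"


def build_url(area, region, start, end):
    """Build a dashboard query URL for one area, region and time window."""
    return (f"{BASE_URL}?area={area}&region={region}"
            f"&datefrom={format_timestamp(start)}&dateto={format_timestamp(end)}")


def fetch_all(urls, fetch, min_interval_s=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL in turn, pausing between consecutive calls."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(min_interval_s)  # self-throttle: the API is not officially public
        results.append(fetch(url))
    return results


areas = ["windactual", "generationactual", "demandactual",
         "interconnection", "co2intensity", "co2emission"]
day_start = datetime(2022, 8, 22, 0, 0)
day_end = datetime(2022, 8, 22, 23, 59)
urls = [build_url(area, "ALL", day_start, day_end) for area in areas]
```

Passing something like `lambda url: requests.get(url, timeout=30).json()` as `fetch` would perform the real requests; keeping `fetch` and `sleep` injectable lets the logic be exercised without touching the network.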

These can be cross-referenced with historical data. Energy in Ireland 2019 has an annual breakdown of electricity by source. Though the units are based on oil-equivalents, we could figure out the proportions.
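As a sketch of that last point: shares do not depend on the unit, so the proportions can be computed directly from the oil-equivalent figures, and 1 ktoe corresponds to 11.63 GWh if absolute values are needed. The numbers below are placeholders and are not taken from the report; it would also be worth checking whether the report lists electricity output or fuel input per source, since only the former gives generation shares.

```python
KTOE_TO_GWH = 11.63  # 1 tonne of oil equivalent = 11.63 MWh


def ktoe_to_gwh(ktoe):
    """Convert thousand tonnes of oil equivalent to GWh."""
    return ktoe * KTOE_TO_GWH


def shares(by_source):
    """Return each source's fraction of the total; the input unit cancels out."""
    total = sum(by_source.values())
    return {source: value / total for source, value in by_source.items()}


# Placeholder values in ktoe, NOT figures from the report.
annual_ktoe = {"gas": 50.0, "wind": 30.0, "coal": 20.0}
annual_shares = shares(annual_ktoe)
```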

gwpicard commented 2 years ago

@the-red-lily thanks for doing some of the investigation here. Haven't had a chance to take a look at it, but just want to drop an FYI here to say we want to avoid using external CO2 values because then we start affecting the standardisation in our approach to computing these numbers

pierresegonne commented 2 years ago

Hey @the-red-lily thanks for the thorough analysis!

> https://www.sem-o.com/market-data/dynamic-reports/#BM-086 only shows total generation data for the last 3 months. I think for historical data we'd need to use the dropdown @gwpicard suggested.

If we can only get total production in real time, plus historical production mixes, and the overall total production matches between the two, we could think of building an estimation model so that we stop having so many outages for IE. It would thus be super useful to know more :)

pierresegonne commented 1 year ago

@mathilde-daugy should we close?