Add info if datasets are not freshly loaded

jensch-dlr commented 1 year ago

Hello everyone,

I cannot get pm.powerplants(update=True) to run. I guess something is wrong with my config, but I cannot figure out what.

AttributeError: 'DataFrame' object has no attribute 'Name' when calling pm.powerplants(update=True)

Has anyone encountered this error before, or does anyone know how to work around it?

FabianHofmann commented 1 year ago

Hey @jensch-dlr, thanks for reporting. Could you print out the full stack trace? And which pandas version are you using?

jensch-dlr commented 1 year ago

Hello @FabianHofmann, I sure can. Here is the whole thing:

`---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[4], line 1
----> 1 pm.powerplants(update=True)

File c:\work\data\powerplantmatching\powerplantmatching\collection.py:230, in powerplants(config, config_update, update, from_url, extend_by_vres, extendby_kwargs, extend_by_kwargs, fill_geopositions, filter_missing_geopositions, **collection_kwargs)
    225     return df
    227 matching_sources = [
    228     list(to_dict_if_string(a))[0] for a in config["matching_sources"]
    229 ]
--> 230 matched = collect(matching_sources, config=config, **collection_kwargs)
    232 if isinstance(config["fully_included_sources"], list):
    233     for source in config["fully_included_sources"]:

File c:\work\data\powerplantmatching\powerplantmatching\collection.py:98, in collect(datasets, update, reduced, config, **dukeargs)
     95     update = True
     97 if update:
---> 98     dfs = parmap(df_by_name, datasets)
     99     matched = combine_multiple_datasets(dfs, datasets, config=config, **dukeargs)
    100     (
    101         matched.assign(projectID=lambda df: df.projectID.astype(str)).to_csv(
    102             outfn_matched, index_label="id"
    103         )
    104     )

File c:\work\data\powerplantmatching\powerplantmatching\utils.py:378, in parmap(f, arg_list, config)
    376     return [x for i, x in sorted(res)]
    377 else:
--> 378     return list(map(f, arg_list))

File c:\work\data\powerplantmatching\powerplantmatching\collection.py:73, in collect.<locals>.df_by_name(name)
     71 conf = config[name]
     72 get_df = getattr(data, name)
---> 73 df = get_df(config=config)
     75 if not conf.get("aggregated_units", False):
     76     return aggregate_units(df, dataset_name=name, config=config)

File c:\work\data\powerplantmatching\powerplantmatching\data.py:751, in ENTSOE(raw, update, config, entsoe_token, **fill_geoposition_kwargs)
    743 fn = _package_data("entsoe_country_codes.csv")
    744 COUNTRY_MAP = pd.read_csv(fn, index_col=0).rename(index=str).Country
    746 return (
    747     df.rename_axis(index="projectID")
    748     .reset_index()
    749     .rename(columns=RENAME_COLUMNS)
    750     .drop_duplicates("projectID")
--> 751     .assign(
    752         Name=lambda df: df.Name.str.replace("_", " "),  # for geoparsing
    753         EIC=lambda df: df.projectID,
    754         Country=lambda df: df.projectID.str[:2].map(COUNTRY_MAP),
    755         Capacity=lambda df: pd.to_numeric(df.Capacity),
    756         Technology=np.nan,
    757         Set=np.nan,
    758         lat=np.nan,
    759         lon=np.nan,
    760     )
    761     .powerplant.convert_alpha2_to_country()
    762     # .pipe(fill_geoposition, **fill_geoposition_kwargs)
    763     .query("Capacity > 0")
    764     .pipe(gather_specifications, config=config)
    765     .pipe(clean_name)
    766     .pipe(set_column_name, "ENTSOE")
    767     .pipe(config_filter, config)
    768 )

File C:\mambaforge-data\envs\powerplantmatching\Lib\site-packages\pandas\core\frame.py:4889, in DataFrame.assign(self, **kwargs)
   4886 data = self.copy()
   4888 for k, v in kwargs.items():
-> 4889     data[k] = com.apply_if_callable(v, data)
   4890 return data

File C:\mambaforge-data\envs\powerplantmatching\Lib\site-packages\pandas\core\common.py:374, in apply_if_callable(maybe_callable, obj, **kwargs)
    363 """
    364 Evaluate possibly callable input using obj and kwargs if it is callable,
    365 otherwise return as it is.
   (...)
    371 **kwargs
    372 """
    373 if callable(maybe_callable):
--> 374     return maybe_callable(obj, **kwargs)
    376 return maybe_callable

File c:\work\data\powerplantmatching\powerplantmatching\data.py:752, in ENTSOE.<locals>.<lambda>(df)
    743 fn = _package_data("entsoe_country_codes.csv")
    744 COUNTRY_MAP = pd.read_csv(fn, index_col=0).rename(index=str).Country
    746 return (
    747     df.rename_axis(index="projectID")
    748     .reset_index()
    749     .rename(columns=RENAME_COLUMNS)
    750     .drop_duplicates("projectID")
    751     .assign(
--> 752         Name=lambda df: df.Name.str.replace("_", " "),  # for geoparsing
    753         EIC=lambda df: df.projectID,
    754         Country=lambda df: df.projectID.str[:2].map(COUNTRY_MAP),
    755         Capacity=lambda df: pd.to_numeric(df.Capacity),
    756         Technology=np.nan,
    757         Set=np.nan,
    758         lat=np.nan,
    759         lon=np.nan,
    760     )
    761     .powerplant.convert_alpha2_to_country()
    762     # .pipe(fill_geoposition, **fill_geoposition_kwargs)
    763     .query("Capacity > 0")
    764     .pipe(gather_specifications, config=config)
    765     .pipe(clean_name)
    766     .pipe(set_column_name, "ENTSOE")
    767     .pipe(config_filter, config)
    768 )

File C:\mambaforge-data\envs\powerplantmatching\Lib\site-packages\pandas\core\generic.py:5902, in NDFrame.__getattr__(self, name)
   5895 if (
   5896     name not in self._internal_names_set
   5897     and name not in self._metadata
   5898     and name not in self._accessors
   5899     and self._info_axis._can_hold_identifiers_and_holds_name(name)
   5900 ):
   5901     return self[name]
-> 5902 return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'Name'`

The Pandas version I used is 1.5.3.

By now, I think I know what causes the error: none of the PPM datasets has been downloaded to my drive (at least I cannot find any), so the real problem seems to lie there. That leads to a new question: shouldn't that download happen automatically when I use the powerplantmatching package, for example when I call pm.powerplants(from_url=False)?

Thank you!

jensch-dlr commented 1 year ago

Okay, I found the problem. I had a version of entsoe_powerplants.csv in my directory that was from mid-2021. Its header row was `,Unnamed: 0,registeredResource.name,registeredResource.mRID,voltage_PowerSystemResources.highVoltageLimit,psrType,quantity,Country`, whereas the current file's header is `,Bidding Zone,Installed Capacity [MW],Name,Production Type,Voltage Connection Level [kV]`. The old file therefore has no `Name` column.

That was the reason for the error message. So I am guessing that PPM currently only checks whether a file with the same name already exists in the data directory. Wouldn't it be better to check whether it is actually the same file and update it if not?
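
To illustrate the kind of check I mean, here is a rough sketch that compares the header of a cached CSV with the columns the current code expects. The helper names `missing_columns` and `is_stale` are made up for this example and are not part of powerplantmatching; the expected column set is simply taken from the current header quoted above and may not match what the package really requires.

```python
import csv
from pathlib import Path

# Assumption: these names come from the current header quoted in this thread;
# the package itself may expect a different or larger set of columns.
EXPECTED_COLUMNS = {
    "Bidding Zone",
    "Installed Capacity [MW]",
    "Name",
    "Production Type",
    "Voltage Connection Level [kV]",
}


def missing_columns(path, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the CSV header."""
    with Path(path).open(newline="", encoding="utf-8") as fh:
        header = next(csv.reader(fh), [])  # empty file -> empty header
    return set(expected) - set(header)


def is_stale(path, expected=EXPECTED_COLUMNS):
    """Hypothetical check: a cached file is stale if any expected column is missing."""
    return bool(missing_columns(path, expected))
```

With my 2021 file, `missing_columns` would report `Name` among the missing columns, which is exactly what the `AttributeError` was complaining about.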

FabianHofmann commented 1 year ago

You're probably right. But this would actually require going deep into the code. Perhaps a reset option would be better, in case one wants to start from a fresh install. That would have solved the problem for you, right?
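
A reset could be as simple as the sketch below, which deletes the cached files so that the next run has to fetch them again. The function `reset_cached_datasets`, its `data_dir` argument, and the `*.csv` pattern are assumptions for illustration only; nothing like this exists in the package yet, and the real data directory layout may differ.

```python
from pathlib import Path


def reset_cached_datasets(data_dir, pattern="*.csv"):
    """Delete cached dataset files and return the paths that were removed.

    Hypothetical helper, not part of powerplantmatching. Only regular files
    directly inside data_dir that match the pattern are touched.
    """
    removed = []
    for path in sorted(Path(data_dir).glob(pattern)):
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed
```

Returning the removed paths makes it easy to log what was deleted before the datasets are downloaded again.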

jensch-dlr commented 1 year ago

It would have, yes. Maybe a warning with a hint about this possible problem would have sped up the process. But that might again be too hard to integrate?
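
Something along these lines is what I have in mind: a small check that warns whenever an existing local file is reused and looks old. The name `warn_if_cached` and the one-year default threshold are my own assumptions, not existing package behaviour, and file modification time is only a rough proxy for how fresh the data is.

```python
import time
import warnings
from pathlib import Path


def warn_if_cached(path, max_age_days=365, now=None):
    """Warn if a reused local file is older than max_age_days.

    Hypothetical helper, not part of powerplantmatching. Returns True when a
    warning was issued and False otherwise.
    """
    path = Path(path)
    if not path.exists():
        return False  # nothing cached, so a fresh download would happen anyway
    now = time.time() if now is None else now
    age_days = (now - path.stat().st_mtime) / 86400  # seconds per day
    if age_days > max_age_days:
        warnings.warn(
            f"Using cached file {path} (about {age_days:.0f} days old). "
            "If loading fails, delete the file so it is downloaded again.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```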

PyPSA / powerplantmatching

Add info if datasets are not freshly loaded #104