My idea would be to use pandas for the copy of the DataFrame (which is the biggest object in memory for the instance). Looking at the docs (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.copy.html), they say:

> When `deep=True`, data is copied but actual Python objects will not be copied recursively, only the reference to the object. This is in contrast to `copy.deepcopy` in the Standard Library, which recursively copies object data (see examples below).

So it could be more memory-efficient.
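To make the caveat concrete, here is a minimal sketch (with a hypothetical `objs` column, not from the PR) of what "not copied recursively" means: the underlying data is copied, but a mutable object stored inside a cell is still shared.

```python
import pandas as pd

df = pd.DataFrame({"objs": [[1, 2]]})  # a column holding a mutable Python list
copied = df.copy(deep=True)            # the data is copied, the inner list is not

df.loc[0, "objs"].append(3)
print(copied.loc[0, "objs"])  # [1, 2, 3] -> the nested list is still shared
```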
According to https://stackoverflow.com/questions/9058305/getting-attributes-of-a-class, a possible suggestion for the implementation could be to gather the attributes except for the `_df` attribute (which would be copied with `pandas.DataFrame.copy()`) and create a new instance with `copy.deepcopy()` of those attributes (not `_df`):
```python
import copy

# Inspect the instance attributes (from the Stack Overflow answer)
for attribute, value in original_d.__dict__.items():
    print(attribute, '=', value)

# Deep-copy every attribute except _df, then copy the DataFrame with pandas.
# Note: the instance __dict__ is assigned here, not __class__.__dict__,
# which is a read-only mappingproxy.
copied_d = Dataset()
copied_d.__dict__ = {key: copy.deepcopy(value) for key, value in original_d.__dict__.items() if key != '_df'}
copied_d._df = original_d._df.copy()
```
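As a possible refinement (just a sketch under the assumption that `Dataset` keeps the DataFrame in `_df`; this is not existing pytrousse code), the same logic could live on the class itself via the `__deepcopy__` hook, so that a plain `copy.deepcopy(original_d)` does the right thing:

```python
import copy

import pandas as pd

class Dataset:
    def __init__(self, df: pd.DataFrame):
        self._df = df

    def __deepcopy__(self, memo):
        # Create the copy without re-running __init__.
        copied = self.__class__.__new__(self.__class__)
        memo[id(self)] = copied  # standard deepcopy bookkeeping
        for key, value in self.__dict__.items():
            if key == "_df":
                copied._df = self._df.copy()  # let pandas copy the big DataFrame
            else:
                setattr(copied, key, copy.deepcopy(value, memo))
        return copied
```

Going through `__new__` keeps `__init__` from running twice, and registering the new object in `memo` prevents infinite recursion if any attribute ever references the dataset back.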
What do you think? Does it make sense?
_Originally posted by @lorenz-gorini in https://github.com/HK3-Lab-Team/pytrousse/pull/67#discussion_r493494867_