Closed s-alexey closed 8 years ago
Current implementation looks quite strange
I agree.
I don't know what is the best solution.
I agree.
Does anything break?
Current Text-Mining add-on implementation use it in Corpus class and it leads to some problems there.
Indeed. Do Orange's unit tests (python ./setup.py test
) still pass? If they do, please submit a pull request.
Although it sounds reasonable and obvious, I'd be careful with this. @astaric and @lanzagar have to confirm that this change has no side effects. I may be the original author of this code (or perhaps it's @astaric), but I can't remember why it's written like this. It can be a blunder, or it may be intentional.
Do numpy array implement deep copy efficiently? (I just quickly checked - it seems that they have a __deepcopy__
method.)
I'm not sure whether deepcopy is the best solution.
t = self.__class__(self)
here we create a new object with self as a second argument. It's dangerous.
Here a simple example of problems that may occur:
class Foo(object):
def __init__(self, a, b):
self.a = a
if isinstance(a, Foo):
raise ValueError("a can't be Foo inctance")
self.b = b
# in additional it may take a lot of time
self.c = hard_computation_problem(a, b)
def copy(self):
# here a = self, b is not provided
# so it raises a TypeError
c = self.__class__(self)
c.a = copy(self.a)
c.b = copy(self.b)
return c
class Bar(Foo):
def __init__(self, a, b=2):
# here b is present, but isinstance(a, Foo) == True
super(Bar, self).__init__(a, b)
and this code raises TypeError
>>> obj = Foo(1, 2)
>>> c = obj.copy()
or ValueError:
>>> obj = Bar(1)
>>> c = obj.copy()
You're right that self.__class__(self)
can trigger errors if __init__
does not accept self
as an argument. But Table.__init__
does.
I thought about it some more. :) When an object copies itself - with either deep or shallow copy - it will have to call the constructor in some way. copy.deepcopy(self)
does the same thing as copy
does now: it calls the constructor (probably through pickling?) and then deepcopies the attributes (I hope). Or lets the pickling do the job. Replacing the current code with deepcopy
just makes it less efficient.
The method copy
is essentially __deepcopy__
, just without the memo. We can implement __deepcopy__
(it will look like the current copy
) and __copy__
(similar, but without ensure_copy
), and then copy
can call __deepcopy__
.
If the current implementation is broken in derived classes, derived classes have to be fixed by improving the constructor and/or overloading __copy__
and __deepcopy__
.
Thank you, @janezd. I was a little confused, but I have figured it out now.
copy.deepcopy(self) does the same thing as copy does now
there is a difference: Table.copy
doesn't copy domain
attribute.
Finally, lets c = t.copy()
. Whether it is right that c.ids == t.ids
?
Oh, yes, and this is really important: the domain
shouldn't be deepcopied. Copying it wouldn't break anything, but it would imply that the new table does not come from the same domain and decrease the efficiency by adding unnecessary conversions.
I'm not sure about ids
, but I believe it's OK to have c.ids is t.ids
because we can consider them to be immutable.
I'm not sure about
ids
, but I believe it's OK to havec.ids is t.ids
because we can consider them to be immutable.
I used to think so, but ...
>>> from Orange.data import Table
>>> d = Table("iris")
>>> x = d.copy()
>>> x.ids is d.ids
False
>>> (x.ids == d.ids)[:5]
array([ True, True, True, True, True], dtype=bool)
>>> d.extend(x)
>>> d.ids[:5], d.ids[len(x):len(x)+5]
(array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4]))
>>>
>>> import numpy as np
>>> d = Table(np.zeros((5, 2)))
>>> x = d.copy()
>>> d.extend(x)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/alexey/orange/orange3/Orange/data/table.py", line 869, in extend
self._resize_all(old_length + len(instances))
File "/home/alexey/orange/orange3/Orange/data/table.py", line 630, in _resize_all
self.X.resize(new_length, self.X.shape[1])
ValueError: cannot resize this array: it does not own its data
You're right, ids
can't be shared, and copy
creates new ids
(through constructor).
The exception that you got is correct: Table
's constructor does not copy the data matrix and so it can't change it.
This reminds me: @astaric, @kernc, is there any place where we change Table
(and can't avoid it) or could we make it immutable?
We make tables immutable! I love you!We use Pandas. :+1:
@s-alexey Is the issue still pertinent? From what I know we decided to keep the current implementation and wait for pandas. If that's the case, I kindly invite someone with the authority to close the issue. :smile: If not, keep the magic going!
Current implementation looks quite strange:
self.__class__(self)
call runs init method with self as a second (typicaly X) argument. This isn't desired behavior.Possible solutions: use
copy
moduleor just change to
self.__class__.__init__(self)
I can make PL, but I don't know what is the best solution.