deep-learning-with-pytorch / dlwpt-code

Code for the book Deep Learning with PyTorch by Eli Stevens, Luca Antiga, and Thomas Viehmann.
https://www.manning.com/books/deep-learning-with-pytorch

Misleading code [3.8.2] #30

Open. OverLordGoldDragon opened this issue 3 years ago.

OverLordGoldDragon commented 3 years ago

We can easily verify that the two tensors share the same storage:

`id(points.storage()) == id(points_t.storage())`

`is` should be used instead, which returns `False`; try `id(float(3)) == id(float(4))`. See here.
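A quick sketch of both points (the `id` reuse is CPython-specific behaviour, and `points` is the small example tensor from the chapter):

```python
import torch

# Both operands are temporaries: in CPython the first float can be freed and its
# memory slot reused for the second before the comparison, so the id() check can
# "pass" even for clearly different objects.
print(id(float(3)) == id(float(4)))            # often True

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_t = points.t()
print(points.storage() is points_t.storage())  # False: two distinct Python objects
```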

t-vi commented 3 years ago

Hi. You are absolutely right that there is a problem here, thank you! The issue is that there isn't a 1-to-1 mapping between the C++ storage (which would be the same) and the Python `Storage` objects (which are recreated with every `storage()` call). So if anything comes out `True` here, it is an artifact of the incorrect use of `==` on the ids of temporary objects, not evidence of shared storage. Incidentally, as far as I know, the longer-term idea is for PyTorch not to use a storage class at all (and instead work with views of a tensor), but it's not a priority.
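Roughly (a small sketch; `data_ptr()` is used here only as one way to see that the same buffer sits behind every wrapper):

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_t = points.t()

s1 = points.storage()
s2 = points.storage()
print(s1 is s2)                         # False: a new Python wrapper on every call
print(s1.data_ptr() == s2.data_ptr())   # True: both wrap the same C++ buffer
print(points.storage().data_ptr() ==
      points_t.storage().data_ptr())    # True: the transpose shares that buffer too
```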

OverLordGoldDragon commented 3 years ago

@t-vi The main point, that the two tensors share the same `Storage`, is correct, but it's important not to confuse that with identical Python `id`s; I think the existing examples of editing a storage and seeing the change through both tensors explain it well enough.
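For instance, something along these lines (a minimal sketch reusing the chapter's `points` tensor):

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_t = points.t()

# Writing through the shared storage is visible from both tensors,
# without relying on Python object identity at all.
points.storage()[0] = 100.0
print(points[0, 0], points_t[0, 0])   # tensor(100.) tensor(100.)
```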

Otherwise, a fine textbook you've made freely available; thank you.

OverLordGoldDragon commented 3 years ago

@t-vi Also a misleading statement, in 5.5.2:

[screenshot of the quoted statement from section 5.5.2]

This only holds for the baby example and completely breaks down for larger models, so as not to understate the importance of input normalization, maybe add "... blink (in this simple case):"

t-vi commented 3 years ago

Well, the key takeaway we wanted to get across here is that Adam will automatically "equilibrate" the step size across the parameters, while SGD needs you to do that manually. We could have made it clearer that being able to reuse this learning rate isn't universal, but then we are focusing quite deeply on this one example here.
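A rough sketch of that point, with hypothetical toy data and a learning rate chosen purely for illustration (not taken from the book):

```python
import torch

# One input feature on a large, unnormalized scale, with a noisy linear target.
torch.manual_seed(0)
x = torch.randn(200, 1) * 100.0
y = 3.0 * x + 1.0 + torch.randn(200, 1)

def final_loss(optim_cls, lr, steps=2000):
    model = torch.nn.Linear(1, 1)
    opt = optim_cls(model.parameters(), lr=lr)
    for _ in range(steps):
        loss = torch.nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()

# With the same learning rate, plain SGD blows up on the unnormalized input
# (the weight's gradient scales with x**2), while Adam's per-parameter step-size
# normalization keeps training stable.
print(final_loss(torch.optim.SGD, lr=1e-2))   # typically nan/inf
print(final_loss(torch.optim.Adam, lr=1e-2))  # typically a small value near the noise level
```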

OverLordGoldDragon commented 3 years ago

@t-vi Done reading, ended at ch12. Likely typos:

Some feedback on part 2: I'd say the data pipeline complexity was a bit of an overkill, mainly around caching. There wasn't really a need for it when one can simply extract the candidates/nodules into their own files, which is both simpler and faster. It's good to know how it's done, but among everything else the reader is supposed to learn, it may add redundant overhead. Otherwise, quality material, especially in part 1, which gradually adds layers of complexity starting from `w*x + b`.
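For example, something along these lines (a rough sketch; the helper names and the `(uid, chunk, label)` layout are hypothetical, not the book's code):

```python
import torch
from torch.utils.data import Dataset

def precompute_candidate_files(candidate_iter, out_dir):
    """Crop each candidate once and save it to its own file.

    `candidate_iter` is assumed to yield (uid, chunk_tensor, label) triples
    produced by whatever CT-loading code is already in place.
    """
    paths = []
    for i, (uid, chunk, label) in enumerate(candidate_iter):
        path = f"{out_dir}/{uid}_{i}.pt"
        torch.save({'chunk': chunk, 'label': label}, path)
        paths.append(path)
    return paths

class PrecomputedCandidateDataset(Dataset):
    """Loads pre-extracted candidate crops instead of caching on the fly."""

    def __init__(self, paths):
        self.paths = list(paths)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        item = torch.load(self.paths[idx])
        return item['chunk'], item['label']
```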

t-vi commented 3 years ago

Heya, yeah. Between the three of us, we had mixed feelings about the caching too, and the main reason to keep it was to show that such a thing exists. Obviously, some of us also have inclinations towards more targeted data tooling, e.g. hangar (video). If you later read the remaining chapters, I'd be keen to hear your feedback on those as well, in particular ch14 and 15.

t-vi commented 3 years ago

And thank you for your very thorough and critical reading and for sharing your comments. It is very interesting and, indeed, helpful.