dvgodoy / PyTorchStepByStep

Official repository of my book: "Deep Learning with PyTorch Step-by-Step: A Beginner's Guide"
https://pytorchstepbystep.com
MIT License

Possibly some typos in the book #47

Open Patrick-Wen opened 7 months ago

Patrick-Wen commented 7 months ago

Here is Eq. 5.6 in the book: [screenshot of Eq. 5.6 attached]

It is stated that "C stands for the number of classes". I think Nc, which represents the number of cases in the c-th class, should be replaced by C. Nc is simply irrelevant here, since the softmax is calculated per individual sample.

Please correct me if I am wrong. Thank you.

Patrick Wen

dvgodoy commented 7 months ago

Hi @Patrick-Wen,

Thank you for reporting this.

You're absolutely right, it should be "C-1" instead of "Nc-1" in the upper part of the summation. I will fix it in the notebook and add to the list of typos to be addressed in a future revision.
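
In other words, with z_i denoting the i-th logit (generic notation, not necessarily the exact symbols used in the book), the corrected Eq. 5.6 should read:

$$\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{c=0}^{C-1} e^{z_c}}$$

with the sum in the denominator running over all C classes.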

Thanks for supporting my work :-)

Best, Daniel

Patrick-Wen commented 6 months ago

As my reading of the book continues, I'd like to bring up some possible typos I found along the way.

  1. On page 530 of the book, where the Inception class is defined, I suspect 2@HxW may have been mistakenly written as 1@HxW: [screenshot attached]

  2. Pages 695-6 state: "The hidden state, however, was entirely handled by the model itself using its hidden_state attribute", where hidden_state is printed in a different font from the surrounding text, suggesting that hidden_state is an attribute defined in the code. However, neither the Decoder class on page 694 nor the for loop on page 695 includes hidden_state. If I understand correctly, hidden_state may need to be replaced with hidden, which is defined on page 694 as follows: [screenshot attached]

As an average mind trying to learn PyTorch seriously through this book, may I share some personal views on a few possible improvements:

  1. Here are two figures from pages 594-5: [screenshots of Figures 8.6 and 8.7 attached] In Figure 8.6, h0 and h1 are two different hidden layers. However, in Figure 8.7, h0 and h1 are two dimensions of a single hidden layer. If this understanding is correct, I would say it is possibly somewhat misleading to have h0 and h1 carry different meanings in two consecutive figures. For those acquainted with RNNs, this should not be an issue at all. But for beginners in deep learning, I am not sure I am the only one who would get confused at first sight (although I was able to correct myself on second thought).

  2. For several examples in Chapters 8 and 9, the input features have a dimension of 2, while the dimension of the hidden layers is also set to 2. While I was trying to understand the code, I changed the hidden layer dimension to a number other than 2. The resulting output gave me a better understanding of the shapes of the model's inputs and outputs, which ultimately led to a better understanding of the model itself. For beginners, it might be a somewhat suboptimal idea to learn from examples whose feature dimension and hidden dimension happen to be identical.

Please let me know if any of my suggested typos or suggestions are off the mark due to my misunderstanding of the subject matter. Thank you.

dvgodoy commented 6 months ago

Hi @Patrick-Wen,

Thank you for pointing out these issues and for asking these questions; let me go through all of them:

  1. Inception class: you're right, it should be 2@HxW
  2. Hidden attribute: you're right, the text should read hidden, not hidden_state, to reflect the actual name of the attribute
  3. h_0 and h_1 in figures 8.6 and 8.7: you have a point, it may be somewhat confusing for the reader at first glance. I will consider changing them to h_00 and h_01 in figure 8.7 in a future revision; perhaps that will make it clearer
  4. Dimensions of features and hidden layers: I see your point here, and I considered doing exactly as you suggested (a hidden dimension of 3) to better illustrate the shapes when I wrote the book. However, I wanted to depict the "Journey of a Hidden State", and making it 3-dimensional would have made it impossible to draw. I was torn between these two options, and I ended up keeping it 2-dimensional so I could plot it. Perhaps that wasn't the best choice, if it made it difficult to properly understand the outputs. I appreciate your thoughts on this, it is very helpful to get the reader's perspective! There is a quick sketch right after this list showing what the shapes look like with a 3-dimensional hidden state.
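
For illustration, here is a minimal sketch (not code from the book, just a plain nn.RNN on made-up data) of how the shapes come out once the hidden dimension differs from the number of features:

import torch
import torch.nn as nn

torch.manual_seed(19)

# a batch of 3 sequences, each with 4 data points of 2 features
batch = torch.randn(3, 4, 2)

# hidden dimension deliberately set to 3, different from the number of features (2)
rnn = nn.RNN(input_size=2, hidden_size=3, batch_first=True)

output, hidden = rnn(batch)
print(output.shape)  # torch.Size([3, 4, 3]) -> (batch, sequence length, hidden_dim)
print(hidden.shape)  # torch.Size([1, 3, 3]) -> (num_layers, batch, hidden_dim)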

Hope it helps, and thank you for supporting my work :-) Best, Daniel

Patrick-Wen commented 6 months ago

  1. On page 910, tokenizer(sentence1, sentence2) may need to be updated to joined_sentences = tokenizer(sentence1, sentence2), since joined_sentences is used in the code snippet that immediately follows.
  2. On page 936, np.all(full_embeddings[alice_idx] == glove['alice']) needs to be updated to np.all(extended_embeddings[alice_idx] == glove['alice']), since full_embeddings is not defined anywhere.
  3. Page 937 states: "Next, we use our glove_tokenizer() to tokenize the sentences, making sure that we pad and truncate them so they all end up with 60 tokens (like we did in the "HuggingFace’s Tokenizer" section)." But in the "HuggingFace’s Tokenizer" section, max_length was set to 50 rather than 60. Also, I am not a native English speaker and am just wondering whether "like we did ..." would be more appropriately phrased as "similar to what we did ...".
  4. On page 937, train_labels = train_dataset['labels'] returned the error "KeyError: "Column labels not in the dataset. Current columns in the dataset: ['sentence', 'source']"". I am not sure whether the code should be updated to train_labels = train_dataset['source'], or whether source needs to be converted to categorical labels first so that train_labels = train_dataset['labels'] can run without error.

The first two issues are restricted to the code printed in the book; everything is fine with the full scripts on GitHub. Issue four exists both in the book and in the full script on GitHub.

dvgodoy commented 6 months ago

Hi @Patrick-Wen ,

Thank you for pointing those out.

Regarding item number 3, I was referring to the "padding and truncating" procedure itself, because it is the same procedure even if we're using a different max length. I'm not a native speaker either, but the proofreader said it was fine, so I kept it like that :-)

Regarding item number 4, I've just run Chapter 11's notebook on Colab and I did not get that error. For some reason, it looks like the following code (from page 894) was not executed on your end:

def is_alice_label(row):
    # flags rows coming from Alice's Adventures in Wonderland (1) vs. the other source (0)
    is_alice = int(row['source'] == 'alice28-1476.txt')
    return {'labels': is_alice}

# map() appends the returned 'labels' key as a new column of the dataset
dataset = dataset.map(is_alice_label)

The code creates the missing labels column in the dataset, which should make the snippet work as intended. Hope it helps!
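
Once the dataset has the labels column and is split into training and test sets (as in the chapter), the original line should run without a KeyError; a minimal check, assuming train_dataset comes from that split:

print(train_dataset.column_names)       # e.g. ['sentence', 'source', 'labels']
train_labels = train_dataset['labels']  # a list of 0/1 integers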

Best, Daniel

Patrick-Wen commented 6 months ago

Hi Daniel,

My fault for missing the is_alice_label function. Thank you for the information.

Best regards, Patrick