GokuMohandas / Made-With-ML

Learn how to design, develop, deploy and iterate on production-grade ML applications.
https://madewithml.com
MIT License
37.52k stars 5.95k forks

Foundations --> CNN Doubts #206

Closed shashankvasisht closed 2 years ago

shashankvasisht commented 2 years ago

Hi, thank you for such excellent lessons!

I had 3 doubts in the lecture; can you please explain them?

  1. When we pad the one-hot sequences to the maximum sequence length, why do we not put a 1 at the 0th index (so that the padding corresponds to the `<PAD>` token)? Why is it currently all zeros?

  2. When we're loading the weights into the `InterpretableCNN` model, why don't we get a weight-mismatch error? (We have dropped the FC layer part and we're also not using `strict=False`.)

  3. My sns heatmap / conv_output has all values equal to 1; it does not resemble yours. Can you help me with this?

[screenshot: heatmap output]

GokuMohandas commented 2 years ago
  1. The `<PAD>` token index is 0 in our code (see the `Tokenizer` class), unless it's configured otherwise. But you made me realize that we should pass the pad index into `pad_sequences` instead of assuming it's 0; these kinds of mismatches lead to silent bugs! I'll push this change towards the end of this month.
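A minimal sketch of what passing the pad index explicitly could look like. The function name mirrors the lesson's `pad_sequences`, but this signature and implementation are illustrative assumptions, not the repo's actual code:

```python
import numpy as np

def pad_sequences(sequences, max_seq_len=0, pad_index=0):
    """Pad integer token sequences to a common length with an explicit pad index."""
    max_seq_len = max(max_seq_len, max(len(seq) for seq in sequences))
    # Fill everything with pad_index first, then copy each real sequence in.
    padded = np.full((len(sequences), max_seq_len), pad_index, dtype=np.int64)
    for i, seq in enumerate(sequences):
        padded[i, : len(seq)] = seq
    return padded
```

With `pad_index` made explicit, the padding stays correct even if `<PAD>` is ever mapped to a non-zero index in the tokenizer.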
  2. Good question, I'll add more detail to the lesson to make this clear. But we're actually not dropping the FC layers: if you look at the `__init__` function for `InterpretableCNN`, it has all the layers. The only difference is that the `forward` function returns an earlier activation.
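The key point is that `load_state_dict` matches keys generated from the modules declared in `__init__`, not from what `forward` returns. A small sketch (layer names and sizes are illustrative, not the lesson's actual architecture):

```python
import torch
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(8, 4, kernel_size=3)
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        z = torch.relu(self.conv(x))
        z = z.max(dim=-1).values  # pool over the sequence dimension
        return self.fc(z)

class InterpretableCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(8, 4, kernel_size=3)  # same layers as CNN
        self.fc = nn.Linear(4, 2)                   # still declared, just unused below

    def forward(self, x):
        # Return the earlier conv activation instead of the FC output.
        return torch.relu(self.conv(x))

model = CNN()
interpretable = InterpretableCNN()
# The state_dict keys ("conv.weight", "fc.bias", ...) match exactly,
# so this succeeds even with the default strict=True.
interpretable.load_state_dict(model.state_dict())
```

If `InterpretableCNN` had actually omitted `self.fc` from `__init__`, the keys would no longer match and `load_state_dict` would raise unless `strict=False` were passed.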
  3. I've seen this happen if you don't train to completion, but also make sure that the `<PAD>` token index is zero. Until I fix the mismatch in the `pad_sequences` function, we force `<PAD>` to be all zeros.
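A quick sanity check you can run before padding. The vocabulary dict below is hypothetical; substitute your tokenizer's actual token-to-index mapping:

```python
# Hypothetical token-to-index mapping; use your Tokenizer's actual mapping.
token_to_index = {"<PAD>": 0, "movie": 1, "great": 2}

# If <PAD> maps to index 0, an all-zero one-hot row is a valid pad vector.
# Otherwise, zero-padding silently encodes "no token" where a real pad
# token was expected, which can distort the conv outputs.
assert token_to_index["<PAD>"] == 0, (
    "Pad index must be 0 while pad_sequences assumes all-zero padding."
)
```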