In Lecture 6, section "Reducing Policy Gradient Variance", in the sentence:
OBS: If we don't want to fit something that takes both states and actions we can just fit $V^{\pi}$ at the cost of using a single-sample estimate for $s_{t+1}$.
We will do this for now, to fit $Q^\pi$ look into Q-learning methods.
the link to Q-learning is missing. Did you mean to link to Lecture 7 or Lecture 8, or to an external reference?