In Project 3, Task 1.1.2, the question as stated, "What stem in the dataset has the most words that are shortened to it?", would require excluding any word that is identical to its stem, since a stem is not shortened to itself. If the intent is to count how many words are associated with each stem, the question would be better phrased as "What stem in the dataset has the most words that are associated with it?" The two interpretations happen to give the same answer on this dataset, but that is a coincidence of the data rather than something the two definitions guarantee.
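To make the difference between the two readings concrete, here is a minimal Python sketch. The word/stem pairs, the `PAIRS` name, and both helper functions are made up for illustration and are not taken from the project dataset or its starter code. It assumes each word appears once and maps to exactly one stem.

```python
from collections import Counter

# Hypothetical (word, stem) pairs for illustration; not from the project dataset.
PAIRS = [
    ("connect", "connect"),
    ("connects", "connect"),
    ("connected", "connect"),
    ("connecting", "connect"),
    ("argue", "argu"),
    ("argued", "argu"),
    ("argues", "argu"),
]


def stem_counts(pairs, exclude_identical=False):
    """Count words per stem; optionally skip words equal to their own stem."""
    return Counter(
        stem
        for word, stem in pairs
        if not (exclude_identical and word == stem)
    )


def top_stems(counts):
    """Return every stem tied for the highest count, sorted alphabetically."""
    best = max(counts.values())
    return sorted(stem for stem, n in counts.items() if n == best)


# "Associated with" reading: every word counts toward its stem.
associated = stem_counts(PAIRS)
# "Shortened to" reading: a word identical to its stem does not count.
shortened = stem_counts(PAIRS, exclude_identical=True)

print(top_stems(associated))  # ['connect']
print(top_stems(shortened))   # ['argu', 'connect']
```

In this toy data, "connect" wins outright under the "associated with" reading, but ties with "argu" under the "shortened to" reading, because the word "connect" no longer counts toward its own stem. Under the one-word-one-stem assumption above, the two counts for a stem differ by at most one, so the readings only disagree when the top stems are within one word of each other.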