Minor didactic suggestion: Clarify the difference between different characters that represent the same letter

From a didactic perspective, I would suggest briefly explaining (or linking to an explanation of) how character sets work. More concretely, a lack of this knowledge might cause confusion in at least two places:

1) In FS_1_MVP_Data_Input_Homogenisation.ipynb, section 2.1, the OCR recognizes the long s ("ſ") as a different character than "s". When creating the ground truth for the OCR, it could be made clear (maybe in section 2.1.1) that the choice between the two characters is an important modeling decision with consequences for all further processing steps as well as for the analysis.

2) Every time a user enters a string to analyse the texts (e.g. in the word frequencies diagram in FS_1_MVP_Analysis_Prototype_101.ipynb), case sensitivity matters. It might not be clear to all users that "grippe" and "Grippe" are two different strings and therefore yield different results.
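
A short illustration along these lines might help readers see both effects. This is only a minimal sketch: the token list and the helper names `normalize` and `count_word` are hypothetical and not taken from the notebooks, and lowercasing plus replacing "ſ" with "s" is just one possible normalization choice.

```python
import unicodedata

long_s = "ſ"   # U+017F LATIN SMALL LETTER LONG S
round_s = "s"  # U+0073 LATIN SMALL LETTER S

# The two characters look related but are distinct code points.
print(long_s == round_s)
print(hex(ord(long_s)), unicodedata.name(long_s))
print(hex(ord(round_s)), unicodedata.name(round_s))


def normalize(text, ignore_case=False, fold_long_s=False):
    """Hypothetical helper: optionally lowercase and map long s to round s."""
    if fold_long_s:
        text = text.replace("ſ", "s")
    if ignore_case:
        text = text.lower()
    return text


def count_word(tokens, word, ignore_case=False, fold_long_s=False):
    """Count tokens equal to `word` after applying the same normalization to both."""
    target = normalize(word, ignore_case, fold_long_s)
    return sum(
        1 for token in tokens
        if normalize(token, ignore_case, fold_long_s) == target
    )


# Made-up example tokens, not data from the project.
tokens = ["Grippe", "grippe", "Grippe", "Waſſer", "Wasser"]

print(count_word(tokens, "grippe"))                     # exact match only
print(count_word(tokens, "grippe", ignore_case=True))   # case-insensitive
print(count_word(tokens, "Wasser"))                     # long s kept distinct
print(count_word(tokens, "Wasser", fold_long_s=True))   # long s folded to s
```

With these example tokens, the exact search for "grippe" finds one match and the case-insensitive search finds three; likewise "Wasser" is found once unless the long s is folded, in which case it is found twice. Whichever convention is chosen, it has to be applied consistently to the ground truth, the corpus, and the search strings.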

