Script comparing results from original and preprocessed document.
I didn't know where to put the script, so made it 01-03, but could probably be a part of 02-01.
We could also think about removing certain words, such as "odstavek" and "zakon". These are usually not a part of structural sentences, but hint at legal speak.
It sure must be 02_something. Maybe it can be 02_02 and we rename other notebooks.
I agree with you
Probably there is no important text in parenthesis but I am not sure about that. Maybe we can decide about that when we will use it in practice (predicting, document map) and test what works better.
Script comparing results from original and preprocessed document.