abaddon-moriarty / Arsene_Lupi-NER

Trying to create an NER model specifically for locations in the French "Arsène Lupin" books, and at some point overlay the results on a map of France.

Separate the preprocessing and main script #5

Open abaddon-moriarty opened 4 months ago

abaddon-moriarty commented 4 months ago

If I separate the preprocessing from the main script, we can simply run the preprocessing once before everything else and output the cleaned texts to a separate folder. Then we won't have to redo it every time we want to use main.py.
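A rough sketch of what the separate preprocessing step could look like, assuming the books are plain .txt files in one folder. The names `clean_text` and `preprocess_folder`, the folder layout, and the whitespace-only cleaning are placeholders for illustration, not the repo's actual code:

```python
from pathlib import Path


def clean_text(raw: str) -> str:
    """Placeholder cleaning step (assumption): collapse runs of whitespace."""
    return " ".join(raw.split())


def preprocess_folder(raw_dir: Path, clean_dir: Path) -> list[Path]:
    """Clean every .txt file in raw_dir and write the result to clean_dir.

    Returns the paths of the cleaned files so a caller can check what was written.
    """
    clean_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(raw_dir.glob("*.txt")):
        dst = clean_dir / src.name
        cleaned = clean_text(src.read_text(encoding="utf-8"))
        dst.write_text(cleaned, encoding="utf-8")
        written.append(dst)
    return written
```

main.py would then only read from the cleaned folder, so the cleaning cost is paid once per change to the raw texts rather than on every run.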

abaddon-moriarty commented 2 months ago

Found what was causing the empty texts: the `beginning` variable becomes -1 when the keyword is not found, so that would select the entire text. Instead of `if beginning:` I now use `if beginning > 1:`, which seems to do the trick. This will be included in the next commit.
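For reference, a minimal sketch of the check, assuming `beginning` comes from `str.find` (which returns -1 on a miss, and -1 is truthy in Python). The helper name `text_from_keyword` and the fallback of keeping the whole text are assumptions for illustration. Comparing against -1 directly may be a safer variant than `> 1`, since `> 1` would also skip a keyword found at index 0 or 1:

```python
def text_from_keyword(text: str, keyword: str) -> str:
    """Return text starting at the first occurrence of keyword.

    Assumed fallback: if the keyword is missing, keep the text unchanged.
    """
    beginning = text.find(keyword)  # str.find returns -1 when keyword is absent
    if beginning != -1:
        return text[beginning:]
    return text


# Why `if beginning:` is not enough: -1 is truthy, and a slice from -1
# keeps only the last character of the string.
assert bool(-1) is True
assert "abc"[-1:] == "c"
```

With this check, a keyword at the very start of the file (index 0) is still handled as a hit.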