Closed furkanakkurt1335 closed 12 months ago
2 points I have right now for the script output: `ş` is `Ģ` in the thesis `782470`. We can gather all the wrong decodings and make a dictionary out of them to use `str.replace` if found in a PDF. Started a dictionary in e7a529b.
@zeynepyirmibes had handled the above-mentioned dictionary with `replacement_dict` in `/normalize.py`.
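For reference, a minimal sketch of the dictionary-based replacement discussed above. The name `replacement_dict` is borrowed from the comment, but its contents here are an assumption: only the `Ģ` to `ş` entry comes from this thread, and the helper `fix_decoding` is a hypothetical name, not necessarily what `normalize.py` contains.

```python
# Mapping of wrongly decoded characters to the intended ones.
# Only this first entry is taken from the thread (thesis 782470);
# further entries would be added as more wrong decodings are found.
replacement_dict = {
    "Ģ": "ş",
}


def fix_decoding(text, mapping=replacement_dict):
    """Replace every wrongly decoded character that occurs in the text.

    Hypothetical helper: applies str.replace once per dictionary entry.
    """
    for wrong, right in mapping.items():
        if wrong in text:
            text = text.replace(wrong, right)
    return text


if __name__ == "__main__":
    # "araĢtırma" is a wrongly decoded form of "araştırma".
    print(fix_decoding("araĢtırma"))
```

Text without any of the listed characters is returned unchanged, so this should be safe to run on every extracted PDF, not only the ones known to be affected.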
`extractor.py` had been used for yok-tez and dergipark. We were happy with the outputs of the script at the end.
After all the steps, we need to finalize the extractor script by evaluating it on several outputs before starting it on all the PDFs.