dglazkov / polymath

MIT License

Figure out text cleaning #99

Open dglazkov opened 1 year ago

dglazkov commented 1 year ago

Text coming in through the importers can be gnarly, and we need a consistent way to clean it.

Cleaning currently happens in two places:

  1. In chunker.py: https://github.com/dglazkov/polymath/blob/main/convert/chunker.py#L43
  2. In main.py: https://github.com/dglazkov/polymath/blob/main/convert/main.py#L58

Let's figure out a single way.
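One possible direction, as a minimal sketch: a single hypothetical `clean_text` helper in a shared module that both `chunker.py` and `main.py` import. The function name, the module it would live in, and the particular cleaning steps are all assumptions for illustration. They are not taken from the code at the linked lines.

```python
import re
import unicodedata

# Hypothetical shared helper (e.g. in a new convert/text_cleaning.py module).
# The normalization steps below are assumptions, NOT what chunker.py or
# main.py currently do.

_HORIZONTAL_WS = re.compile(r"[ \t]+")   # runs of spaces/tabs
_EXTRA_BLANK_LINES = re.compile(r"\n{3,}")  # 3+ newlines = 2+ blank lines


def clean_text(text: str) -> str:
    """Normalize imported text in one consistent way."""
    # Fold compatibility characters (non-breaking spaces, ligatures, ...)
    # into their plain equivalents.
    text = unicodedata.normalize("NFKC", text)
    # Normalize Windows and old-Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop control characters other than newline and tab.
    text = "".join(
        ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    # Collapse runs of horizontal whitespace into a single space.
    text = _HORIZONTAL_WS.sub(" ", text)
    # Trim leading/trailing whitespace on every line.
    text = "\n".join(line.strip() for line in text.split("\n"))
    # Keep at most one blank line between paragraphs.
    text = _EXTRA_BLANK_LINES.sub("\n\n", text)
    return text.strip()


if __name__ == "__main__":
    raw = "  Hello\u00a0\u00a0world \r\n\r\n\r\n\r\nNext\x00 para  "
    print(repr(clean_text(raw)))
```

With a shared helper like this, each call site would call `clean_text(...)` instead of carrying its own rules, so the cleaning behavior could be changed and tested in one place. Keeping the function idempotent (cleaning already-clean text is a no-op) also makes it safe if both the chunker and the importer end up calling it on the same text.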