PerseusDL / lexica

Repo for the text files of lexica
Creative Commons Attribution Share Alike 4.0 International

[Lewis and Short] removing insignificant whitespace #73

Closed: nkprasad12 closed this issue 1 year ago

nkprasad12 commented 1 year ago

Hi @lcerrato

I'm working on a series of automation scripts to resolve some frequent tagging errors. For example, in verbs marked v. a. and n., the n. is sometimes tagged as <gen> even though most entries put it in <pos>; abbreviations are sometimes split into a different tag from their final period; and so on.

In order to make this task easier, I would like to send a PR doing the following:

  1. Fix all instances where an <entryFree> is split across multiple lines. Most of the time this is actually a mistake, and the second line of the <entryFree> is really its own entry in LS. There are 2 occurrences where the split is legitimate, and in these cases I want to move everything onto the same line. In both cases, the amount of text on the second line is very small and wouldn't make the first line prohibitively long.
  2. Remove indentation before some entries. Since phase 1 puts each entry on one line, this should have no significant effect, since the whitespace is outside the entries.
  3. Remove empty lines between entries. Again, this should have no significant effect, since it happens outside entry boundaries.
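For illustration only, here is a minimal Python sketch of what these phases could look like. It is not the script behind the actual PRs. The function names are hypothetical, and it assumes the file has already been read into a list of lines without trailing newlines. It also assumes that only lines beginning with `<entryFree` should lose their indentation, and that every blank line can be dropped.

```python
def find_split_entries(lines):
    """Return 1-based numbers of lines that open an <entryFree> without closing it.

    These are the candidates for phase 1: each one is either a mistake or one
    of the few legitimate multi-line entries, so they still need manual review.
    """
    return [
        number
        for number, line in enumerate(lines, start=1)
        if "<entryFree" in line and "</entryFree>" not in line
    ]


def strip_insignificant_whitespace(lines):
    """Apply phases 2 and 3: dedent entry lines and drop blank lines.

    Assumption: only lines that start with <entryFree (after indentation)
    are dedented; all other non-blank lines are kept unchanged.
    """
    cleaned = []
    for line in lines:
        if not line.strip():
            continue  # phase 3: skip empty or whitespace-only lines
        dedented = line.lstrip()
        if dedented.startswith("<entryFree"):
            cleaned.append(dedented)  # phase 2: remove indentation before the entry
        else:
            cleaned.append(line)
    return cleaned
```

Running `find_split_entries` first gives a list of lines to inspect by hand. The whitespace pass is only safe to treat as insignificant once each entry sits on a single line.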

Would this be accepted?

lcerrato commented 1 year ago

@nkprasad12 Whitespace cleanup is helpful, thank you.

nkprasad12 commented 1 year ago

Sent https://github.com/PerseusDL/lexica/pull/75 for Phase 1.

nkprasad12 commented 1 year ago

Fixed by #75 and #76