PerseusDL / treebank_data

Perseus Treebank Data
v.2.1 Missing @cite values in urn:cts:greekLit:tlg0012.tlg001.perseus-grc1.tb #27

Open Eumaeus opened 4 years ago

Eumaeus commented 4 years ago

The @cite attribute is empty when the token is a mark of punctuation. If punctuation is part of the Edition, it belongs to citable passages as much as any word-token.
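For anyone who wants to check a file for this, here is a minimal sketch that lists the tokens whose `cite` attribute is empty or absent. It assumes the treebank uses `sentence` elements containing `word` elements with `id`, `form`, and `cite` attributes. The sample XML, its URNs, and the function name `tokens_missing_cite` are invented for illustration and are not taken from the repository.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: placeholder forms and URNs, not real treebank content.
SAMPLE = """<treebank>
  <sentence id="1">
    <word id="1" form="alpha" cite="urn:cts:example:1.1"/>
    <word id="2" form="," cite=""/>
    <word id="3" form="beta" cite="urn:cts:example:1.2"/>
    <word id="4" form="."/>
  </sentence>
</treebank>"""


def tokens_missing_cite(xml_text):
    """Return (sentence id, word id, form) for every token without a cite value."""
    root = ET.fromstring(xml_text)
    missing = []
    for sentence in root.iter("sentence"):
        for word in sentence.iter("word"):
            # Treat an absent attribute and a blank one the same way.
            if not (word.get("cite") or "").strip():
                missing.append((sentence.get("id"), word.get("id"), word.get("form")))
    return missing


print(tokens_missing_cite(SAMPLE))
```

To run it against a real file, the contents can be read from disk and passed in as `xml_text`.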

francescomambrini commented 4 years ago

Assigning a @cite attribute to punctuation would also help in reconstructing the text from the treebank. So definitely +1 to that!
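As a rough illustration of the reconstruction use case, the sketch below groups token forms by their `cite` value. Where a token has no `cite`, it falls back to the value of the preceding token; that fallback is only an assumed heuristic for this example, not something the treebank or its tools are known to do. The sample XML, URNs, and the function name `text_by_passage` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: placeholder forms and URNs, not real treebank content.
SAMPLE = """<treebank>
  <sentence id="1">
    <word id="1" form="alpha" cite="urn:cts:example:1.1"/>
    <word id="2" form="," cite=""/>
    <word id="3" form="beta" cite="urn:cts:example:1.2"/>
    <word id="4" form="." cite=""/>
  </sentence>
</treebank>"""


def text_by_passage(xml_text):
    """Map each cite value to the space-joined forms of its tokens."""
    root = ET.fromstring(xml_text)
    passages = {}
    last_cite = None
    for word in root.iter("word"):
        # Assumed heuristic: a token without a cite inherits the previous one.
        cite = (word.get("cite") or "").strip() or last_cite
        if cite is None:
            continue  # nothing to inherit from yet, so the token is skipped
        passages.setdefault(cite, []).append(word.get("form"))
        last_cite = cite
    return {cite: " ".join(forms) for cite, forms in passages.items()}


print(text_by_passage(SAMPLE))
```

Once punctuation carries its own `cite` value, the fallback branch is never taken and the grouping depends only on the data.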

gcelano commented 4 years ago

I do not remember introducing @cite myself; rather, I kept the values where they were already present. @balmas, was @cite assigned in Arethusa?

balmas commented 4 years ago

I don't believe this was done by Perseids or Arethusa. At least I can't find any reference to it in the code.

gcelano commented 4 years ago

In any case, yes, CTS references should also be applied to punctuation marks. This is what I am doing:

https://git.informatik.uni-leipzig.de/celano/latinnlp/-/tree/master/temporary

Is there any new reference style for cite? It can be inferred from, for example:

https://git.informatik.uni-leipzig.de/celano/latinnlp/-/blob/master/temporary/phi0588.abo005.perseus-lat2/phi0588.abo005.perseus-lat2.tok01.xml

but this can be transformed into something more easily readable.