-
This issue is to discuss the design for handling of TEI input for the source and translation text.
At the most basic level, we need to be able to extract the word tokens from the TEI XML in order t…
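As a starting point for that discussion, here is a minimal sketch of token extraction from TEI in Python, using only the standard library. It rests on assumptions that are not part of this issue: the input uses the standard TEI namespace, explicit `<w>` elements are preferred when present, and plain word-character matching is otherwise good enough. The name `extract_tokens` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Standard TEI namespace (assumed; adjust if the input files differ).
TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def extract_tokens(tei_xml: str) -> list[str]:
    """Return the word tokens found in the <body> of a TEI document.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(tei_xml)
    body = root.find(f".//{TEI_NS}body")
    if body is None:
        return []

    # If the document is already tokenized with <w>, trust that markup.
    words = body.findall(f".//{TEI_NS}w")
    if words:
        return ["".join(w.itertext()).strip() for w in words]

    # Otherwise flatten all text nodes first, so inline markup inside a
    # word (e.g. <hi>begin</hi>ning) does not split the token.
    text = "".join(body.itertext())
    return re.findall(r"\w+", text)


SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Demo</title></titleStmt></fileDesc></teiHeader>
  <text><body><p>In the <hi rend="italic">begin</hi>ning was the word.</p></body></text>
</TEI>"""

if __name__ == "__main__":
    print(extract_tokens(SAMPLE))
    # ['In', 'the', 'beginning', 'was', 'the', 'word']
```

Flattening the text before tokenizing is a deliberate choice here: TEI allows inline elements inside a word, so tokenizing each text node separately would break such words apart.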
-
Hi, there is still a problem with autoindent and autoclosing tags. It happens when the autoclosed tag is inside a word, which may occur, for example, in XML-TEI.
So the output I get
```
Something…
-
Dear all,
I would like to discuss how we are going to encode the presence of māhdar. So far I could find three cases of its description, all of them EMIP under collation. Would it be the right place?
-
I cannot begin to tell you how excited I am to have found an OSS implementation of CBGM, and on top of that one that is/promises to be compatible with the data & local/global decisions collected by IN…
-
This is /dev/null (The 2nd meaning of https://en.wiktionary.org/wiki//dev/null) where requests go. We never promise to write the parser you want. I will help you if you want to write a parser only if …
-
Proposal for the addition of data to the repo:
- under `softcite-dataset/tei/`, all the TEI files corresponding to the PDF of the dataset, as converted by Grobid
- under `softcite-dataset/json/`, al…
-
Each corpus should have an optional description detailing licensing and other corpus-related information.
- The information should be read from a ~~`corpus.md`~~ `corpus.xml` file in the root of th…
-
Hi, I found a small regression in the latest version of grobid, concerning the results of the segmentation model with the new pdfalto version (0.5).
First of all, I just realised that the …
-
We planned to include the identifiers for articles that our annotators coded but in which they did not find any software mentions. They are in the CSV dataset, but they don't seem to be in the XML file?
@kermit2 …
-
I have an originally error-free file:
```xml
dummy
dummy
dummy
dummy
```
When I tried to replace it with `xml.fileAssociations`:
```
"x…