delph-in / pydelphin

Python libraries for DELPH-IN
https://pydelphin.readthedocs.io/
MIT License

How to escape '@' in mkprof #338

Closed: arademaker closed this issue 2 years ago

arademaker commented 2 years ago

In my input data, a few sentences contain @ in the text. How do I escape them? Running

% delphin mkprof -v -i lixo.txt -r ~/hpsg/logon/lingo/lkb/src/tsdb/skeletons/english/Relations  --skeleton lixo

the '@' characters in lixo.txt were converted to \s in the resulting items. Is "\s" specific to TSDB?

But in my case, I want to include the original sentence ids in the profiles, so I assume I will need to convert each @ into \s myself before creating the profile, right? Because the only delimiter I can use to pass ids along with the texts is @, right?

delphin mkprof -v -i examples.1 --delimiter "@" --relations ~/hpsg/logon/lingo/lkb/src/tsdb/skeletons/english/Relations --skeleton examples
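
For the case above, where the ids travel with the sentences in an @-delimited file, the escaping can be done by a small script before running mkprof. The sketch below assumes the TSDB conventions (a backslash is written as \\, a newline as \n, and @ as \s) and a header line naming the fields, as in the input2.txt example later in the thread. tsdb_escape and format_input are hypothetical helpers, not PyDelphin functions.

```python
def tsdb_escape(text):
    """Escape text using the TSDB conventions (hypothetical helper)."""
    # Escape backslashes first so the escapes added below are not doubled.
    return (text.replace("\\", "\\\\")
                .replace("\n", "\\n")
                .replace("@", "\\s"))


def format_input(pairs):
    """Render (id, sentence) pairs as @-delimited mkprof input with a header."""
    lines = ["i-id@i-input"]  # header line naming the item fields
    for item_id, sentence in pairs:
        # Only the sentence needs escaping; the id is assumed to be a plain number.
        lines.append(f"{item_id}@{tsdb_escape(sentence)}")
    return "\n".join(lines) + "\n"
```

Writing format_input(pairs) to examples.1 should give mkprof a file it can split and unescape correctly. If PyDelphin is installed, delphin.tsdb.escape() should serve the same purpose as the hypothetical tsdb_escape.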

goodmami commented 2 years ago

> the lines in lixo.txt that contain '@' were converted in the items to \s. Is "\s" specific from TSDB?

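Yes. \s is how TSDB escapes @, its field delimiter (newlines and backslashes are likewise written as \n and \\). See https://github.com/delph-in/docs/wiki/ItsdbReference#formatting-conventions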

> But in my case, I want to add in the profiles the original ids of the sentences, so I assume I will need to convert the @ into \s myself before creating the profile, right?

Only if you use @ as the delimiter, because then mkprof assumes the input file already uses TSDB escapes. PyDelphin first unescapes each \s in the file to @, and when it writes the profile, those characters are escaped back to \s.
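
The read side of that round trip can be sketched as follows, assuming it behaves like delphin.tsdb.split(): split the raw line on @, then unescape each field. This is an illustration, not PyDelphin's code; tsdb_unescape and split_record are hypothetical names, and keeping unknown escapes unchanged is an assumption.

```python
import re

# Hypothetical table of TSDB escapes: backslash-backslash, backslash-n, backslash-s.
_UNESCAPES = {"\\": "\\", "n": "\n", "s": "@"}


def tsdb_unescape(field):
    """Undo TSDB escapes in one field (sketch; unknown escapes are kept as-is)."""
    # Scanning left to right means an escaped backslash followed by "s"
    # decodes to a literal backslash plus "s", never to an "@".
    return re.sub(r"\\(.)", lambda m: _UNESCAPES.get(m.group(1), m.group(0)), field)


def split_record(line):
    """Split a raw @-delimited line and unescape each field (sketch)."""
    # Every raw "@" is a separator, because an "@" inside data is written as "\s".
    return [tsdb_unescape(field) for field in line.rstrip("\n").split("@")]
```

Applied to the raw line 20@Or email oe\syy.com. from input2.txt, it yields the id 20 and the text Or email oe@yy.com.; when the profile is written, that @ is escaped back to \s.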

> because the only delimiter I can use to pass ids with the texts is @, right?

Nope. You can use other delimiters, too. The difference is that only @ uses delphin.tsdb.split() to split (and unescape) each line; any other delimiter uses a regular string split, so the text is taken verbatim.

$ cat input.txt
i-id	i-input
10	Just @-mention them.
20	Or email oe@yy.com.
$ delphin mkprof --delimiter="	" -i input.txt -r relations tabprof
[...]
$ cat tabprof/item
10@@@@1@@Just \s-mention them.@@@@1@3@@@
20@@@@1@@Or email oe\syy.com.@@@@1@3@@@
$ cat input2.txt
i-id@i-input
10@Just \s-mention them.
20@Or email oe\syy.com.
$ delphin mkprof --delimiter="@" -i input2.txt -r relations tsdbprof
[...]
$ cat tsdbprof/item
10@@@@1@@Just \s-mention them.@@@@1@3@@@
20@@@@1@@Or email oe\syy.com.@@@@1@3@@@
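
A file like input.txt in the session above can also be generated from (id, sentence) pairs. This sketch assumes a tab as the delimiter, as the profile name tabprof suggests; format_tab_input is a hypothetical helper. Because a non-@ delimiter means a plain string split, no escaping is done, and the helper instead rejects sentences that contain the delimiter or a line break.

```python
def format_tab_input(pairs):
    """Render (id, sentence) pairs as a tab-delimited mkprof input (sketch)."""
    lines = ["i-id\ti-input"]  # header line naming the item fields
    for item_id, sentence in pairs:
        # A plain split cannot tell a tab inside the text from a real delimiter,
        # and a newline would start a new input line.
        if "\t" in sentence or "\n" in sentence:
            raise ValueError(f"item {item_id} contains a tab or newline")
        # "@" is written verbatim; mkprof escapes it when writing the profile.
        lines.append(f"{item_id}\t{sentence}")
    return "\n".join(lines) + "\n"
```

In bash, a literal tab can be passed as --delimiter=$'\t', which should give the same item file as the session shows.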