Closed arademaker closed 2 years ago
The lines in lixo.txt that contain '@' were converted in the items to \s. Is "\s" specific to TSDB?
Yes. https://github.com/delph-in/docs/wiki/ItsdbReference#formatting-conventions
But in my case, I want to add in the profiles the original ids of the sentences, so I assume I will need to convert the @ into \s myself before creating the profile, right?
Only if you use @ as the delimiter, because it assumes the file is formatted with TSDB escapes. So PyDelphin will first unescape the \s in the file to @, then when it writes the profile out again they get re-escaped to \s.
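To make that round trip concrete, here is a minimal, self-contained sketch of the TSDB escape convention (@ becomes \s, a newline becomes \n, and a backslash becomes \\). The function names below are illustrative stand-ins written for this example, not PyDelphin's own code, and the sketch ignores details such as how empty fields are treated.

```python
import re

# Escape sequences from the TSDB formatting conventions:
#   "@" <-> "\s", newline <-> "\n", backslash <-> "\\"
_ESCAPES = {"@": "\\s", "\n": "\\n", "\\": "\\\\"}
_UNESCAPES = {"\\s": "@", "\\n": "\n", "\\\\": "\\"}


def escape(text):
    """Replace special characters with their TSDB escape sequences."""
    return re.sub(r"[@\n\\]", lambda m: _ESCAPES[m.group(0)], text)


def unescape(text):
    """Replace TSDB escape sequences with the characters they stand for."""
    return re.sub(r"\\(s|n|\\)", lambda m: _UNESCAPES[m.group(0)], text)


def split_record(line):
    """Split an @-delimited line, then unescape each field (hypothetical helper)."""
    return [unescape(field) for field in line.rstrip("\n").split("@")]


def join_record(values):
    """Escape each field, then join the fields with @ (hypothetical helper)."""
    return "@".join(escape(value) for value in values)


# Reading unescapes \s to @; writing escapes it back to \s.
fields = split_record("20@Or email oe\\syy.com.")
line = join_record(fields)
```

Because every literal @ inside a field is stored as \s, splitting the line on @ can never cut a field in the middle.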
Because the only delimiter I can use to pass ids with the texts is @, right?
Nope. You can use other things, too, but only @ uses delphin.tsdb.split() for splitting instead of a regular string split.
$ cat input.txt
i-id i-input
10 Just @-mention them.
20 Or email oe@yy.com.
$ delphin mkprof --delimiter=" " -i input.txt -r relations tabprof
[...]
$ cat tabprof/item
10@@@@1@@Just \s-mention them.@@@@1@3@@@
20@@@@1@@Or email oe\syy.com.@@@@1@3@@@
$ cat input2.txt
i-id@i-input
10@Just \s-mention them.
20@Or email oe\syy.com.
$ delphin mkprof --delimiter="@" -i input2.txt -r relations tsdbprof
[...]
$ cat tsdbprof/item
10@@@@1@@Just \s-mention them.@@@@1@3@@@
20@@@@1@@Or email oe\syy.com.@@@@1@3@@@
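If you generate the input file from a script, the difference between the two cases could be sketched like this. It assumes the sentences are available as (id, text) pairs and that the non-@ delimiter is a tab; `tsdb_escape` and `format_line` are hypothetical helpers written for this example, not part of PyDelphin.

```python
def tsdb_escape(text):
    """Apply the TSDB escapes; backslashes go first so the ones added
    for "@" and newline are not doubled afterwards."""
    return text.replace("\\", "\\\\").replace("@", "\\s").replace("\n", "\\n")


def format_line(fields, delimiter):
    """Build one input line; escape the fields only for the "@" delimiter."""
    if delimiter == "@":
        fields = [tsdb_escape(field) for field in fields]
    return delimiter.join(fields)


# Assumed sample data mirroring the two sentences in the session above.
items = [("10", "Just @-mention them."), ("20", "Or email oe@yy.com.")]

# Tab-delimited input: the text stays raw, "@" included.
tab_lines = [format_line(item, "\t") for item in items]

# "@"-delimited input: the text must already carry TSDB escapes.
at_lines = [format_line(item, "@") for item in items]
```

Either set of lines, preceded by the matching header line, should lead to the same item file, since the @ in the text ends up as \s in the profile in both cases.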
In my input data, I do have a few cases of @ in the text. How do I escape them? Running
the lines in lixo.txt that contain '@' were converted in the items to \s. Is "\s" specific to TSDB?
But in my case, I want to add in the profiles the original ids of the sentences, so I assume I will need to convert the @ into \s myself before creating the profile, right? Because the only delimiter I can use to pass ids with the texts is @, right?