Closed fishiu closed 2 years ago
Hi,
yes, the sentences are saved without a separator in the context.csv shipped with the data set.
In general, you can use the extract_contexts.py script that comes with the data set to create your own custom context.csv exports (and, e.g., specify the window size as a number of preceding and succeeding sentences or words).
To extract contexts with separated sentences, you can make a small modification to line 161 of extract_contexts.py.
Replacing
return ' '.join(sentences)
with e.g.
return '<SEP>'.join(sentences)
or any other type of separator token you want to specify should do the job.
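As a rough illustration of what that change gives you downstream, here is a minimal sketch of joining sentences with a separator token and splitting them back apart before encoding. The helper names `join_context` and `split_context` and the example sentences are hypothetical and not part of extract_contexts.py; the sketch also assumes the chosen token never occurs in the sentence text itself.

```python
SEP = "<SEP>"  # assumed separator token; choose one that cannot appear in the text


def join_context(sentences, sep=SEP):
    """Join context sentences into one string (mirrors the modified return line)."""
    return sep.join(sentences)


def split_context(context, sep=SEP):
    """Recover the individual sentences from a stored context string."""
    return context.split(sep)


# Made-up example sentences, for illustration only.
sentences = ["Prior work exists.", "We follow [1].", "Results improve."]
context = join_context(sentences)
parts = split_context(context)
```

After splitting, each element of `parts` can be encoded on its own.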
As for sentences at the beginning or end of a document, the window is smaller because there is no preceding or succeeding sentence. E.g. for a window of <sentence><citing_sentence><sentence>
you would then get <citing_sentence><sentence>
or <sentence><citing_sentence>
respectively.
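The boundary behaviour described above can be sketched as follows. This is not the code from extract_contexts.py, only an assumed illustration of clamping a sentence window at the document edges; `sentence_window` and its parameters are hypothetical names.

```python
def sentence_window(sentences, citing_idx, before=1, after=1):
    """Return the citing sentence plus up to `before`/`after` neighbours.

    The window is clamped to the document, so it shrinks at the edges.
    """
    start = max(0, citing_idx - before)
    end = min(len(sentences), citing_idx + after + 1)
    return sentences[start:end]


# Made-up four-sentence document, for illustration only.
doc = ["S0 cites [1].", "S1", "S2", "S3 cites [2]."]

first = sentence_window(doc, 0)   # no preceding sentence available
middle = sentence_window(doc, 1)  # full three-sentence window
last = sentence_window(doc, 3)    # no succeeding sentence available
```

Under these assumptions, a context string holds two sentences when the citing sentence is first or last in the document, and three otherwise.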
Hi,
This is an issue about the structure of the context.csv:
It seems that context.csv puts the context sentences and the main citation sentence together without any delimiters, but I want to run some experiments that need to separate the sentences and encode each one individually.
By the way, do all context strings include three sentences? What if the main citation sentence is the first or last sentence?