How to separate the context sentences and the main citation sentence?

Hi,

yes, the sentences are saved without a separator in the context.csv shipped with the data set.

In general, you can use the extract_contexts.py script that comes with the data set to create your own custom context.csv exports (and e.g. specify the window size in terms of the number of preceding and succeeding sentences or words).

To extract contexts with separated sentences, you can make a small modification to line 161 of extract_contexts.py.
Replacing
return ' '.join(sentences)
with e.g.
return '<SEP>'.join(sentences)
or any other separator token you want to specify should do the job.
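
Once contexts are exported this way, they can be split back into individual sentences. Here is a minimal sketch, assuming '<SEP>' was used as the separator; split_context and the example sentences are made up for illustration and are not part of the data set's scripts.

```python
SEP = "<SEP>"  # assumed separator, matching the modified join call


def split_context(context: str, sep: str = SEP) -> list[str]:
    """Split an exported context string back into its sentences."""
    # Drop empty pieces and surrounding whitespace left over from the join.
    return [part.strip() for part in context.split(sep) if part.strip()]


# Made-up example sentences, joined the same way the modified script would.
sentences = [
    "Prior work studied this problem.",
    "We follow the approach of Smith et al.",
    "Results are reported below.",
]
context = SEP.join(sentences)
print(split_context(context))
```

Pick a separator that cannot occur in the text itself, otherwise the split will cut sentences in the wrong places.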

For citing sentences at the beginning or end of a document, the window is smaller because there is no preceding or succeeding sentence. E.g. for a window of <sentence><citing_sentence><sentence> you would then get <citing_sentence><sentence> or <sentence><citing_sentence>, respectively.
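
The shrinking window at document boundaries can be illustrated as follows. This is only a sketch of the behaviour described above, not the actual code in extract_contexts.py; context_window, its parameters, and the '<SEP>' separator are assumptions made for the example.

```python
def context_window(sentences, citing_idx, before=1, after=1, sep="<SEP>"):
    """Join the citing sentence with up to `before`/`after` neighbours."""
    # Clamp the window so it never reaches past the document boundaries.
    start = max(0, citing_idx - before)
    end = min(len(sentences), citing_idx + after + 1)
    return sep.join(sentences[start:end])


doc = ["First.", "Second.", "Third."]
print(context_window(doc, 0))  # citing sentence is first: no preceding sentence
print(context_window(doc, 1))  # full window
print(context_window(doc, 2))  # citing sentence is last: no succeeding sentence
```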

IllDepence / unarXive
