allenai / peS2o

Pretraining Efficiently on S2ORC!
Apache License 2.0

How does peS2o extract content (especially citations) from S2ORC? #5

Open realliyifei opened 10 months ago

realliyifei commented 10 months ago

According to #1, citations are removed from peS2o. I am using peS2o but need the citations, so I need to re-process the content from S2ORC, especially the citations, and then combine it with the existing peS2o content. Could you guide me through the process?