allenai / qasper-led-baseline

Filter NLP papers in S2ORC #23

Open xanhho opened 1 year ago

xanhho commented 1 year ago

Hi, thank you for the dataset and the paper.

In your paper, you say that you used only papers from the 'computational linguistics domain' of the S2ORC dataset.

As far as I know, S2ORC only has a 'Computer Science' field, and the same venue appears under many different names (https://github.com/allenai/s2orc/issues/26).

Could you share how you obtained the NLP papers?

I tried filtering on the 'journal' field in the metadata with a set of predefined keywords (EMNLP, ACL, ...), but this probably misses some venues.
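
For reference, the keyword filter I tried looks roughly like the sketch below. It assumes the metadata is stored as JSON lines with one record per paper; the keyword list, the helper names, and the use of a 'venue' field alongside 'journal' are my own assumptions and are not taken from the paper.

```python
import json
import re

# Hypothetical keyword list; it is certainly incomplete.
NLP_VENUE_KEYWORDS = ["ACL", "EMNLP", "NAACL", "COLING", "Computational Linguistics"]

# Word-boundary match so that "ACL" does not match inside words such as "Oracle".
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in NLP_VENUE_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def is_nlp_paper(record, fields=("journal", "venue")):
    """Return True if any of the given metadata fields mentions a keyword."""
    for field in fields:
        value = record.get(field)
        if isinstance(value, str) and _PATTERN.search(value):
            return True
    return False


def filter_nlp(lines):
    """Yield the parsed records whose venue-like fields match a keyword."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if is_nlp_paper(record):
            yield record
```

Any venue spelled in a way the list does not cover is silently dropped, which is why I suspect this approach misses papers.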

Another option is to list all conference names and annotate them manually, but that takes a lot of time and requires knowing what kind of conference each one is.
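
A rough sketch of how the list for manual annotation could be built: count how often each distinct name occurs, so the most frequent names can be labeled first. It again assumes JSON-lines metadata; the function name and the light normalization (lowercasing, collapsing whitespace) are my own choices.

```python
import json
from collections import Counter


def count_venue_names(lines, field="journal"):
    """Count distinct values of a metadata field after light normalization."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        value = json.loads(line).get(field)
        if isinstance(value, str) and value.strip():
            # Lowercase and collapse whitespace so trivial variants merge.
            counts[" ".join(value.lower().split())] += 1
    return counts
```

Calling `count_venue_names(lines).most_common(200)` would then give the 200 most frequent names to annotate first, though the long tail of rare spellings would still need manual work.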

Thank you for your support.

pdasigi commented 1 year ago

@kyleclo extracted CL papers from S2ORC. Kyle, can you please answer this question?