Rostlab / LocText

Relation Extraction (RE) of: Proteins <--> Cell Compartments
https://www.tagtog.net/-corpora/LocText
Apache License 2.0

Count number of relations in relna/LocText corpus #24

Closed: MadhukarSP closed this issue 7 years ago

MadhukarSP commented 7 years ago

See statistics in file: https://github.com/juanmirocks/LocText/blob/feature/6_DS/tests/test_corpus_stats.py

Maximum achievable performance with normalizations:

If P = 1 and R = 0.8 (limit in D0 and D1), the maximum F1 score is:

$F_1 = \frac{2PR}{P + R} = \frac{2 \cdot 1 \cdot 0.8}{1 + 0.8} \approx 0.89$
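As a quick check of the bound above, here is a minimal sketch of the balanced F-measure (harmonic mean of precision and recall). The helper name `f1_score` is our own and is not part of nalaf or LocText.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Upper bound quoted above: perfect precision, recall capped at 0.8.
print(round(f1_score(1.0, 0.8), 2))  # 0.89
```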


The function which performs the count is as follows:

count_true_relations_in_same_and_diff_sentences(dataset, rel_type)

The file path: nalaf/structures/count_of_relations_in_ss_and_ds.py

Import statement: from nalaf.structures.count_of_relations_in_ss_and_ds import count_true_relations_in_same_and_diff_sentences
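The actual counting logic lives in the nalaf branch and is not reproduced here. The sketch below only illustrates the bucketing idea under a simplifying assumption: each true relation is reduced to the sentence indices of its two entities in the same document. The names `Relation`, `distance_bucket` and `count_relations_by_distance` are hypothetical and do not mirror nalaf's data structures.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical representation: (sentence index of entity 1, sentence index of entity 2).
Relation = Tuple[int, int]


def distance_bucket(sent_a: int, sent_b: int) -> str:
    """Map a sentence distance to the D0 / D1 / D2 / D3 / D4+ labels."""
    d = abs(sent_a - sent_b)
    return f"D{d}" if d < 4 else "D4+"


def count_relations_by_distance(relations: Iterable[Relation]) -> Counter:
    """Count relations per sentence-distance bucket."""
    return Counter(distance_bucket(a, b) for a, b in relations)


# Toy input (not LocText data): three same-sentence relations,
# one spanning adjacent sentences, one spanning five sentences.
toy = [(0, 0), (2, 2), (3, 3), (1, 2), (0, 5)]
counts = count_relations_by_distance(toy)
for label in ("D0", "D1", "D2", "D3", "D4+"):
    print(label, counts[label])  # Counter returns 0 for missing buckets
```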

The data.py file has been modified: line 1342 has been commented out to avoid an assertion error that occurs when an entity has type 'None'.

Branch in nalaf: feature/CountRelations

Current findings in relna:

***** True relation count is as follows *****

D0 - relations within the same sentence ---------------------------------> 304

D1 - relations between sentences 1 sentence apart -----------------------> 8

D2 - relations between sentences 2 sentences apart ----------------------> 1

D3 - relations between sentences 3 sentences apart ----------------------> 0

D4+ - relations between sentences 4 or more sentences apart -------------> 1
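Read together, these counts put almost all true relations in D0 or D1. A short sketch that tallies the total and the cumulative share per bucket, using only the numbers listed above. It considers sentence distance alone and says nothing about the normalization limit discussed earlier.

```python
# Counts copied from the results above.
counts = {"D0": 304, "D1": 8, "D2": 1, "D3": 0, "D4+": 1}
total = sum(counts.values())

cumulative_share = {}
running = 0
for label, n in counts.items():  # dicts preserve insertion order (Python 3.7+)
    running += n
    cumulative_share[label] = running / total
    print(f"{label:>3}: {n:3d} relations, cumulative share {cumulative_share[label]:.3f}")
```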

END

juanmirocks commented 7 years ago

👍