cambridgeltl / MTL-Bioinformatics-2016

Creative Commons Attribution 4.0 International

Report for incorrect sentence split (JNLPBA-IOBES) #2

Open wonjininfo opened 5 years ago

wonjininfo commented 5 years ago

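Hi, thanks for providing these useful resources! While using them, we noticed that sentences in the JNLPBA-IOBES dataset appear to be split incorrectly: in the sample below, the blank line that marks a sentence boundary falls after the first token of a sentence instead of before it.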

MTL-Bioinformatics-2016/data/JNLPBA-IOBES/test.tsv starts with

Number  O

of  O
glucocorticoid  B-protein
receptors   E-protein
in  O
lymphocytes S-cell_type
and O
their   O
sensitivity O
to  O
hormone O
action  O
.   O
The O

study   O
demonstrated    O

while MTL-Bioinformatics-2016/data/JNLPBA/test.tsv starts with

-DOCSTART-  O

Number  O
of  O
glucocorticoid  B-protein
receptors   I-protein
in  O
lymphocytes B-cell_type
and O
their   O
sensitivity O
to  O
hormone O
action  O
.   O

The O
study   O
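
Since the JNLPBA file above keeps the expected boundaries, one possible workaround is to regenerate the IOBES tags from it. The following is only a minimal sketch and not the script mentioned in this issue. The function name `iob_to_iobes` is hypothetical, and the code assumes the input is one sentence of IOB2 tags (`B-`, `I-`, `O`) as in the sample above.

```python
def iob_to_iobes(tags):
    """Convert one sentence of IOB2 tags (B-/I-/O) to IOBES tags.

    Hypothetical helper: assumes well-formed IOB2 input.
    """
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        # Look one tag ahead to see whether the entity continues.
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = nxt == "I-" + label
        if prefix == "B":
            # A B- tag with no continuation is a single-token entity.
            out.append(("B-" if continues else "S-") + label)
        else:
            # An I- tag with no continuation ends the entity.
            out.append(("I-" if continues else "E-") + label)
    return out


# Tags of the first tokens in the JNLPBA sample above.
sample_tags = ["O", "O", "B-protein", "I-protein", "O", "B-cell_type", "O"]
converted = iob_to_iobes(sample_tags)
```

Applied to the first tokens of the sample, this yields the same `B-protein` / `E-protein` / `S-cell_type` tags that appear in the JNLPBA-IOBES file, while keeping the sentence boundaries of the source file.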

We fixed this with our own post-processing script and used the corrected dataset in our experiments.

Once again, thank you so much for sharing these useful resources!
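
For readers who want to try a repair themselves, here is a minimal sketch of one way to do it. It is not the script referred to in this comment. The function name `shift_boundaries` is hypothetical, and the code assumes that every blank line in the JNLPBA-IOBES file sits exactly one token too late, as in the sample above ("Number" and "The" each appear before the blank line rather than after it).

```python
def shift_boundaries(lines):
    """Move each blank-line sentence boundary one token earlier.

    Hypothetical helper: assumes every blank line follows the first
    token of the sentence it should precede.
    """
    fixed = []
    for line in lines:
        line = line.rstrip("\n")
        if line.strip():
            fixed.append(line)
            continue
        if not fixed or fixed[-1] == "":
            # Leading or repeated blank line: nothing to move.
            continue
        # The token just before the blank line starts the next sentence.
        first_token = fixed.pop()
        if fixed and fixed[-1] != "":
            fixed.append("")
        fixed.append(first_token)
    return fixed


# Shortened version of the JNLPBA-IOBES sample above.
sample = ["Number\tO", "", "of\tO", "action\tO", ".\tO", "The\tO", "", "study\tO"]
repaired = shift_boundaries(sample)
```

On the shortened sample, the blank line moves from after `The` to after `.`, and the stray boundary after `Number` disappears. Checking the result against the sentence boundaries in `data/JNLPBA/test.tsv` would be a sensible sanity test before relying on it.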

GamalC commented 5 years ago

Hi @wonjininfo, many thanks for this information. I think others would appreciate having your script as well. Would you mind sharing it? If you are willing, you can create a pull request or send me the script (gkoc2 at cam dot ac uk), and I will add it.