dice-group / gerbil

GERBIL - General Entity annotatoR Benchmark
GNU Affero General Public License v3.0

[dataset] Microposts2014 wrapper #47

Closed RicardoUsbeck closed 10 years ago

RicardoUsbeck commented 10 years ago

Write a wrapper for the Microposts2014 dataset. Annotate the license, experiment type and language. Give provenance. Update https://github.com/AKSW/gerbil/wiki/Licences-for-datasets

giusepperizzo commented 10 years ago

Wrapper implemented. Experiments can be performed on both the training and the test set. Please check before closing the issue.

Note that the wrapper expects each line of the dataset to look like: tweet_id \t tweet_text \t list(pair)

where pair = entity_mention \t dbpedia_uri, and consecutive pairs are separated from each other by \t.

Example: 91649478326624256 "When it hurts to look back, and you're afraid to look ahead, you can look beside you and a #Leo will be there - #LeoFriendship" Leo http://dbpedia.org/resource/Leo_(astrology)
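To make the line format above concrete, here is a minimal Python sketch of how one such line could be split into its parts. It is not the wrapper's actual code: the names `AnnotatedTweet` and `parse_line` are made up for illustration, and the sketch assumes that fields are separated by real tab characters and that the tweet text itself contains no tab.

```python
from typing import List, NamedTuple, Tuple


class AnnotatedTweet(NamedTuple):
    """Hypothetical container for one dataset line."""
    tweet_id: str
    text: str
    annotations: List[Tuple[str, str]]  # (entity_mention, dbpedia_uri)


def parse_line(line: str) -> AnnotatedTweet:
    """Split 'tweet_id \\t tweet_text \\t list(pair)' into its parts."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 2:
        raise ValueError("expected at least tweet_id and tweet_text")

    tweet_id, text, rest = fields[0], fields[1], fields[2:]

    # Tolerate a trailing tab at the end of the line.
    while rest and rest[-1] == "":
        rest.pop()

    # The remaining fields must come in (mention, uri) pairs.
    if len(rest) % 2 != 0:
        raise ValueError("entity_mention without a matching dbpedia_uri")

    annotations = list(zip(rest[0::2], rest[1::2]))
    # The text is kept exactly as found, including any surrounding quotes.
    return AnnotatedTweet(tweet_id, text, annotations)
```

A tweet without any annotation simply yields an empty `annotations` list.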

Hence, you first have to get the gold standard (GS) [1] and then, for each tweet_id, download the corresponding text.
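The merge step could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: `build_dataset_lines` is a hypothetical helper, the gold standard is assumed to be already loaded as a mapping from tweet_id to (entity_mention, dbpedia_uri) pairs, and the downloaded texts as a mapping from tweet_id to text. How the GS file is read and how the tweets are downloaded is left out on purpose.

```python
from typing import Dict, Iterator, List, Tuple


def build_dataset_lines(
    gold: Dict[str, List[Tuple[str, str]]],
    texts: Dict[str, str],
) -> Iterator[str]:
    """Yield lines of the form 'tweet_id \\t tweet_text \\t list(pair)'."""
    for tweet_id, pairs in gold.items():
        text = texts.get(tweet_id)
        if text is None:
            # Assumption: tweets whose text could not be downloaded are skipped.
            continue

        # Tabs and line breaks inside the text would break the format.
        clean_text = text.replace("\t", " ").replace("\r", " ").replace("\n", " ")

        fields = [tweet_id, clean_text]
        for mention, uri in pairs:
            fields.extend((mention, uri))
        yield "\t".join(fields)
```

Skipping tweets without text is a design choice of this sketch; depending on the experiment it may be preferable to report them instead.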

[1] - http://www.scc.lancs.ac.uk/microposts2014/challenge/dataset/microposts2014-neel_challenge_gs.zip