dbpedia / GSoC

Google Summer of Code organization

Tool to generate RDF from DBpedia abstracts (natural language text) #16

Closed: MarianoRico closed this issue 4 years ago

MarianoRico commented 5 years ago

Description

With recent advances in natural language analysis (e.g. SyntaxNet), converting natural language text into RDF triples is becoming a real possibility. This project will apply these ideas to a real use case: DBpedia. We will combine the power of syntactic analyzers with named entity identifiers (such as DBpedia Spotlight) to generate highly reliable RDF triples from the textual information (the long abstract) about a given DBpedia resource.
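As a rough illustration of how a syntactic parse and entity links could be combined, here is a minimal sketch. It assumes the parser output is available as CoNLL-style tokens (index, text, lemma, dependency label, head index) and that entity linking has produced a map from surface forms to DBpedia resource URIs. Both inputs are hand-written here as stand-ins for real SyntaxNet and Spotlight output, and the predicate namespace `http://example.org/relation/` is hypothetical. This is a simple subject-verb-object heuristic, not the method the project will necessarily use.

```python
from dataclasses import dataclass

@dataclass
class Token:
    idx: int    # 1-based position in the sentence (CoNLL-style)
    text: str   # surface form
    lemma: str  # base form, used to name the predicate
    dep: str    # dependency label, e.g. "nsubj", "dobj"
    head: int   # idx of the governing token (0 = root)

# Hypothetical namespace: a real tool would map verbs to DBpedia ontology properties.
REL_NS = "http://example.org/relation/"
SUBJ_LABELS = {"nsubj"}
OBJ_LABELS = {"dobj", "obj"}  # Stanford-style and Universal Dependencies object labels

def extract_triples(tokens, entities):
    """Emit (subject, predicate, object) URIs for each head token (normally a verb)
    whose subject and object dependents are both linked entities."""
    triples = []
    for verb in tokens:
        deps = [t for t in tokens if t.head == verb.idx]
        subjects = [t for t in deps if t.dep in SUBJ_LABELS and t.text in entities]
        objects = [t for t in deps if t.dep in OBJ_LABELS and t.text in entities]
        for s in subjects:
            for o in objects:
                triples.append((entities[s.text], REL_NS + verb.lemma, entities[o.text]))
    return triples

# Hand-written parse of "Einstein developed relativity." (stand-in for parser output)
sentence = [
    Token(1, "Einstein", "Einstein", "nsubj", 2),
    Token(2, "developed", "develop", "ROOT", 0),
    Token(3, "relativity", "relativity", "dobj", 2),
    Token(4, ".", ".", "punct", 2),
]
# Hand-written entity links (stand-in for DBpedia Spotlight annotations)
links = {
    "Einstein": "http://dbpedia.org/resource/Albert_Einstein",
    "relativity": "http://dbpedia.org/resource/Theory_of_relativity",
}

for triple in extract_triples(sentence, links):
    print(triple)
```

Keying entities by single-token surface forms keeps the sketch short; a realistic version would match multi-word entity spans by character offsets and would need many more patterns than plain subject-verb-object.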

Goals

The tool will generate a new N-Triples (.nt) file with the proposed triples for all DBpedia resources. The DBpedia extraction process could then use this tool to provide a new .nt file in the DBpedia downloads.
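For reference, each line of an N-Triples file is one `<subject> <predicate> <object> .` statement. A minimal serializer sketch follows, assuming triples are Python tuples in which IRIs are plain strings and literals use a small, illustrative `Literal` class. The `http://example.org/relation/` predicate is hypothetical; the IRI check is deliberately minimal and does not enforce the full N-Triples IRI grammar.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class Literal:
    value: str
    lang: Optional[str] = None  # e.g. "en" for a language-tagged abstract

def escape_literal(s: str) -> str:
    # Escape the backslash first so the escapes added afterwards are not doubled.
    return (s.replace("\\", "\\\\")
             .replace('"', '\\"')
             .replace("\n", "\\n")
             .replace("\r", "\\r"))

def to_nt_term(term: Union[str, Literal]) -> str:
    if isinstance(term, Literal):
        out = f'"{escape_literal(term.value)}"'
        return out + (f"@{term.lang}" if term.lang else "")
    # Plain strings are treated as absolute IRIs; minimal sanity check only.
    if any(c in term for c in ' <>"'):
        raise ValueError(f"not a valid IRI: {term!r}")
    return f"<{term}>"

def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "".join(f"{to_nt_term(s)} {to_nt_term(p)} {to_nt_term(o)} .\n"
                   for s, p, o in triples)

example = [
    ("http://dbpedia.org/resource/Albert_Einstein",
     "http://example.org/relation/develop",
     "http://dbpedia.org/resource/Theory_of_relativity"),
    ("http://dbpedia.org/resource/Albert_Einstein",
     "http://www.w3.org/2000/01/rdf-schema#comment",
     Literal('Physicist who developed "relativity".', "en")),
]
print(to_ntriples(example), end="")
```

To produce the actual .nt file, the returned string can be written to a text file opened with `encoding="utf-8"`, since N-Triples documents are UTF-8 encoded. A library such as rdflib could be used instead of hand-rolled serialization.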

Impact

Increase the number of RDF triples for a given DBpedia resource.
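One simple way this impact could be measured is to count, per resource, only the proposed triples that are not already in the existing dataset. The sketch below assumes both sets are available as (subject, predicate, object) tuples. The prefixed names (`dbr:`, `dbo:`, and a hypothetical `rel:` for extracted relations) are shorthand for full IRIs, and the data is illustrative only.

```python
from collections import Counter

def new_triples_per_resource(existing, proposed):
    """Count, per subject, the proposed triples absent from the existing dataset."""
    existing = set(existing)
    # Building a set also drops duplicate proposals of the same triple.
    fresh = {t for t in proposed if t not in existing}
    return Counter(s for s, _, _ in fresh)

existing = [("dbr:Albert_Einstein", "dbo:birthPlace", "dbr:Ulm")]
proposed = [
    ("dbr:Albert_Einstein", "dbo:birthPlace", "dbr:Ulm"),                 # already present
    ("dbr:Albert_Einstein", "rel:develop", "dbr:Theory_of_relativity"),
    ("dbr:Albert_Einstein", "rel:develop", "dbr:Theory_of_relativity"),  # duplicate proposal
    ("dbr:Ulm", "rel:locatedIn", "dbr:Germany"),
]
print(new_triples_per_resource(existing, proposed))
```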

Warm up tasks

Experience with SyntaxNet or any other NLP tool that provides a syntactic analyzer for natural language. Here we have to strike a balance between power and the number of supported languages. Fluency with RDF and the DBpedia datasets (downloads).

Mentors

Mariano Rico

Keywords

NLP, text parsing, syntactic analysis, RDF generation

beyzayaman commented 5 years ago

Hi Mariano. I think this is a good idea, but something is not clear to me. Is the impact only increasing the number of triples, or is it also improving the tool? If it is the former, @chile12 and I developed a tool for Springer Nature abstracts (for this project: https://github.com/dbpedia/sci-graph-links) which can be used for DBpedia abstracts as well. Besides, as an option, I would suggest integrating some other performance tools into this idea, which might be quite interesting (e.g., https://eprints.weblyzard.com/112/2/Odoni_Kuntschik_Brasoveanu_Weichselbraun_semantics2018_orbis_cr.pdf )

MarianoRico commented 5 years ago

Well, we can use the tools you mention as "warm up" tasks. Do you have a paper describing the sci-graph-links tool?

beyzayaman commented 5 years ago

Here is the tool: https://github.com/beyzayaman/NER-assessment-for-springer-nature-abstracts There is no further documentation beyond that page, but if necessary I can try to prepare some.

sahitpj commented 5 years ago

Hi Mariano, I am interested in taking up this project. I have a good understanding of natural language processing, including POS tagging and entity extraction. I am familiar with how to use SyntaxNet and how it works. How can I get started on this project?

aditya-malte commented 5 years ago

Hello Mariano, I find this project very interesting (in fact, it is similar to a pet project I am working on). I have experience with RDF, SyntaxNet and other parsers, and I know the procedure for converting a parse tree into RDF triples. I would be delighted to work on this project. Regards, Aditya Malte (P.S. I've sent you an email detailing my approach)

Adavideo commented 5 years ago

Hi Mariano. Very interesting project. I'm a Master's student in Artificial Intelligence and I'm very interested in natural language processing and open source. I'll definitely apply.