explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

Request for Semantic Role Labeling .... #170

Closed profversaggi closed 6 years ago

profversaggi commented 8 years ago

I'm VERY impressed with the speed and accuracy of the NER functionality, and am only using SRL elsewhere because it doesn't exist here. May I formally request its inclusion in the next major release?

honnibal commented 8 years ago

You can certainly request it :)

We definitely want to do SRL. At the moment the following tasks are higher priority:

  • Improved CI framework, running on our own test server, as Travis CI doesn't give us enough memory to test with the models
  • Better NER, particularly using large phrase dictionaries acquired from Wikipedia
  • Multi-lingual support
  • Better data parallelism, using Spark, and multi-threading

The good news is that velocity is currently pretty good. The bad news is that it's still hard to hand over these tasks to others, so things are mostly happening in serial.

Unfortunately I can't really give you an estimate for when SRL might be done.

profversaggi commented 8 years ago

No worries! The spaCy framework is pretty awesome as it is, so we'll use what we can and patiently wait in the queue of tasks to be implemented. Keep up the good work!

silentrob commented 8 years ago

+1

scottlingran commented 8 years ago

Referencing #60, original comment:

Well, the good news is there's lots of good stuff coming. The bad news is
it's pushed SRL down a bit.

- Knowledge-based NER
- Multi-lingual
- Stabilise 1.0 API
- Domain adaptation
- Theano integration, neural network models
- SRL

The better news is SRL isn't so much work, given recent research. If you
can put in a weekend or two we could probably get this done:
http://alt.qcri.org/semeval2014/cdrom/pdf/SemEval034.pdf

The idea is to learn the SRL as a projective tree, by giving up on some of
the relations.

What we need:
- Survey the papers implementing similar tree approximations
- Pick the best one
- Implement the data transform

If you can do that initial spadework, I'd be happy to run the experiments.
I can supply sample data for the transformation.

@honnibal I might give this a shot. Would you still recommend the tree approximation approach?
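For readers unfamiliar with the idea, the "data transform" in the quoted comment can be pictured with a minimal sketch. It assumes SRL annotations arrive as `(head, dependent, label)` token-index triples and uses a simple nearest-head heuristic. The function names and the heuristic are illustrative assumptions, not the method from the linked paper or anything in spaCy. It also only guarantees one head per token and no cycles; making the result projective would need a further step.

```python
def creates_cycle(heads, head, dep):
    """Return True if attaching `dep` under `head` would close a cycle."""
    node = head
    while node is not None:
        if node == dep:
            return True
        node = heads[node]
    return False


def approximate_as_tree(n_tokens, arcs):
    """Reduce a predicate-argument graph to a single-head forest.

    arcs: iterable of (head, dependent, label) token-index triples.
    Returns (heads, labels, dropped), where heads[i] is the head kept
    for token i (None if it has none) and dropped lists the relations
    that were given up.
    """
    heads = [None] * n_tokens
    labels = [None] * n_tokens
    dropped = []
    # Hypothetical heuristic: shorter arcs are considered first, so the
    # nearest head wins when a token has several.
    ordered = sorted(arcs, key=lambda a: (abs(a[0] - a[1]), a[0]))
    for head, dep, label in ordered:
        if head == dep or heads[dep] is not None or creates_cycle(heads, head, dep):
            dropped.append((head, dep, label))
            continue
        heads[dep] = head
        labels[dep] = label
    return heads, labels, dropped


# "Mary wants to leave": Mary is an argument of both "wants" and "leave".
example_arcs = [(1, 0, "A0"), (1, 3, "A1"), (3, 0, "A0")]
heads, labels, dropped = approximate_as_tree(4, example_arcs)
print(heads)    # [1, None, None, 1]
print(dropped)  # [(3, 0, 'A0')]
```

The `dropped` list makes the cost of the approximation visible: counting it over a corpus shows how many relations each candidate transform gives up.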

profversaggi commented 8 years ago

Awesome! Some great things you guys got going on there! :-)


honnibal commented 8 years ago

I'd still recommend the tree approximation approach, yes. We'd be excited to have you working on this functionality, so @wbwseeker and I will be happy to support you. The main complication is, do you have access to the SRL data? We're not licensed to distribute this to you. We could work around it by putting up a quick API for you to train the model, and giving you some test data to develop with.

Data issues aside, I would suggest the following strategy:

  1. Work on getting the data transformation implemented, in whatever hacky, once-off-script sort of way you want;
  2. Once the data is transformed, run the parsing experiments with both spaCy and another dependency parser. I would suggest MATE, because it's a strong-performing system that also comes with SRL results. The goal would be to check that we're getting performance in the right ballpark, and that we're not missing any exceptionally low-hanging fruit, e.g. by adding a few features.
  3. Integration into spaCy. This is the part that feels like 20% of the work, but will surely take 80% of the effort. There's already good precedent for a transform/untransform procedure around the model training, implemented by @wbwseeker for German parsing (which requires 'non-projective' trees).
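Since the strategy above hinges on projective versus non-projective trees, a small check on the transformed data can be useful before running any parsing experiments. The sketch below is an assumption-laden illustration, not spaCy code: it takes a `heads` list where `heads[i]` is the head index of token `i` (or `None` for a root), and reports whether any two arcs cross. The function name is hypothetical, and root tokens are treated as having no arc.

```python
def is_projective(heads):
    """Return True if no two dependency arcs cross.

    heads[i] is the index of token i's head, or None if token i is a root.
    """
    # Represent each arc by its (left, right) endpoints in sentence order.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads) if h is not None]
    for left1, right1 in arcs:
        for left2, right2 in arcs:
            # Two arcs cross when exactly one endpoint of the second
            # lies strictly inside the first.
            if left1 < left2 < right1 < right2:
                return False
    return True


projective_heads = [1, None, 1]       # tokens 0 and 2 attach to token 1
crossing_heads = [2, 3, None, 2]      # arcs 0-2 and 1-3 cross
print(is_projective(projective_heads))  # True
print(is_projective(crossing_heads))    # False
```

The pairwise comparison is quadratic in sentence length, which is fine for a one-off data check.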

The big question is that the SRL really wants a different API. How should these predicate-argument structures be consumed? And how can we make it easy to move between the SRL annotation and the other annotations spaCy provides?

I would probably suggest letting the SRL functionality live as a separate module for a while. We could release this on PyPI and let the API evolve. This way you can just write whatever you need for the moment, and not worry about the One True Solution. When it's evolved and stabilised we can integrate it back into the main library.
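To make the API question concrete, here is one possible shape for a standalone container of predicate-argument structures. It is only a sketch under stated assumptions: the class names, the `(role, start, end)` span encoding with an exclusive end, and the PropBank-style labels are all hypothetical, and nothing here reflects an actual spaCy interface.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Predicate:
    token_index: int                      # position of the predicate word
    sense: str                            # illustrative sense label
    # Each argument is (role, start, end) over token indices, end exclusive.
    arguments: List[Tuple[str, int, int]] = field(default_factory=list)


@dataclass
class SrlAnnotation:
    words: List[str]
    predicates: List[Predicate] = field(default_factory=list)

    def argument_text(self, predicate: Predicate, role: str) -> str:
        """Return the text of the first argument with the given role, or ''."""
        for arg_role, start, end in predicate.arguments:
            if arg_role == role:
                return " ".join(self.words[start:end])
        return ""

    def roles_for_token(self, index: int) -> List[Tuple[str, str]]:
        """Return (predicate word, role) pairs whose argument span covers a token."""
        found = []
        for predicate in self.predicates:
            for role, start, end in predicate.arguments:
                if start <= index < end:
                    found.append((self.words[predicate.token_index], role))
        return found


words = ["Mary", "gave", "John", "a", "book"]
give = Predicate(1, "give.01", [("A0", 0, 1), ("A2", 2, 3), ("A1", 3, 5)])
annotation = SrlAnnotation(words, [give])
print(annotation.argument_text(give, "A1"))  # a book
print(annotation.roles_for_token(4))         # [('gave', 'A1')]
```

Keeping token indices as the shared key is what lets a consumer move from a role back to whatever token-level annotations (tags, entities, dependencies) another library provides for the same sentence.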

scottlingran commented 8 years ago

Are you referring to the CoNLL 2009 data?

It seems the CoNLL 2012 data is available for download. Would this be appropriate?

honnibal commented 8 years ago

Doesn't that require OntoNotes? OntoNotes isn't available for download.

matthewramirez89 commented 8 years ago

@scottyli Looks fascinating. Did you end up building this out?

fmfn commented 7 years ago

Any progress on this front? I would be interested in helping if needed.

ines commented 6 years ago

Quick update: This might be a nice use case for the new custom processing pipeline components and extension attributes introduced in v2.0!

ines commented 6 years ago

Merging this with the newer #2336!

lock[bot] commented 6 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.