Status: closed by hanktopia 5 years ago.
This is odd, because it isn't actually a text-length problem: if you copy and paste the first sentence many times, the demo still handles it. I wonder what's happening with that particular paragraph...
You're right, I assumed too much. If you remove the sentence starting with "Former players", it behaves more like you'd expect. So it's not a length problem, but it doesn't appear to be working correctly either. I don't have other definitive tests, but overall I'm seeing more missed entities than before.
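The ablation described above (drop one sentence, re-run, check whether entities come back) can be automated. This is a minimal sketch, assuming a hypothetical `tag_fn(text)` callable that wraps the predictor and returns one tag per token. The regex splitter is deliberately naive, and `toy_tagger` is an invented stand-in that mimics the reported failure, not the real model.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical signature: a callable that wraps the NER predictor and
# returns one tag per whitespace token of the input text.
TagFn = Callable[[str], List[str]]


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break on whitespace that follows '.', '!', '?' or ']'
    (the last catches citation markers such as 'Kansas.[2] Former')."""
    return [part for part in re.split(r"(?<=[.!?\]])\s+", text.strip()) if part]


def has_entities(tags: List[str]) -> bool:
    """True if at least one token received a tag other than 'O'."""
    return any(tag != "O" for tag in tags)


def ablate_sentences(text: str, tag_fn: TagFn) -> List[Tuple[int, str]]:
    """If the full text yields no entities, drop each sentence in turn and
    return (index, sentence) for every removal that brings entities back."""
    if has_entities(tag_fn(text)):
        return []  # Nothing to diagnose: the full text already finds entities.
    sentences = split_sentences(text)
    culprits = []
    for i, sentence in enumerate(sentences):
        reduced = " ".join(sentences[:i] + sentences[i + 1:])
        if has_entities(tag_fn(reduced)):
            culprits.append((i, sentence))
    return culprits


def toy_tagger(text: str) -> List[str]:
    """Invented stand-in that mimics the report: any input containing
    'Former players' comes back all 'O'; otherwise capitalized words are tagged."""
    words = text.split()
    if "Former players" in text:
        return ["O"] * len(words)
    return ["U-PER" if word[:1].isupper() else "O" for word in words]


sample = (
    "Naismith coached at Kansas. Bob Dole also played basketball at Kansas.[2] "
    "Former players include Phog Allen."
)
print(ablate_sentences(sample, toy_tagger))
# [(2, 'Former players include Phog Allen.')]
```

Removing one sentence at a time costs one predictor call per sentence, which is cheap for a paragraph this size and points directly at the trigger, as the manual test did.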
Describe the bug
Calling the predictor with input longer than a couple of sentences returns the tag "O" for every word in the input. If only the first few sentences of the same sample are fed into the predictor, the tags identify entities as expected.
To Reproduce
The behavior can be seen on the demo page: https://demo.allennlp.org/named-entity-recognition
Pasting the following text into the sentence field and clicking Run finds no named entities:
The Jayhawks' first coach was the inventor of the game of basketball, James Naismith. Naismith, ironically, is the only coach in Kansas basketball history with a losing record. The Kansas basketball program has produced many notable professional players, including Clyde Lovellette, Wilt Chamberlain, Jo Jo White, Danny Manning, Raef LaFrentz, Paul Pierce, Nick Collison, Kirk Hinrich, Mario Chalmers, Andrew Wiggins and Joel Embiid. Politician Bob Dole also played basketball at Kansas.[2] Former players that have gone on to be coaches include Phog Allen, Adolph Rupp, Dean Smith, Dutch Lonborg, and former assistants to go on to be notable coaches include John Calipari, Gregg Popovich, and Bill Self. Mark Turgeon, Jerod Haase, and Danny Manning are all former players and assistant coaches that became head coaches. Allen founded the National Association of Basketball Coaches and, with Lonborg, was an early proponent of the NCAA tournament.[3][4] Four different Jayhawk head coaches are in the Naismith Memorial Basketball Hall of Fame as coaches, Phog Allen, Larry Brown, Roy Williams, and current head coach Bill Self.
Pasting this shortened version of the same text into the sentence field and clicking Run finds named entities as expected:
The Jayhawks' first coach was the inventor of the game of basketball, James Naismith. Naismith, ironically, is the only coach in Kansas basketball history with a losing record. The Kansas basketball program has produced many notable professional players, including Clyde Lovellette, Wilt Chamberlain, Jo Jo White, Danny Manning, Raef LaFrentz, Paul Pierce, Nick Collison, Kirk Hinrich, Mario Chalmers, Andrew Wiggins and Joel Embiid. Politician Bob Dole also played basketball at Kansas
I see the same behavior in my own application; system details are below.
Expected behavior
Large blocks of text can be passed to the NER predictor and the entities in them are found. It shouldn't fail silently.
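Until the predictor itself reports the problem, one client-side way to keep this from passing silently is to wrap the tagging call and warn (or raise) when a long input comes back with every token tagged "O". This is a heuristic sketch: `checked_tagger`, `AllOTagsWarning`, the `min_tokens` threshold (its default of 50 is arbitrary) and the `strict` flag are all hypothetical, and a long passage can legitimately contain no entities, so treat the warning as a prompt to investigate rather than proof of a bug.

```python
import warnings
from typing import Callable, List

# Hypothetical signature: wraps the predictor, returns one tag per token.
TagFn = Callable[[str], List[str]]


class AllOTagsWarning(UserWarning):
    """Emitted when a long input comes back with no entity tags at all."""


def checked_tagger(tag_fn: TagFn, min_tokens: int = 50, strict: bool = False) -> TagFn:
    """Wrap tag_fn so that an all-'O' result on an input of at least
    min_tokens tokens is reported instead of passing silently."""

    def wrapper(text: str) -> List[str]:
        tags = tag_fn(text)
        if len(tags) >= min_tokens and all(tag == "O" for tag in tags):
            message = (
                f"All {len(tags)} tokens were tagged 'O'; "
                "the predictor may have failed silently."
            )
            if strict:
                raise RuntimeError(message)
            warnings.warn(message, AllOTagsWarning)
        return tags

    return wrapper


# Usage with an invented stand-in that always returns 'O'.
def always_o(text: str) -> List[str]:
    return ["O"] * len(text.split())


guarded = checked_tagger(always_o, min_tokens=5)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    guarded("one two three four five six")
print(len(caught), caught[0].category.__name__)
# 1 AllOTagsWarning
```

Warning by default keeps batch jobs running while still surfacing the problem in logs; `strict=True` is for tests or pipelines where an empty result should stop processing.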
System
Application running in a Docker container on Ubuntu. From the Dockerfile:
$ python3 --version
Python 3.6.8
Python package versions:
AllenNLP initialization:
Additional context
I'm actively developing, and my Dockerfile doesn't specify version numbers, so I'm pulling in and running new releases as they become available. Longer blocks of text worked a couple of days ago but stopped working yesterday.
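Because the Dockerfile pulls unpinned releases, logging the exact versions in use at container start makes it easier to tie a regression like this one to a specific release. This is a sketch using the standard library's `importlib.metadata`, which requires Python 3.8+; on the Python 3.6.8 reported here, the `importlib_metadata` backport provides an equivalent `version` function. The package list is an assumption; adjust it to whatever the image actually installs.

```python
from importlib import metadata  # Python 3.8+; older Pythons: importlib_metadata backport
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed package list; adjust to whatever the container actually installs.
    for name, version in installed_versions(["allennlp", "torch"]).items():
        print(f"{name}=={version}" if version else f"# {name}: not installed")
```

The `name==version` lines can be copied into a requirements file to pin a known-good set once one is found.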