keien closed this 10 years ago
Here's a similar case where single-word sequences are missing (this one is from the tweets dataset, so I can't cross-check it against an old SQL dump):
>>> d
<Sentence: Vid: Rep. Debbie Wasserman Schultz [#FL20]: Discussing the GOP's Pledge for America on CNN.wmv http://bit.ly/dc8u6b #tcot #p2>
>>> for se in d.sequences: print se
...
<Sequence Vid>
<Sequence Vid :>
<Sequence Vid : Rep.>
<Sequence Vid Rep.>
<Sequence Vid : Rep. Debbie>
<Sequence Vid Rep. Debbie>
<Sequence : Rep.>
<Sequence : Rep. Debbie>
<Sequence : Rep. Debbie Wasserman>
<Sequence Wasserman Schultz -LSB- #FL>
<Sequence Wasserman Schultz -lsb- #FL>
<Sequence Schultz -LSB- #FL>
<Sequence Schultz -lsb- #FL>
<Sequence Schultz -LSB- #FL 20>
<Sequence Schultz -lsb- #FL 20>
<Sequence -LSB- #FL>
<Sequence -lsb- #FL>
<Sequence -LSB- #FL 20>
<Sequence -lsb- #FL 20>
<Sequence -LSB- #FL 20 -RSB->
<Sequence -lsb- #FL 20 -rsb->
<Sequence #FL>
<Sequence #FL 20>
<Sequence #FL 20 -RSB->
<Sequence #FL 20 -rsb->
<Sequence #FL 20 -RSB- :>
<Sequence #FL 20 -rsb- :>
<Sequence 20 -RSB->
<Sequence 20 -rsb->
<Sequence 20 -RSB- :>
<Sequence 20 -rsb- :>
<Sequence 20 -RSB- : Discussing>
<Sequence 20 -RSB- Discussing>
<Sequence 20 -rsb- : discuss>
<Sequence 20 -rsb- discuss>
<Sequence -RSB- :>
<Sequence -rsb- :>
<Sequence -RSB- : Discussing>
<Sequence -RSB- Discussing>
<Sequence -rsb- : discuss>
<Sequence -rsb- discuss>
<Sequence -RSB- : Discussing the>
<Sequence -rsb- : discuss the>
<Sequence : Discussing>
<Sequence : discuss>
<Sequence : Discussing the>
<Sequence : discuss the>
<Sequence : Discussing the GOP>
<Sequence : discuss the GOP>
<Sequence Discussing>
<Sequence discuss>
<Sequence Discussing the>
<Sequence discuss the>
<Sequence Discussing the GOP>
<Sequence Discussing GOP>
<Sequence discuss the GOP>
<Sequence discuss GOP>
<Sequence Discussing the GOP 's>
<Sequence discuss the GOP 's>
<Sequence the GOP>
<Sequence the GOP 's>
<Sequence the GOP 's Pledge>
<Sequence GOP 's>
<Sequence GOP 's Pledge>
<Sequence GOP Pledge>
<Sequence GOP 's Pledge for>
<Sequence 's Pledge>
<Sequence 's Pledge for>
<Sequence 's Pledge for America>
<Sequence Pledge>
<Sequence Pledge for>
<Sequence Pledge for America>
<Sequence Pledge America>
<Sequence Pledge for America on>
<Sequence for America>
<Sequence for America on>
<Sequence for America on CNN.wmv>
<Sequence America on>
<Sequence America on CNN.wmv>
<Sequence America CNN.wmv>
<Sequence America on CNN.wmv http:\/\/bit.ly\/dc8u6b>
<Sequence America CNN.wmv http:\/\/bit.ly\/dc8u6b>
<Sequence on CNN.wmv>
<Sequence on CNN.wmv http:\/\/bit.ly\/dc8u6b>
<Sequence on CNN.wmv http:\/\/bit.ly\/dc8u6b #tcot>
<Sequence CNN.wmv>
<Sequence CNN.wmv http:\/\/bit.ly\/dc8u6b>
<Sequence CNN.wmv http:\/\/bit.ly\/dc8u6b #tcot>
<Sequence CNN.wmv http:\/\/bit.ly\/dc8u6b #tcot #p>
<Sequence http:\/\/bit.ly\/dc8u6b>
<Sequence http:\/\/bit.ly\/dc8u6b #tcot>
<Sequence http:\/\/bit.ly\/dc8u6b #tcot #p>
<Sequence http:\/\/bit.ly\/dc8u6b #tcot #p 2>
<Sequence #tcot #p>
<Sequence #tcot #p 2>
Is this supposed to happen?
Do we ever get single-word sequences?
I think we're supposed to, no? I thought that every word in the sentence should get sequences made of that word plus up to the following three words.
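Read as "each word starts sequences of one to four words", the rule can be sketched as contiguous n-grams of length 1 through 4. This is a minimal sketch assuming a plain list of tokens: `contiguous_sequences` is a hypothetical name, not the project's code, and it ignores the skip variants (e.g. `Vid Rep.`) and the lowercased/lemmatized copies that appear in the output above.

```python
from typing import List, Tuple


def contiguous_sequences(tokens: List[str], max_len: int = 4) -> List[Tuple[str, ...]]:
    """Return every run of 1..max_len consecutive tokens (hypothetical helper).

    Each token starts one sequence per length, so single-word sequences
    (length 1) are always included.
    """
    sequences = []
    for start in range(len(tokens)):
        # Lengths go from 1 (the word alone) up to max_len, clipped at the sentence end.
        for length in range(1, max_len + 1):
            if start + length > len(tokens):
                break
            sequences.append(tuple(tokens[start:start + length]))
    return sequences


tokens = ["Discussing", "the", "GOP", "'s", "Pledge"]
for seq in contiguous_sequences(tokens):
    print(" ".join(seq))
```

With five tokens this yields 4 + 4 + 3 + 2 + 1 = 14 sequences, and every token (including the last one, `Pledge`) gets a one-word sequence.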
We are supposed to. I'm curious whether we never get one-word sequences, or only miss them under certain circumstances.
We do get one-word sequences, but they don't seem to be generated for all the sentences that should have them.
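One way to see which sentences are affected is to check, per sentence, which tokens lack a one-word sequence. The sketch below is hypothetical: `missing_single_words` and the tuple-of-tokens representation are assumptions, not the project's accuracy check. The sample data mirrors the output above, where `Pledge` has a one-word sequence but `for` and `America` do not.

```python
from typing import Iterable, List, Tuple


def missing_single_words(tokens: List[str], stored: Iterable[Tuple[str, ...]]) -> List[str]:
    """Return tokens with no one-word sequence in `stored`, in sentence order (hypothetical)."""
    stored_set = set(stored)
    missing: List[str] = []
    for tok in tokens:
        # A one-word sequence is represented here as a 1-tuple, e.g. ("GOP",);
        # repeated tokens are reported only once.
        if (tok,) not in stored_set and tok not in missing:
            missing.append(tok)
    return missing


# Illustrative tokens/sequences taken from the listing above.
tokens = ["Pledge", "for", "America"]
stored = [("Pledge",), ("Pledge", "for"), ("for", "America"), ("America", "on")]
print(missing_single_words(tokens, stored))  # ['for', 'America']
```

An empty result for every sentence would mean no one-word sequences are being dropped.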
I think I found the culprit: this line is supposed to be outside the if block. I'm rerunning personals to see if it solves the problem.
Was that it?
Yep, it looks like sequences are perfect now, according to the accuracy checks.
Sentence: I would love to do it again.
Sequences missing in new db:
I have no idea why this happens.