Should we assume one Tweet per document?

Another inconsistency between the two versions of TwitIE is that the main version puts the detected language in a feature on the Tweet annotation. This means that if we don't have a Tweet annotation the language gets lost (which explains why the cloud app adds the annotation if it doesn't exist). The English-only app, however, puts the lang feature on the document, so that the conditional pipeline can use it to turn off later processing.

The outcome of this is that the main app can support processing multiple separate tweets inside a given GATE document, and they will be treated independently (at least for the purpose of lang ID), whereas the English-only app treats the entire GATE document as a single tweet.
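To make the difference concrete, here is a rough sketch of the two behaviours. It is plain Python with a toy document model, not the GATE API or the actual TwitIE code: `Document`, `Annotation`, `detect_language` and the function names are all made up for illustration, and only the `Tweet` annotation type and the `lang` feature name come from the apps themselves.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    # Toy stand-in for an annotation: a typed span with a feature map.
    type: str
    start: int
    end: int
    features: dict = field(default_factory=dict)


@dataclass
class Document:
    # Toy stand-in for a GATE document: text, annotations, document features.
    text: str
    annotations: list = field(default_factory=list)
    features: dict = field(default_factory=dict)


def detect_language(text):
    # Hypothetical placeholder for the real language identifier.
    return "fr" if "bonjour" in text.lower() else "en"


def lang_id_per_tweet(doc):
    # Main app, as described above: lang goes on each Tweet annotation,
    # so with no Tweet annotation nothing is recorded at all.
    for ann in doc.annotations:
        if ann.type == "Tweet":
            ann.features["lang"] = detect_language(doc.text[ann.start:ann.end])


def lang_id_per_document(doc):
    # English-only app, as described above: one lang for the whole document.
    doc.features["lang"] = detect_language(doc.text)


def should_run_english_pipeline(doc):
    # The conditional pipeline can only see the document-level feature.
    return doc.features.get("lang") == "en"


text = "Hello world\nBonjour tout le monde"
doc = Document(text, [Annotation("Tweet", 0, 11), Annotation("Tweet", 12, len(text))])

lang_id_per_tweet(doc)       # each tweet gets its own lang
lang_id_per_document(doc)    # the whole document gets a single lang
```

With two tweets in one document, the per-tweet version records a separate language for each, while the per-document version has to pick a single value for the mixed text, and that single value is what decides whether the rest of the pipeline runs.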

Should both apps behave in the same way? My feeling is that they should both assume one tweet per document, but I'm not sure how others would feel about that. If we do go with one tweet per document, then we can do away with creating the Tweet annotation in the cloud app, as it would no longer be needed (although in that case the language would still get lost in the "test the pipeline" view, as we don't show document features).

GateNLP/gateplugin-Twitter, issue #5