When fetching a RT, store the original tweet from fetchedTweet.retweeted_status (no extra call is needed, since you already have the data and probably the original tweet's profile too).
This means a network can be built from the original and its related retweets.
You could even treat them all as the same message when graphing messages, since a RT carries the same underlying message and can be grouped with the original.
(Note that quoting a tweet is different, since the quote adds new text ahead of the original.)
Then link the retweet to the original by tweet ID, e.g. in a retweet_original_id field. That field doubles as a boolean for RT or not: null means the tweet is not a retweet, so there is no need to check whether the text starts with "RT".
This could be done with a GUID, but since the original is already there, just insert it, then insert the retweet, then relate the two. Any future retweet of the same tweet should first try to relate to the existing original in the db before storing a new copy.
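The insert-then-relate flow could look roughly like the sketch below. It is only an illustration: the `tweets` table, its columns other than `retweet_original_id`, and the `create_schema`, `text_of`, and `store_tweet` helpers are made-up names, SQLite stands in for whatever db is actually used, and the tweet is assumed to be a plain dict shaped like the API payload.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical single-table layout: retweet_original_id is NULL for
    # anything that is not a retweet, so it also works as the RT flag.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS tweets (
            id INTEGER PRIMARY KEY,
            text TEXT NOT NULL,
            retweet_original_id INTEGER REFERENCES tweets(id)
        )
        """
    )


def text_of(tweet):
    # Assumes extended mode exposes full_text and compat mode exposes text.
    return tweet.get("full_text") or tweet.get("text", "")


def store_tweet(conn, tweet):
    original = tweet.get("retweeted_status")
    original_id = None
    if original is not None:
        original_id = original["id"]
        # INSERT OR IGNORE reuses an original that is already stored
        # instead of inserting a second copy.
        conn.execute(
            "INSERT OR IGNORE INTO tweets (id, text, retweet_original_id) "
            "VALUES (?, ?, NULL)",
            (original_id, text_of(original)),
        )
    # Store the fetched tweet itself, related to the original if it is a RT.
    conn.execute(
        "INSERT OR IGNORE INTO tweets (id, text, retweet_original_id) "
        "VALUES (?, ?, ?)",
        (tweet["id"], text_of(tweet), original_id),
    )
    conn.commit()
```

With this layout, all retweets of one message can be pulled with `WHERE retweet_original_id = ?`, and `retweet_original_id IS NULL` selects everything that is not a RT.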
Another advantage of keeping retweets related to the original: the RT text is cut to 140 characters even in extended mode (for backwards compatibility, Twitter does this even when the original is 280 characters long), so text is missing for language analysis unless the original is stored alongside it.
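For the language analysis side, a small helper could pick the untruncated text whenever an original is attached. This is a sketch under assumptions: `analysis_text` is a hypothetical name, and the tweet is assumed to be a dict with `full_text` (extended mode) or `text`, plus an optional `retweeted_status`.

```python
def analysis_text(tweet):
    # Prefer the embedded original, since the RT's own text may be cut short.
    source = tweet.get("retweeted_status") or tweet
    # Fall back to the compat-mode field when full_text is absent.
    return source.get("full_text") or source.get("text", "")
```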