Apart from the structural differences between tweets and user profiles (a single text blog + attachments for tweets, multiple fields with different meanings/purposes for user profiles), URLs, hashtags and mentions in profiles have different interpretations and meanings than the same thing in tweets. The current schema conflates these, making every query that needs to touch these elements more complicated to avoid mistakes.
Additionally, this schema means that we can't use foreign key constraints to maintain referential integrity.
As an example of this for hashtags, I propose we change the current schema:
create table hashtag (
source_id text, -- the id of the object (user or tweet) this hashtag is included in
source_type text, -- "user" or "tweet"
field text not null, -- e.g. "description", "text" - which field of the source
-- object the hashtag is in
tag text not null
)
To the following two tables:
create table if not exists tweet_hashtag (
id integer foreign key references tweet(id),
hashtag text,
-- Normalisation, to allow indexing on common transformations of hashtags to match Twitter platform affordances
-- Twitter makes no distinction between #SuperBowl, #SuperbOwl and #SUPERBOWL
hashtag_lower text,
primary key (id, hashtag)
);
create table if not exists user_profile_hashtag (
id integer foreign key references tweet(id),
field text,
hashtag text,
hashtag_lower text,
primary key (id, field, hashtag)
)
Apart from the structural differences between tweets and user profiles (a single text blog + attachments for tweets, multiple fields with different meanings/purposes for user profiles), URLs, hashtags and mentions in profiles have different interpretations and meanings than the same thing in tweets. The current schema conflates these, making every query that needs to touch these elements more complicated to avoid mistakes.
Additionally, this schema means that we can't use foreign key constraints to maintain referential integrity.
As an example of this for hashtags, I propose we change the current schema:
To the following two tables: