We know how to extract lots of metadata from sources using source strategies, but most of this data gets thrown away. Instead, it should be stored permanently in a database table.
Specifically, for every source we should extract and store the following:
site name (Pixiv, Twitter, etc.)
work id (Pixiv illust id, Twitter status id, etc.)
artist id
artist username
tags (original Pixiv tags, Twitter hashtags, etc.)
upload date (date the post was uploaded to Twitter / Pixiv / etc.)
image URL
page URL
page number (for images part of a Pixiv / Twitter batch)
status (whether the source is active or deleted)
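One possible shape for this storage is sketched below in Python with SQLite, purely for illustration. The table names `source_metadata` and `source_tags`, the `SourceMetadata` class, and the `store` helper are all hypothetical. The sketch keeps one row per source image, with tags split into their own table so they can be indexed and searched individually.

```python
import sqlite3
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical record mirroring the field list above; names are illustrative only.
@dataclass
class SourceMetadata:
    site_name: str            # e.g. "Pixiv", "Twitter"
    work_id: str              # Pixiv illust id, Twitter status id, ...
    artist_id: str
    artist_username: str
    upload_date: datetime     # when the work was posted on the original site
    image_url: str
    page_url: str
    page_number: int          # position within a Pixiv/Twitter batch
    status: str               # "active" or "deleted"
    tags: list[str] = field(default_factory=list)

# Hypothetical schema: one row per source image, tags normalized into a second table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS source_metadata (
    id INTEGER PRIMARY KEY,
    post_id INTEGER NOT NULL,
    site_name TEXT NOT NULL,
    work_id TEXT NOT NULL,
    artist_id TEXT,
    artist_username TEXT,
    upload_date TEXT,
    image_url TEXT,
    page_url TEXT,
    page_number INTEGER,
    status TEXT CHECK (status IN ('active', 'deleted')),
    UNIQUE (site_name, work_id, page_number)
);
CREATE TABLE IF NOT EXISTS source_tags (
    source_id INTEGER NOT NULL REFERENCES source_metadata(id),
    tag TEXT NOT NULL,
    PRIMARY KEY (source_id, tag)
);
"""

def store(conn: sqlite3.Connection, post_id: int, meta: SourceMetadata) -> int:
    """Persist one source's metadata and its tags; returns the new row id."""
    cur = conn.execute(
        """INSERT INTO source_metadata
           (post_id, site_name, work_id, artist_id, artist_username,
            upload_date, image_url, page_url, page_number, status)
           VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
        (post_id, meta.site_name, meta.work_id, meta.artist_id,
         meta.artist_username, meta.upload_date.isoformat(),
         meta.image_url, meta.page_url, meta.page_number, meta.status),
    )
    source_id = cur.lastrowid
    # Duplicate tags from the same source are silently skipped.
    conn.executemany(
        "INSERT OR IGNORE INTO source_tags (source_id, tag) VALUES (?, ?)",
        [(source_id, tag) for tag in meta.tags],
    )
    return source_id
```

The `UNIQUE (site_name, work_id, page_number)` constraint is one way to stop the same image from being recorded twice; whether that is the right key depends on how each site numbers the pages of a batch.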
This would facilitate a number of things:
Searching posts by Twitter/Pixiv/etc tags.
Searching posts by Twitter/DeviantArt/etc IDs (#3924).
Sorting posts by the date they were originally posted on Twitter/Pixiv/etc (#3899).
Finding posts that are part of the same Pixiv or Twitter gallery.
Finding untagged posts by the same artist.
Simplifying artist lookups (find artists by artist id instead of trying to match profile URLs).
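With such a table in place, most of these use cases reduce to simple indexed queries. The sketch below uses Python's `sqlite3` with a trimmed-down, hypothetical schema; the table, column, and function names are illustrative only. Artist lookups and the "untagged" filter are left out because they depend on artist and post-tag tables not described here.

```python
import sqlite3

# Minimal hypothetical schema; table and column names are illustrative only.
SCHEMA = """
CREATE TABLE source_metadata (
    id INTEGER PRIMARY KEY, post_id INTEGER NOT NULL,
    site_name TEXT NOT NULL, work_id TEXT NOT NULL,
    artist_id TEXT, upload_date TEXT, page_number INTEGER
);
CREATE TABLE source_tags (source_id INTEGER NOT NULL, tag TEXT NOT NULL);
CREATE INDEX idx_source_tags ON source_tags (tag);
CREATE INDEX idx_source_work ON source_metadata (site_name, work_id);
CREATE INDEX idx_source_artist ON source_metadata (site_name, artist_id);
"""

def posts_with_source_tag(conn, site, tag):
    """Search posts by an original Pixiv/Twitter/etc. tag."""
    rows = conn.execute(
        """SELECT DISTINCT m.post_id FROM source_metadata m
           JOIN source_tags t ON t.source_id = m.id
           WHERE m.site_name = ? AND t.tag = ? ORDER BY m.post_id""",
        (site, tag))
    return [r[0] for r in rows]

def posts_by_work_id(conn, site, work_id):
    """Search posts by site ID; also returns every page of the same gallery."""
    rows = conn.execute(
        """SELECT post_id FROM source_metadata
           WHERE site_name = ? AND work_id = ? ORDER BY page_number""",
        (site, work_id))
    return [r[0] for r in rows]

def posts_by_upload_date(conn):
    """Sort posts by the date they were originally uploaded."""
    rows = conn.execute(
        "SELECT post_id FROM source_metadata ORDER BY upload_date, page_number")
    return [r[0] for r in rows]

def posts_by_artist(conn, site, artist_id):
    """Candidate posts by the same artist, matched on the site's artist id."""
    rows = conn.execute(
        """SELECT DISTINCT post_id FROM source_metadata
           WHERE site_name = ? AND artist_id = ? ORDER BY post_id""",
        (site, artist_id))
    return [r[0] for r in rows]
```

Storing `upload_date` as ISO 8601 text keeps lexicographic `ORDER BY` equal to chronological order, as long as every row uses the same format and time zone.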