Closed cynexit closed 8 years ago
Update: Split it into two collections:
objects
{
"md5":"098f6bcd4621d373cade4e832627b4f6",
"sha1":"a94a8fe5ccb19ba61c4c0873d391e987982fbbd3",
"sha256":"9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"
}
submissions
{
"sha256":"9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
"user_id":1,
"source":"x",
"name":"test.exe",
"date":"2015-11-24T15:23:29Z"
}
and have multiple submissions for each object.
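As a minimal sketch of this split (not the project's actual code), the snippet below derives an object document from raw bytes and looks up its submissions by sha256. The lists stand in for the two collections, the helpers `make_object` and `submissions_for` are hypothetical names, and the second submission is made-up illustration data.

```python
import hashlib


def make_object(data: bytes) -> dict:
    """Build an object document holding the three hashes of the raw bytes."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


# In-memory stand-ins for the two collections (assumption: a real
# deployment would use database collections instead of lists).
objects = [make_object(b"test")]
sha = objects[0]["sha256"]

submissions = [
    {"sha256": sha, "user_id": 1, "source": "x",
     "name": "test.exe", "date": "2015-11-24T15:23:29Z"},
    # Hypothetical second submission of the same object.
    {"sha256": sha, "user_id": 2, "source": "y",
     "name": "copy.exe", "date": "2015-11-25T09:00:00Z"},
]


def submissions_for(sha256: str) -> list:
    """Return every submission that points at the given object hash."""
    return [s for s in submissions if s["sha256"] == sha256]


print(len(submissions_for(sha)))
```

Each object is stored once, while every upload of it adds another submission document carrying the same sha256.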
Do we want to add additional meta information such as the following?
object_reference - capture if it was a primary or secondary (dropped or carved) submission
object_category - category the object would fall under
object_type - MIME type [1]
[1] http://www.iana.org/assignments/media-types/media-types.xhtml
proposed as final for review
New proposal for object as follows:
{
"_id": UUID,
"sha1": str,
"sha256": str,
"md5": str,
"mime": str,
"source": []str,
"obj_name": []str,
"submissions": []UUID,
}
New proposal for submission as follows:
{
"_id": UUID,
"object": UUID,
"user_id": str,
"source": str,
"date": ISO8601,
"obj_name": str,
"tags": []str,
"comment": str,
}
Note: This schema will cause extra processing time on writes, since each new submission also has to update its object. In exchange, it decreases the number of queries on reads. This is beneficial with MongoDB but probably not needed on, say, Cassandra.
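The trade-off in the note can be sketched as follows. This is an illustration under assumptions, not the project's implementation: plain dicts stand in for the two collections, `add_submission` and `get_object` are hypothetical helper names, and de-duplicating the `source` and `obj_name` lists is an assumption the thread does not confirm.

```python
import uuid
from datetime import datetime, timezone

objects = {}      # stand-in collection: sha256 -> object document
submissions = {}  # stand-in collection: submission _id -> submission document


def add_submission(sha256, user_id, source, obj_name):
    """Write path: insert the submission AND update the object (two writes)."""
    obj = objects.setdefault(sha256, {
        "_id": str(uuid.uuid4()),
        "sha256": sha256,
        "source": [],
        "obj_name": [],
        "submissions": [],
    })
    sub_id = str(uuid.uuid4())
    submissions[sub_id] = {
        "_id": sub_id,
        "object": obj["_id"],
        "user_id": user_id,
        "source": source,
        "obj_name": obj_name,
        "date": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    # Extra write-side work: keep the aggregated lists on the object current.
    # (Assumption: each distinct source and name is stored only once.)
    if source not in obj["source"]:
        obj["source"].append(source)
    if obj_name not in obj["obj_name"]:
        obj["obj_name"].append(obj_name)
    obj["submissions"].append(sub_id)
    return sub_id


def get_object(sha256):
    """Read path: one lookup yields all sources, names and submission ids."""
    return objects.get(sha256)


first = add_submission("9f86d081", "1", "x", "test.exe")
second = add_submission("9f86d081", "2", "x", "renamed.exe")
print(get_object("9f86d081")["obj_name"])
```

A reader who only needs the known names or sources of an object never touches the submissions collection, which is where the saved read queries come from.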
Final schema is as follows:
Objects
{
"_id": UUID,
"sha1": str,
"sha256": str,
"md5": str,
"mime": str,
"source": []str,
"obj_name": []str,
"submissions": []UUID,
}
submission
{
"_id": UUID,
"object": UUID,
"user_id": str,
"source": str,
"date": ISO8601,
"obj_name": str,
"tags": []str,
"comment": str,
}
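For illustration, the final schema could be mirrored in typed records like the ones below. This is only a sketch: the class names `Object` and `Submission`, the default values, and the choice of Python dataclasses are assumptions, while the field names and types follow the schema above.

```python
import uuid
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Object:
    sha1: str
    sha256: str
    md5: str
    mime: str
    source: List[str] = field(default_factory=list)
    obj_name: List[str] = field(default_factory=list)
    submissions: List[uuid.UUID] = field(default_factory=list)
    _id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass
class Submission:
    object: uuid.UUID  # _id of the Object this submission refers to
    user_id: str
    source: str
    date: str          # ISO 8601 timestamp, e.g. "2015-11-24T15:23:29Z"
    obj_name: str
    tags: List[str] = field(default_factory=list)
    comment: str = ""
    _id: uuid.UUID = field(default_factory=uuid.uuid4)


obj = Object(
    sha1="a94a8fe5ccb19ba61c4c0873d391e987982fbbd3",
    sha256="9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
    md5="098f6bcd4621d373cade4e832627b4f6",
    mime="application/octet-stream",  # placeholder value for the example
)
sub = Submission(object=obj._id, user_id="1", source="x",
                 date="2015-11-24T15:23:29Z", obj_name="test.exe")

# Link both directions, as the schema stores references on each side.
obj.submissions.append(sub._id)
obj.source.append(sub.source)
obj.obj_name.append(sub.obj_name)

print(sorted(asdict(obj)))
```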
Proposed schema:
Where we should use sha256 as the shard key.
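One reason a sha256 value can work well as a shard key is that its hex digits are close to uniformly distributed. The sketch below is not how MongoDB assigns documents to shards; it is a simplified, hypothetical bucketing function (`bucket_for`) applied to synthetic inputs, meant only to show how evenly such keys spread.

```python
import hashlib
from collections import Counter


def bucket_for(sha256_hex: str, n_buckets: int) -> int:
    """Map a sha256 hex digest to a bucket using its first 32 bits."""
    return int(sha256_hex[:8], 16) % n_buckets


# Synthetic sample inputs (assumption: real keys would be file hashes).
counts = Counter(
    bucket_for(hashlib.sha256(f"sample-{i}".encode()).hexdigest(), 4)
    for i in range(4000)
)

for bucket in sorted(counts):
    print(bucket, counts[bucket])
```

With a key like this, no single bucket should receive a disproportionate share of the writes.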