fraserh / waldi

0 stars 0 forks source link

Decide on database software and design initial schemas #5

Open matthewpalmer opened 9 years ago

matthewpalmer commented 9 years ago

Thinking mysql or postgres, but I don't really mind.

In terms of design, possibly something like (probably forgetting stuff here)

Item

Category

Matches ← will be one of the most active tables

Store

Notes

fraserh commented 9 years ago

Another value for Item might have to be num_of_units (per purchase) to deal with woolworths annoying "2 for 1" specials (I would like to ignore them, but realistically that's where we'll get a lot of savings)

matthewpalmer commented 9 years ago

Item

Category

Matches ← will be one of the most active tables

Store

With indexes like

Item store, title, matchesWith, belongsTo

Category title

matches item_one_id, item_two_id

store name

matthewpalmer commented 9 years ago

Also might be a problem: I think django has a built-in ORM for mysql. We can either map our data importer to whatever the django ORM generates, or map django to our schema. I slightly prefer the latter, because it'll be more loosely coupled and means we can write our bulk update/inserter regardless of the application layer. It might mean there are a couple of extra hoops in app development though. Thoughts?

matthewpalmer commented 9 years ago

This is untested, but hopefully provides a more concrete schema

This is tested. Haven't actually added any data yet, but the db and tables are set up.

--
-- Create tables
--

-- Store
CREATE TABLE `store` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `name` ENUM('coles', 'woolworths') NOT NULL,
  `base_url` VARCHAR(500),
  `created_at` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
);

-- Category
CREATE TABLE `category` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `store` INT UNSIGNED NOT NULL,
  `title` VARCHAR(200) NOT NULL,
  `number_of_items` INT UNSIGNED NOT NULL,
  `created_at` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  FOREIGN KEY(`store`) REFERENCES `store`(`id`)
);

-- Match
CREATE TABLE `match`(
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `item_one` INT UNSIGNED,
  `item_two` INT UNSIGNED,
  `rating` FLOAT,
  `created_at` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
);

-- Item
CREATE TABLE `item` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `store` INT UNSIGNED NOT NULL,
  `title` VARCHAR(500) NOT NULL,
  `price_per_unit` FLOAT,
  `price_per_kg` FLOAT,
  `unit_size` INT UNSIGNED,
  `units_per_perchase` INT UNSIGNED,
  `url` VARCHAR(500),
  `category` INT UNSIGNED,
  `created_at` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  FOREIGN KEY(`store`) REFERENCES `store`(`id`),
  FOREIGN KEY(`category`) REFERENCES `category`(`id`)
);

-- 
-- Create indexes
--
CREATE INDEX item_in_store
ON`item`(`store`);

CREATE INDEX item_with_title
ON `item`(`title`);

CREATE INDEX item_in_category
ON `item`(`category`);

CREATE UNIQUE INDEX category_with_title
ON `category`(`title`);

CREATE INDEX category_in_store
ON `category`(`store`);

CREATE INDEX match_item_one
ON `match`(`item_one`);

CREATE INDEX match_item_two
ON `match`(`item_two`);

CREATE UNIQUE INDEX match_both_unique
ON `match`(`item_one`, `item_two`);

CREATE UNIQUE INDEX store_with_name
ON `store`(`name`);
fraserh commented 9 years ago

I agree with the latter, and that looks good.