ivan-kocienski-gfsc commented 1 year ago

Description

"Seeding" is the act of populating an empty database with sample records so local developers have something to work with on their local machines.

However, seeding involves many considerations that affect later work, so the whole problem should be looked at from a high level first.

Put together a document that defines the scope of seeding and considers the various approaches with their pros and cons. Then discuss the available options and agree on the solution selected.

Expected output

A document explaining the choices available, the option selected, and its utility and limitations.

Acceptance Criteria

[x] Report (see below)
[x] #1947

ivan-kocienski-gfsc commented 1 year ago

Developer onboarding notes

How to seed the database from empty.

The ultimate problem here is that if we allow external services to be called, we have no control over their responses (unless we run our own local copies). Alternatively we can stub the external calls, but then we must be vigilant to keep our stubs in step with how the services actually work.
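As a rough illustration of the stubbing approach, here is a minimal sketch in Python (chosen only for illustration, not the project's own code). Every name in it (`PostcodeLookup`, `StubPostcodeLookup`, `neighbourhood_for`, the made-up postcode) is hypothetical. The stub answers from a fixed table and fails loudly on anything it does not know, which is one way to notice when a stub has drifted from what the seed data needs.

```python
from typing import Protocol


def normalise(postcode: str) -> str:
    """Upper-case and strip spaces so 'zz1 1zz' and 'ZZ1 1ZZ' match."""
    return postcode.replace(" ", "").upper()


class PostcodeLookup(Protocol):
    """Hypothetical interface for whatever resolves a postcode."""

    def neighbourhood_for(self, postcode: str) -> str: ...


class StubPostcodeLookup:
    """Stand-in that answers from a fixed table and never uses the network."""

    def __init__(self, known: dict[str, str]) -> None:
        self._known = {normalise(k): v for k, v in known.items()}

    def neighbourhood_for(self, postcode: str) -> str:
        key = normalise(postcode)
        if key not in self._known:
            # Fail loudly rather than guess, so gaps in the stub are visible.
            raise LookupError(f"no stubbed response for postcode {postcode!r}")
        return self._known[key]


# Made-up postcode and neighbourhood name, for illustration only.
lookup: PostcodeLookup = StubPostcodeLookup({"ZZ1 1ZZ": "Example Ward"})
print(lookup.neighbourhood_for("zz1 1zz"))
```

The stub still has to be updated by hand whenever the real service changes its behaviour, which is the maintenance cost noted above.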

The challenge with PlaceCal (PC) is that so much of the service depends on external parties.

The only things we have that don't touch the outside world are:

- sites
- tags
- users (kind of, because of email and OAuth)

But external factors come in with

- neighbourhoods (ONS postcode data)
- partners
  - with address (postcode lookup)
  - with service areas (depends on neighbourhoods)
- calendars (for pulling feeds)
- events (depend on all of the above)
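The dependencies listed above imply an order in which tables have to be seeded. A minimal sketch of deriving that order with Python's standard `graphlib` module (Python 3.9+) follows. The dependency map only encodes what is stated above: partners need neighbourhoods, and events need everything else. Any further dependency, such as calendars on partners, would be an assumption and is left out.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of things that must be seeded before it.
DEPENDS_ON = {
    "sites": set(),
    "tags": set(),
    "users": set(),
    "neighbourhoods": set(),
    "partners": {"neighbourhoods"},
    "calendars": set(),
    "events": {"sites", "tags", "users", "neighbourhoods", "partners", "calendars"},
}


def seeding_order() -> list[str]:
    """Return one valid order in which dependencies always come first."""
    return list(TopologicalSorter(DEPENDS_ON).static_order())


order = seeding_order()
print(order)
```

Several orders are valid; the only guarantees are that neighbourhoods come before partners and that events come last.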

Potential solutions

1. Import from production snapshot

Pros:

replicating problems from production will be easy
has full data available
get to stress test UX locally
no need to consider external services as everything is set up.

Cons:

not exactly beginner friendly
need to download production
different snapshots will have different data
complicated relations may be hard to get up to speed with
security: production data ends up on local machines

2. Pure artificial fake data

Fake data where we populate all the tables with hard-coded values, explicitly setting up scenarios. This bypasses the need to call out to any service; the data is just poked in by hand (think of it as being like a test factory).
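To make the idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The schema, table columns, and record values are all hypothetical and heavily simplified; the point is only that every value is hard-coded and nothing is fetched from an external service.

```python
import sqlite3

# Hypothetical, simplified schema; real tables would have more columns.
SCHEMA = """
CREATE TABLE neighbourhoods (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE partners (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    neighbourhood_id INTEGER NOT NULL REFERENCES neighbourhoods(id)
);
"""

# Hard-coded records: no postcode lookup, no feed import.
NEIGHBOURHOODS = [(1, "Example Ward"), (2, "Sample Town")]
PARTNERS = [
    (1, "Community Centre", 1),
    (2, "Library Group", 2),
    (3, "Youth Club", 1),
]


def seed(conn: sqlite3.Connection) -> None:
    """Create the tables and insert the fixed records in dependency order."""
    conn.executescript(SCHEMA)
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO neighbourhoods (id, name) VALUES (?, ?)", NEIGHBOURHOODS
        )
        conn.executemany(
            "INSERT INTO partners (id, name, neighbourhood_id) VALUES (?, ?, ?)",
            PARTNERS,
        )


conn = sqlite3.connect(":memory:")
seed(conn)
partner_count = conn.execute("SELECT COUNT(*) FROM partners").fetchone()[0]
print(partner_count)
```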

Pros:

very small (fast)
easily replicable
could be used as a basis for isolation testing
completely isolated from external services

Cons:

hard to replicate problems arising on production
does it represent any meaningful scenarios?
will need to be maintained with advances in deployed code

3. Limited subset of production data

Seed data built from a selected subset of live production data.

Pros:

Better reflects production state
Exercises more of the code
Still isolated from external services

Cons:

(same as option 2, pure artificial fake data)
Need to carefully select production data to replicate

Choice

For the database seeding task I will pick option 3, as it isolates us from external services while still capturing something of production. I would also say that option 1 should be an established procedure, as there are many times when checking against live data is the only way to replicate a problem.

Next steps

Draw up the data that needs to be present after a seeding run.
Select the information that can be imported from production.
Put together code to import that data safely and cleanly, in a replicable way.
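One reading of "replicable" is that running the import twice leaves the database in the same state as running it once. A minimal sketch of that property, again using Python's `sqlite3` with a hypothetical `partners` table and made-up rows, might look like this:

```python
import sqlite3


def import_partners(conn: sqlite3.Connection, rows: list[tuple[int, str]]) -> None:
    """Insert or overwrite rows by primary key, inside a single transaction."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute(
            "CREATE TABLE IF NOT EXISTS partners "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )
        # OR REPLACE keys on the primary key, so re-running never duplicates.
        conn.executemany(
            "INSERT OR REPLACE INTO partners (id, name) VALUES (?, ?)", rows
        )


conn = sqlite3.connect(":memory:")
rows = [(1, "Community Centre"), (2, "Library Group")]
import_partners(conn, rows)
import_partners(conn, rows)  # second run changes nothing
count = conn.execute("SELECT COUNT(*) FROM partners").fetchone()[0]
print(count)
```

Keying on stable identifiers is an assumption here; it only works if the selected production records keep the same ids between exports.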

katjam commented 1 year ago

Closing as report complete.

geeksforsocialchange / PlaceCal

Define seeding parameters #1946
