Open gopa-noaa opened 1 month ago
It implies that the metadata is moved from the METAR collection to the COMMON collection and that the METAR collection will only have type "DD" documents (the same for the RAOB collection). This will require code changes to ingest, metadata scripts, and client. randy
On Wed, May 29, 2024 at 11:06 AM Gopa @.***> wrote:
No change to bucket, 3 scopes (development, integration, production), and 2 collections under each, currently just METAR and COMMON.
-- Randy Pierce
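To make the implication above concrete, here is a minimal sketch of the routing rule it describes: "DD" documents stay in their data collection (METAR or RAOB) and "MD" documents go to COMMON. The function name `target_collection` and its signature are hypothetical, not existing VxIngest code, and other document types are deliberately left unhandled since the thread has not decided where they belong.

```python
# Hypothetical helper, for illustration only; not part of VxIngest.
# Assumes every document carries a "type" field of "DD" or "MD".

def target_collection(doc, data_collection):
    """Return the collection a document would live in after the move.

    doc             -- a document as a dict with a "type" field
    data_collection -- the data collection for this source, e.g. "METAR" or "RAOB"
    """
    doc_type = doc.get("type")
    if doc_type == "DD":
        # Data documents stay in the per-source collection.
        return data_collection
    if doc_type == "MD":
        # Metadata moves to the shared COMMON collection.
        return "COMMON"
    # Anything else is undecided in this discussion, so fail loudly.
    raise ValueError(f"no collection rule for type {doc_type!r}")
```

Ingest, the metadata scripts, and the client would each need an equivalent of this decision wherever they currently read or write metadata in the METAR collection.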
From a quick Google search, it appears a scope cannot be renamed after it is created. I have sent an email to Couchbase ... Worst case, we can do the following:
A couple of other questions:
We currently have MD (Metadata), DD (Data Document), and JOB/JOB-TEST documents. Are there other document types that would make sense to put in their own collections?
Could types be replaced by using collections more? If they are useful, when does it make sense to have a collection vs a document type field? E.g., if the METAR collection solely contains type=DD documents, I could see dropping the type field unless there are reasons clients need to track that type.
Should JOB-TEST docs be renamed to JOB and left in a "test" scope?

To summarize the discussion from the dev meeting:
We decided we need to move this issue up and address how best to use collections, scopes, and buckets for our project & application.
We would like to come up with some use cases & whiteboard through how key parts of the application lifecycle would work with different data models. Ideally this would happen during the ingest meeting.
Regarding the "common" collection: the point was made that "common" is pretty generic (like "default") and it could be better to have explicit & meaningful names to describe the data that collections hold so that we don't end up with a grab bag of data. However, we're unsure of the performance tradeoffs of multiple collections.

Couchbase Server 7 (released in 2021) introduced Scopes & Collections. Previously it was recommended to put all data in a "Bucket" and distinguish the documents with a type field. It appears scopes are recommended for data isolation (prod/dev environments, introducing schema changes, etc.) and collections are intended as a replacement for the previously recommended "type" field.
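As a rough illustration of the difference between the two approaches, the sketch below builds the query text for each one: filtering a whole bucket on a type field versus addressing a collection through its bucket.scope.collection keyspace. The bucket name `example_bucket` and both function names are placeholders I made up; only the query strings are built here, nothing is sent to a cluster.

```python
# Illustrative only: builds SQL++ (N1QL) statements as strings.
# "example_bucket" and the function names are assumptions, not project names.

def type_field_query(bucket, doc_type):
    """Older style: one bucket, documents distinguished by a type field."""
    statement = f"SELECT d.* FROM `{bucket}` AS d WHERE d.type = $doc_type"
    # The type value travels as a named parameter rather than inline text.
    return statement, {"doc_type": doc_type}


def collection_query(bucket, scope, collection):
    """Collections style: the keyspace itself selects the subset of documents."""
    return f"SELECT d.* FROM `{bucket}`.`{scope}`.`{collection}` AS d"


if __name__ == "__main__":
    print(type_field_query("example_bucket", "DD"))
    print(collection_query("example_bucket", "production", "METAR"))
```

If a collection only ever holds one kind of document, the second form needs no type predicate at all, which is the case the question about dropping the type field is getting at.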
This link explains Collections and Scope: https://docs.couchbase.com/server/current/learn/data/scopes-and-collections.html
Just noting down some salient points below:
A collection is a data container. Up to 1000 collections can be created per cluster. A collection can be indexed; and it can be dropped. The data in a collection can be replicated, by means of XDCR.
A scope is a mechanism for the grouping of multiple collections. Up to 1000 scopes can be created per cluster. A scope can be dropped. A scope cannot be indexed. The contents of a scope can be replicated, by means of XDCR.
Benefits of Scopes and Collections
The benefits of scopes and collections include:
The logical grouping of similar documents; potentially simplifying operations such as query, XDCR, and backup and restore.
The increased efficiency of indexing, due to the Data Service being able to provide documents from specific collections to the Index Service.
Simplified querying, since query statements are able to easily specify particular subsets of documents.
Easier migration from relational databases to Couchbase Server, since collections can be designed to correspond to pre-existing relational tables.
Secure isolation of different document-types, within a bucket; allowing applications to be specifically authorized to use only their appropriate subsets of data (see Access to Scopes and Collections, below).
This should help give us some guidance in organizing our document hierarchy. Let's plan to discuss further.
Thanks, Gopa! That makes it sound like it would be beneficial to explore using collections more.
TTL fields
XDCR
No change to bucket, 3 scopes (development, integration, production), and 3 collections under each, currently just METAR, RAOB, and COMMON.
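For reference, the layout described above works out to nine keyspaces. A small sketch that enumerates them, assuming a placeholder bucket name (`example_bucket` is not the real bucket, and `keyspaces` is a hypothetical helper):

```python
from itertools import product

# Scope and collection names are taken from the proposal above.
SCOPES = ("development", "integration", "production")
COLLECTIONS = ("METAR", "RAOB", "COMMON")


def keyspaces(bucket):
    """List every bucket.scope.collection path in the proposed layout."""
    return [
        f"{bucket}.{scope}.{collection}"
        for scope, collection in product(SCOPES, COLLECTIONS)
    ]


if __name__ == "__main__":
    for path in keyspaces("example_bucket"):
        print(path)
```

Adding another data source would mean one more entry in `COLLECTIONS`, so one more collection in each of the three scopes.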