Optimization of Tags for Emails

Problem

The loading of emails is slow because it requires fetching tags for each email in a workspace. This problem is particularly pronounced when the number of emails exceeds 15,000.

Proposed Solution

Approach

Create a separate collection for tags: This collection will contain each tag with fields id, label, and companyId.
Reference tags in emails: Emails will contain references (IDs) to the documents in the tags collection.
Optimize queries using joins (lookups): Use joins to efficiently fetch emails corresponding to specified tags.

Collection Structure

Tag Collection Schema

const mongoose = require('mongoose');
const { Schema } = mongoose;
const { ObjectId } = Schema.Types;

const TagSchema = new Schema({
  label: { type: String, required: true },
  companyId: { type: ObjectId, ref: 'Company', required: true },
}, { timestamps: true });

const Tag = mongoose.model('Tag', TagSchema);

module.exports = Tag;

Email Collection Schema

const mongoose = require('mongoose');
const { Schema } = mongoose;
const { ObjectId } = Schema.Types;

const MailingSchema = new Schema({
  name: { type: String, required: true },
  _user: { type: ObjectId, ref: 'User', alias: 'userId' },
  author: { type: String, set: normalizeString, alias: 'userName' },
  _workspace: { type: ObjectId, ref: 'Workspace', alias: 'workspace' },
  _parentFolder: { type: ObjectId, ref: 'Folder' },
  _wireframe: { type: ObjectId, ref: 'Template', alias: 'templateId' },
  wireframe: { type: String, set: normalizeString, alias: 'templateName' },
  _company: { type: ObjectId, ref: 'Group', alias: 'group' },
  tags: [{ type: ObjectId, ref: 'Tag' }], // References to tags
  data: {},
  espIds: [String],
}, { timestamps: true });

const Mailing = mongoose.model('Mailing', MailingSchema);

module.exports = Mailing;

Query Optimization Using Aggregation and Joins

Reference tags in emails: Emails will contain references (IDs) to the documents in the tags collection.
Optimize queries using joins (lookups): Use joins to fetch emails corresponding to specified tags efficiently.

Update Tag Operations

Add Tags: Update the way tags are added during the creation of emails to reference the tags collection.
Delete Tags: Ensure that when a tag is deleted, references in the emails collection are updated accordingly.
Update Tags: When updating a tag, make sure all references in the emails collection are consistent.

Migration Script

Since we have more than 30,000 emails and more than 5,000 tags, we need to be careful with the migration script to update all the old data with the new method.

Extract Unique Tags:
- Use aggregation to identify all unique tags and their associated companyId.
Insert Unique Tags into Tags Collection:
- Ensure each unique tag is present in the tags collection.
- Maintain a mapping of tag labels and companyId to tag _id.
Update Mailings with Tag References:
- Process mailings in manageable batches.
- Replace tag labels with their corresponding tag _id using the mapping.
Ensure Data Integrity:
- run the script with mongosh
- Validate updates after each batch to ensure correctness.

By following these instructions and carefully planning each step, you can ensure a smooth and efficient migration process while maintaining data integrity and minimizing downtime.

Advantages of This Approach

Reduction in Number of Documents Scanned:
- By using joins to match tags, the number of documents to scan when fetching emails is reduced.
- This approach can be faster as it limits the filtering operations to a much smaller set of documents.
Optimized Queries:
- Queries are optimized as they do not need to scan all email documents to apply tag filters.
- Join operations (lookups) can be minimized, which can improve performance.
Better Scalability:
- This approach is more scalable as it moves the filtering work to the tags collection, which is smaller and easier to manage.
- Tag management becomes more centralized and consistent.

Conclusion

By reorganizing the data structure and using aggregation and joins techniques, you can significantly improve the performance of tag filters and avoid the slowdown caused by scanning each email individually.

Badsender-com / LePatron.email