Badsender-com / LePatron.email

LePatron is an opensource email builder allowing to industrialize your email template production. Build tailor made email templates and make them available to your non-technical users.
https://www.lepatron.email
GNU General Public License v3.0
73 stars 12 forks source link

Optimization of Tags for Emails #816

Closed omar-bear closed 2 months ago

omar-bear commented 3 months ago

Optimization of Tags for Emails

Problem

The loading of emails is slow because it requires fetching tags for each email in a workspace. This problem is particularly pronounced when the number of emails exceeds 15,000.

Proposed Solution

Approach

  1. Create a separate collection for tags: This collection will contain each tag with fields id, label, and companyId.
  2. Reference tags in emails: Emails will contain references (IDs) to the documents in the tags collection.
  3. Optimize queries using joins (lookups): Use joins to efficiently fetch emails corresponding to specified tags.

Collection Structure

Tag Collection Schema

const mongoose = require('mongoose');
const { Schema } = mongoose;
const { ObjectId } = Schema.Types;

const TagSchema = new Schema({
  label: { type: String, required: true },
  companyId: { type: ObjectId, ref: 'Company', required: true },
}, { timestamps: true });

const Tag = mongoose.model('Tag', TagSchema);

module.exports = Tag;

Email Collection Schema

const mongoose = require('mongoose');
const { Schema } = mongoose;
const { ObjectId } = Schema.Types;

const MailingSchema = new Schema({
  name: { type: String, required: true },
  _user: { type: ObjectId, ref: 'User', alias: 'userId' },
  author: { type: String, set: normalizeString, alias: 'userName' },
  _workspace: { type: ObjectId, ref: 'Workspace', alias: 'workspace' },
  _parentFolder: { type: ObjectId, ref: 'Folder' },
  _wireframe: { type: ObjectId, ref: 'Template', alias: 'templateId' },
  wireframe: { type: String, set: normalizeString, alias: 'templateName' },
  _company: { type: ObjectId, ref: 'Group', alias: 'group' },
  tags: [{ type: ObjectId, ref: 'Tag' }], // References to tags
  data: {},
  espIds: [String],
}, { timestamps: true });

const Mailing = mongoose.model('Mailing', MailingSchema);

module.exports = Mailing;

Query Optimization Using Aggregation and Joins

  1. Reference tags in emails: Emails will contain references (IDs) to the documents in the tags collection.
  2. Optimize queries using joins (lookups): Use joins to fetch emails corresponding to specified tags efficiently.

Update Tag Operations

Migration Script

Since we have more than 30,000 emails and more than 5,000 tags, we need to be careful with the migration script to update all the old data with the new method.

  1. Extract Unique Tags:

    • Use aggregation to identify all unique tags and their associated companyId.
  2. Insert Unique Tags into Tags Collection:

    • Ensure each unique tag is present in the tags collection.
    • Maintain a mapping of tag labels and companyId to tag _id.
  3. Update Mailings with Tag References:

    • Process mailings in manageable batches.
    • Replace tag labels with their corresponding tag _id using the mapping.
  4. Ensure Data Integrity:

    • run the script with mongosh
    • Validate updates after each batch to ensure correctness.

By following these instructions and carefully planning each step, you can ensure a smooth and efficient migration process while maintaining data integrity and minimizing downtime.

Advantages of This Approach

  1. Reduction in Number of Documents Scanned:

    • By using joins to match tags, the number of documents to scan when fetching emails is reduced.
    • This approach can be faster as it limits the filtering operations to a much smaller set of documents.
  2. Optimized Queries:

    • Queries are optimized as they do not need to scan all email documents to apply tag filters.
    • Join operations (lookups) can be minimized, which can improve performance.
  3. Better Scalability:

    • This approach is more scalable as it moves the filtering work to the tags collection, which is smaller and easier to manage.
    • Tag management becomes more centralized and consistent.

Conclusion

By reorganizing the data structure and using aggregation and joins techniques, you can significantly improve the performance of tag filters and avoid the slowdown caused by scanning each email individually.

sonarcloud[bot] commented 2 months ago

Quality Gate Failed Quality Gate failed

Failed conditions
7.5% Duplication on New Code (required ≤ 3%)

See analysis details on SonarCloud