EBISPOT / eqtl-sumstats-service

eQTL Summary Statistics Service

Model the eQTL data in MongoDB for efficient loading and retrieval #4

Open karatugo opened 1 month ago

karatugo commented 1 month ago

We need to design and implement an efficient data model for storing eQTL (expression Quantitative Trait Loci) data in MongoDB. The goal is to ensure that the data can be loaded quickly and retrieved efficiently to support API queries and downstream analyses. The data model should handle large datasets, optimize for query performance, and accommodate the specific structure of eQTL data.

karatugo commented 1 month ago

We had a couple of meetings with the DBA team. They proposed a few options that depend on the API usage patterns, so we asked Kaur whether he is aware of such patterns.

karatugo commented 1 month ago

The next action item is to choose one of the database design options based on Kaur's feedback.

karatugo commented 4 weeks ago

In our meetings with the DBA team, we identified the key API usage patterns. Kaur's feedback and my analysis of old Kubernetes logs both support these patterns:

API Usage:

  1. Type 1: Search by study.
  2. Type 2: Search by specific field (e.g., gene ID, rs ID, variant) within a study.
  3. Type 3: Search by specific field across all studies.

Karthick suggested a few approaches for database design, and we decided to proceed with the second approach:

Approach 2: Individual Collections per Study

We acknowledge that this design makes Type 3 searches challenging, since a cross-study search has to query every study collection. In the future, we may consider an asynchronous approach (e.g., process the request and notify users via email upon completion).
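To make the trade-off concrete, here is a minimal sketch of how the three query types could map onto per-study collections. It is not the implemented model: the `study_` collection prefix, the helper names, and the `gene_id` field used in the usage note are assumptions for illustration. The `db` argument is expected to behave like a pymongo `Database` (`db[name]`, `find()`, `list_collection_names()`).

```python
from typing import Any, Dict, Iterator, Optional

# Hypothetical naming convention: one collection per study.
COLLECTION_PREFIX = "study_"


def collection_name(study_id: str) -> str:
    """Map a study ID to its collection name (assumed scheme)."""
    return f"{COLLECTION_PREFIX}{study_id}"


def search_study(
    db: Any,
    study_id: str,
    field: Optional[str] = None,
    value: Any = None,
) -> Iterator[Dict[str, Any]]:
    """Type 1 (no field given) and Type 2 (field within one study).

    Both touch exactly one collection, which is what this design favors.
    """
    query = {} if field is None else {field: value}
    return iter(db[collection_name(study_id)].find(query))


def search_all_studies(db: Any, field: str, value: Any) -> Iterator[Dict[str, Any]]:
    """Type 3: fan out over every study collection, one query each.

    The cost grows with the number of studies, which is why this type
    is the hard case for per-study collections.
    """
    for name in sorted(db.list_collection_names()):
        if not name.startswith(COLLECTION_PREFIX):
            continue  # skip collections that are not study data
        study_id = name[len(COLLECTION_PREFIX):]
        for doc in db[name].find({field: value}):
            # Tag each hit with the study it came from.
            yield {"study_id": study_id, **doc}
```

With a real connection, `search_study(db, "<study id>", "gene_id", "<gene id>")` would run a single `find` on one collection, while `search_all_studies(db, "gene_id", "<gene id>")` issues one `find` per study. Type 2 and Type 3 lookups would likely need an index on each searched field in every collection to stay fast.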

karatugo commented 2 weeks ago

MongoDB model implemented as discussed above.