MetaCell / geppetto-NeuroSCAN

Yale NeuroSCAN & Promoter DB Project
MIT License

Evaluate observed strapi import plugin issues #142

Open lrebscher opened 3 years ago

lrebscher commented 3 years ago

@afonsobspinto please add what you have already found out.

afonsobspinto commented 3 years ago

The biggest concern is the time it takes. Some time measurements for the biggest CSV file, synapses_300 (10688 rows):

- Analyzing the file: ~15 minutes
- Importing the content: ~15 minutes
- Importing relations: ~30 minutes

I didn't touch the analysis of the file; that part still comes from the original plugin. For the data import, I tried to improve the times a little by adding some caching and by gathering all promises so that they can run in parallel (with Promise.all), but it didn't help much. This is what I do:

1. For each line of the CSV, I look up the id of the related entity (or the ids, if it is an array) and check whether it is already in a cache map/dictionary.
2. If it is, I add the Strapi object (the cache map value) to the final dictionary.
3. If it isn't, I query Strapi to get the object for that id and add it to both the cache and the final dictionary.
4. When the CSV has been fully parsed, I call the Strapi update endpoint for each entry in the final dictionary.

Bold is what I think is causing it to take so long (too many queries to the Strapi database).
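For reference, the cached relation lookup described above could look roughly like the sketch below. It is not the plugin's actual code: `Entity`, `Fetcher`, `resolveRelations` and the `fetchEntity` parameter are hypothetical names, and `fetchEntity` stands in for whatever Strapi query returns an object by id. One deliberate difference from the description: the sketch caches the pending promise instead of the resolved object, so that lookups running in parallel under Promise.all share a single query per id.

```typescript
// Hypothetical types and helpers; none of these names come from Strapi or the plugin.
type Entity = { id: number; name: string };
type Fetcher = (id: number) => Promise<Entity>;
type CsvRow = { rowId: number; relatedIds: number[] };

// Resolve the related entities for every CSV row, querying each id at most once.
async function resolveRelations(
  rows: CsvRow[],
  fetchEntity: Fetcher, // assumed stand-in for the Strapi lookup by id
): Promise<Map<number, Entity[]>> {
  // Cache the promise, not the result, so concurrent lookups of the same id
  // reuse the query that is already in flight.
  const cache = new Map<number, Promise<Entity>>();
  // The "final dictionary": row id -> resolved related entities.
  const result = new Map<number, Entity[]>();

  await Promise.all(
    rows.map(async (row) => {
      const related = await Promise.all(
        row.relatedIds.map((id) => {
          let pending = cache.get(id);
          if (pending === undefined) {
            pending = fetchEntity(id);
            cache.set(id, pending);
          }
          return pending;
        }),
      );
      result.set(row.rowId, related);
    }),
  );

  return result;
}
```

With this shape, the number of lookup queries is bounded by the number of distinct related ids rather than by the number of rows. It does nothing about the per-entry update calls at the end, which would still be issued one by one.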

Ideally, all these calls would run in one single transaction, but I couldn't find any kind of bulk update in Strapi.