Clarivate-LSPS / tMDataLoader

new Groovy-based tranSMART ETL

uploading in UPDATE_VARIABLES merge mode can be incredibly slow #64

Open kforner opened 7 years ago


Hello, I was experimenting with the MERGE modes and ran into surprisingly slow uploads.

Say I have 5 datasets for the same study. I first upload them in REPLACE mode (the default); these timings look normal:

# MERGE: REPLACE
# uploaded Antibody in 3.674 s
# uploaded Autoantibody in 4.094 s
# uploaded Luminex in 6.189 s
# uploaded HLA+Alleles in 10.789 s
# uploaded HLA+Indels in 17.399 s

In UPDATE_VARIABLES mode (starting with an empty DB):

# MERGE: UPDATE_VARIABLES
#uploaded Antibody in 3.944 s
#uploaded Autoantibody in 8.057 s
#uploaded Luminex in 4.976 s
#uploaded HLA+Alleles in 94.781 s
#uploaded HLA+Indels in 61.49 s

Take HLA+Alleles: in UPDATE_VARIABLES mode it took 94.781 s, almost 4 times the sum of the REPLACE times for it and all the datasets uploaded before it (3.674 + 4.094 + 6.189 + 10.789 ≈ 25 s). And this is only a minimal example to illustrate the issue: in practice I have hit cases where an upload that should have taken about 2 minutes seemed to take forever.
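To make the comparison explicit, here is a small Python sketch (not part of tMDataLoader) that recomputes the ratios from the timings reported above. For each dataset it gives the UPDATE_VARIABLES/REPLACE ratio for that dataset alone, and the UPDATE_VARIABLES time against the cumulative REPLACE time of that dataset plus every dataset uploaded before it. The function name `slowdown_report` is hypothetical; the numbers are only the ones posted in this issue.

```python
# Timings in seconds, copied from the two runs above, in upload order.
REPLACE = [
    ("Antibody", 3.674),
    ("Autoantibody", 4.094),
    ("Luminex", 6.189),
    ("HLA+Alleles", 10.789),
    ("HLA+Indels", 17.399),
]
UPDATE_VARIABLES = {
    "Antibody": 3.944,
    "Autoantibody": 8.057,
    "Luminex": 4.976,
    "HLA+Alleles": 94.781,
    "HLA+Indels": 61.49,
}


def slowdown_report(replace, update):
    """Return (dataset, ratio vs own REPLACE time, cumulative REPLACE s, ratio vs cumulative)."""
    rows = []
    cumulative = 0.0
    for name, seconds in replace:
        # Running total of REPLACE time for this dataset and all earlier ones.
        cumulative += seconds
        merged = update[name]
        rows.append((name, merged / seconds, cumulative, merged / cumulative))
    return rows


for name, ratio, cumulative, ratio_cum in slowdown_report(REPLACE, UPDATE_VARIABLES):
    print(f"{name:<13} {ratio:5.2f}x per dataset, "
          f"{ratio_cum:5.2f}x vs {cumulative:6.2f} s cumulative REPLACE")
```

For HLA+Alleles this gives about 8.78x against its own REPLACE time and about 3.83x against the ~25 s cumulative figure.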

Is this expected behaviour, or is something wrong in my setup or configuration?

Thanks