galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Do not load datatypes registry during every metadata collection #4609

Open jmchilton opened 7 years ago

jmchilton commented 7 years ago

If we serialized just a mapping of extensions to metadata classes as JSON, we wouldn't need to load that XML for every request, and we wouldn't need to load the datatypes registry, which brings every datatype class and every datatype Python dependency into memory. We could really optimize metadata collection, especially for smaller files, uploads with extensions set, etc. This was a significant slowdown, for instance when running tool tests, when I measured it years ago, and there has been an explosion of datatypes and datatype dependencies since then.
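A minimal sketch of the idea, not Galaxy's actual implementation: the mapping is written once as JSON, and the metadata step imports only the one class it needs. The file name, the `module:ClassName` string format, and both function names are assumptions made for illustration, and standard-library classes stand in for real datatype classes so the snippet runs on its own.

```python
import importlib
import json
import os
import tempfile

# Hypothetical mapping: extension -> "module:ClassName".
# Stdlib classes stand in for datatype/metadata classes here.
EXTENSION_TO_CLASS = {
    "txt": "collections:OrderedDict",
    "tabular": "collections:Counter",
}


def dump_mapping(mapping, path):
    """Write the extension-to-class mapping once, e.g. at server startup."""
    with open(path, "w") as fh:
        json.dump(mapping, fh, sort_keys=True)


def load_class_for_extension(path, extension):
    """Read the JSON mapping and import only the class for one extension."""
    with open(path) as fh:
        mapping = json.load(fh)
    module_name, class_name = mapping[extension].split(":")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


mapping_path = os.path.join(tempfile.mkdtemp(), "datatypes_mapping.json")
dump_mapping(EXTENSION_TO_CLASS, mapping_path)
resolved = load_class_for_extension(mapping_path, "tabular")
```

With this shape, a job whose output is `tabular` imports a single module instead of every datatype module listed in the registry.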

This would also set up subsequent optimizations, such as serializing just the mappings we need and further optimizing Galaxy's imports so that we don't import so much when running metadata collection. Also, if we could do this without loading models (maybe just by writing out a JSON file that could be reloaded), that would be even better down the road. We shouldn't need SQLAlchemy loaded to do metadata collection on the cluster.
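A rough sketch of "serializing just the mappings we need", plus a simple check for whether heavy packages ended up imported. The dotted paths in `full_mapping` and both helper names are hypothetical and only illustrate the shape of the data; nothing here is taken from Galaxy's code.

```python
import json
import sys


def subset_mapping(full_mapping, needed_extensions):
    """Keep only the entries for extensions this job actually produces."""
    return {
        ext: full_mapping[ext]
        for ext in needed_extensions
        if ext in full_mapping
    }


def loaded_modules(candidates):
    """Return the candidate top-level packages that are currently imported."""
    return [name for name in candidates if name in sys.modules]


# Hypothetical full mapping: extension -> "module:ClassName".
full_mapping = {
    "txt": "example_datatypes.data:Text",
    "bam": "example_datatypes.binary:Bam",
    "tabular": "example_datatypes.tabular:Tabular",
}

# Serialize only what a job with a single "bam" output would need.
job_mapping = subset_mapping(full_mapping, ["bam"])
serialized = json.dumps(job_mapping, sort_keys=True)
```

Calling `loaded_modules(["sqlalchemy"])` at the end of a metadata script is one cheap way to see whether the import graph still pulls in the ORM.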

natefoo commented 7 years ago

This would be awesome, especially for the turnaround time of small jobs.

hexylena commented 7 years ago

Cc @bgruening. Yes please!