RNAcentral / rnacentral-webcode

RNAcentral website source code
https://rnacentral.org
Apache License 2.0

Speed up/fix reliability of search export #522

Closed: blakesweeney closed this issue 3 years ago

blakesweeney commented 3 years ago

Search export has problems with both speed (a full export takes about a day) and reliability (it crashes often). Many of these problems are fixable. As part of this release we should fix them, or at least limit their scope.

The speed can be improved by changing how we extract data from the database. Currently one large query fetches most of the data. It can be replaced with several smaller queries, with the joins done outside the database, for example by indexing the exported files and joining them after export. This should be much faster than a single large query.
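A minimal sketch of the "join outside the database" step. It assumes each smaller query is exported as a tab-separated file that shares an `urs` column. The file layout, column names, sample IDs, and function names (`index_by_key`, `join_exports`) are hypothetical and are not taken from the RNAcentral pipeline. One export is indexed in a dictionary and the other is streamed past it (a hash join).

```python
import csv
import io
from typing import Dict, Iterator, List, TextIO


def index_by_key(handle: TextIO, key: str) -> Dict[str, List[dict]]:
    """Index a TSV export in memory, grouping its rows by the value of `key`."""
    index: Dict[str, List[dict]] = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        index.setdefault(row[key], []).append(row)
    return index


def join_exports(primary: TextIO, secondary: TextIO, key: str) -> Iterator[dict]:
    """Left-join two TSV exports on `key` without touching the database.

    The secondary export is indexed once; the primary export is streamed
    row by row, so only the secondary file has to fit in memory.
    Primary rows with no match get an empty `matches` list.
    """
    lookup = index_by_key(secondary, key)
    for row in csv.DictReader(primary, delimiter="\t"):
        merged = dict(row)
        merged["matches"] = lookup.get(row[key], [])
        yield merged


if __name__ == "__main__":
    # Hypothetical sample exports; in practice these would be files
    # written by the separate, smaller queries.
    sequences = io.StringIO("urs\tlength\nURS0000000001\t200\nURS0000000002\t150\n")
    xrefs = io.StringIO("urs\tdatabase\nURS0000000001\tENA\nURS0000000001\tRfam\n")
    for record in join_exports(sequences, xrefs, key="urs"):
        print(record["urs"], [x["database"] for x in record["matches"]])
```

If neither export fits in memory, the same idea works by sorting both files on the key and doing a streaming merge join instead of building a dictionary.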

Reliability may be improved by moving from LSF to Kubernetes. Some of the cluster-related issues can also be reduced by configuring the pipeline better (for example, Nextflow containerOptions, or changing the --bind paths passed to Singularity). We should use those options as well.