NCATSTranslator / Knowledge_Graph_Exchange_Registry

The Biomedical Data Translator Consortium site for development of Knowledge Graph Exchange Standards and Registry
MIT License

Dynamic EBS volume provisioning may eventually be needed for post-processing of graphs(?) #62

Open RichardBruskiewich opened 3 years ago

RichardBruskiewich commented 3 years ago

The "new" (Sept 2021) strategy for creating the tar.gz archive files of KGE file sets uses a Linux CLI script (kgea_archive.bash) that caches the KGX nodes and edges TSV files on the local disk just before running the tar program (which generates a tar version of the archive). The local disk must therefore be large enough to hold the cached files.
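A pre-flight check along these lines could at least fail early when the cache directory is too small. This is only a sketch: `total_size` and `has_enough_space` are hypothetical helper names, not part of kgea_archive.bash or the KGEA code base, and how much space the archiving step really needs beyond the raw file sizes is left to the caller.

```python
import os
import shutil


def total_size(paths):
    """Sum the on-disk sizes, in bytes, of the given files."""
    return sum(os.path.getsize(p) for p in paths)


def has_enough_space(cache_dir, required_bytes):
    """Return True if the filesystem holding cache_dir has at least
    required_bytes free."""
    return shutil.disk_usage(cache_dir).free >= required_bytes
```

For files that still live in S3 rather than on disk, the sizes would have to come from the object metadata instead of `os.path.getsize`.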

A possible (perhaps necessary) KGEA system enhancement is to dynamically allocate a "large enough" temporary EBS volume for the operation (akin to provisioning temporary EC2 compute instances, but for storage only).
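One way this might look, sketched with boto3 under several assumptions: the function names, the `headroom` factor of 2.0, the `gp3` volume type and the `/dev/sdf` device name are all illustrative choices, not anything KGEA currently does. The sizing helper is pure Python; the AWS calls are imported lazily so the helper can be used without the SDK installed.

```python
import math

GIB = 1024 ** 3


def volume_size_gib(file_sizes, headroom=2.0, minimum_gib=1):
    """Estimate the scratch volume size in whole GiB.

    headroom=2.0 is an assumption: room for the cached TSV files plus a
    tar archive of roughly the same size.
    """
    needed = sum(file_sizes) * headroom
    return max(minimum_gib, math.ceil(needed / GIB))


def provision_scratch_volume(size_gib, availability_zone, instance_id,
                             device="/dev/sdf"):
    """Create a temporary EBS volume and attach it to the given instance."""
    import boto3  # lazy import: only needed when actually talking to AWS

    ec2 = boto3.client("ec2")
    volume = ec2.create_volume(
        AvailabilityZone=availability_zone,  # must match the instance's zone
        Size=size_gib,
        VolumeType="gp3",
    )
    volume_id = volume["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(Device=device, InstanceId=instance_id, VolumeId=volume_id)
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])
    return volume_id


def release_scratch_volume(volume_id):
    """Detach and delete the temporary volume once archiving is done."""
    import boto3

    ec2 = boto3.client("ec2")
    ec2.detach_volume(VolumeId=volume_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.delete_volume(VolumeId=volume_id)
```

The sketch stops at the AWS API level: the attached volume would still need a filesystem created and mounted before the archive script can cache files on it, and it should be unmounted before `release_scratch_volume` is called.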

The alternatives are either to provision a large enough disk up front (probably costly and wasteful for most downloads) or to limit the archive sizes of the uploaded files (again, not very satisfactory).