My current plan is to write a Jupyter notebook that automates loading and starting a seqrepo REST server inside a Hail Dataproc cluster and configures vrs-python to use it, so that simple alleles from a Hail table can be normalized live. The vrs-python calls on the Hail workers will communicate with the seqrepo REST service on the Dataproc master node. Depending on which alleles are in the Hail table and how big the table is, doing this through Hail's parallel computation of expression fields might trigger synchronization-related exceptions in the seqrepo REST service.
https://github.com/biocommons/seqrepo-rest-service/issues/12
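
One way I might soften the impact of those exceptions on the worker side is to retry each failed call with exponential backoff. This is only a rough sketch: `call_with_retry` is a helper name I made up, it is not part of vrs-python, Hail, or seqrepo, and the exception types worth retrying are an assumption until I see what the service actually raises under load.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying with exponential backoff plus jitter.

    Hypothetical helper: fn stands in for a single normalization call
    that ends up hitting the seqrepo REST service.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            # Out of attempts: let the caller see the original error.
            if attempt == max_attempts:
                raise
            # Delay doubles each attempt; jitter spreads out workers
            # that failed at the same moment.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))
    raise AssertionError("unreachable")
```

In the notebook, `fn` would be a small closure around whatever vrs-python call normalizes one allele, for example `call_with_retry(lambda: normalize_one(allele))`, where `normalize_one` is a placeholder for that call. Retrying only hides transient failures; if the service itself is not safe under concurrent requests, limiting parallelism on the Hail side may still be necessary.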