This guidance creates a scalable environment in AWS to prepare genomic, clinical, mutation, expression and imaging data for large-scale analysis and perform interactive queries against a data lake. The solution also demonstrates the use of Amazon Omics for multi-modal analysis.
Sorry this took so long, but this issue is fixed. We've also deployed a new version of the guidance with support for multimodal analysis in collaboration with BioTeam.
File: genomics-tertiary-analysis-and-data-lakes-using-aws-glue-and-amazon-athena/source/GenomicsAnalysisCode/resources/notebooks/runbook.ipynb
Environment: Current version of SageMaker Jupyter and JupyterLab (as of June 2021)
Bug: Notebook fails to run due to a package import error
Error:
ImportError: cannot import name 'as_pandas'
Fix: change the import `from pyathena.util import as_pandas` to `from pyathena.pandas.util import as_pandas`
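If the notebook has to keep working in environments that still have an older PyAthena installed, one option is to try the new module path first and fall back to the old one. This is a minimal sketch that assumes only the two module paths named in this issue; `resolve_attr`, `load_as_pandas` and `AS_PANDAS_PATHS` are hypothetical helper names, not part of PyAthena.

```python
import importlib
from typing import Any, Sequence


def resolve_attr(module_paths: Sequence[str], name: str) -> Any:
    """Return `name` from the first module in `module_paths` that provides it.

    Raises ImportError listing every path tried if none of them work.
    """
    errors = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError as exc:  # module, parent package, or a dependency missing
            errors.append(f"{path}: {exc}")
            continue
        if hasattr(module, name):
            return getattr(module, name)
        errors.append(f"{path}: no attribute {name!r}")
    raise ImportError(f"cannot import {name!r}; tried: " + "; ".join(errors))


# Hypothetical: candidate locations for as_pandas, newer layout first,
# then the older layout that triggers the ImportError reported above.
AS_PANDAS_PATHS = ("pyathena.pandas.util", "pyathena.util")


def load_as_pandas():
    """Hypothetical helper: fetch as_pandas from whichever PyAthena layout is installed."""
    return resolve_attr(AS_PANDAS_PATHS, "as_pandas")
```

In the notebook, `as_pandas = load_as_pandas()` would stand in for the failing import line. If the environment is controlled, simply using the new import path from the fix above is less code.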