cellarium-ai / cellarium-cloud

Cellarium Cloud Core Library
BSD 3-Clause "New" or "Revised" License

Feat: #90 #92 #94 SQL template management #103

Closed fedorgrab closed 11 months ago

fedorgrab commented 1 year ago

This PR establishes a SQL template generation library. It partially addresses issue #94 and resolves issues #92 and #90 (the latter was previously closed because of the older filtering implementation). Not all SQL queries have been refactored yet; that work should continue under #94.

Below is an example of the output log produced by an infrastructure test after applying these changes:

INFO     casp.cell_data_manager.sql.query:query.py:46 Rendered SQL Query:
create or replace table `dsp-cell-annotation-service.cas_test_dataset.test_extract_homo_sap__extract_cell_info` partition by range_bucket(extract_bin, generate_array(0, 40000, 10)) cluster by extract_bin as
select cas_cell_index,
       cas_ingest_id,
       cell_type,
       total_mrna_umis,
       donor_id,
       assay,
       development_stage,
       disease,
       organism,
       sex,
       tissue,
       dataset_filename,
       cast(floor((row_number() over () - 1) / 10000) as int) as extract_bin
from `dsp-cell-annotation-service.cas_test_dataset.test_extract_homo_sap__extract_cell_info_randomized` c

This demonstrates how casp/cell_data_manager/sql.templates/prepare_curriculum/prepare_cell_info.sql.mako was rendered during the extraction phase.
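For readers unfamiliar with the approach, here is a minimal sketch of how a parameterized template could produce a query like the one in the log above. It is not the code from this PR: it uses Python's standard-library `string.Template` as a stand-in for Mako (both accept `${name}` placeholders for simple substitutions), and the template text, the function `render_prepare_cell_info`, and all of its parameter names are hypothetical, since the contents of the actual `.sql.mako` file are not shown here.

```python
from string import Template

# Hypothetical template text, reconstructed from the rendered log output only.
# The real prepare_cell_info.sql.mako file is a Mako template and may differ.
PREPARE_CELL_INFO_TEMPLATE = Template(
    "create or replace table `${project}.${dataset}.${extract_table_prefix}__extract_cell_info` "
    "partition by range_bucket(extract_bin, generate_array(0, ${partition_end}, ${partition_step})) "
    "cluster by extract_bin as\n"
    "select ${select_columns},\n"
    "       cast(floor((row_number() over () - 1) / ${extract_bin_size}) as int) as extract_bin\n"
    "from `${project}.${dataset}.${extract_table_prefix}__extract_cell_info_randomized` c"
)


def render_prepare_cell_info(
    project,
    dataset,
    extract_table_prefix,
    select_columns,
    extract_bin_size=10000,
    partition_end=40000,
    partition_step=10,
):
    """Fill the hypothetical template and return the SQL string.

    Template.substitute raises KeyError if a placeholder is left unfilled,
    so an incomplete set of parameters fails before any query is issued.
    """
    return PREPARE_CELL_INFO_TEMPLATE.substitute(
        project=project,
        dataset=dataset,
        extract_table_prefix=extract_table_prefix,
        # Align continuation lines under the first selected column.
        select_columns=",\n       ".join(select_columns),
        extract_bin_size=extract_bin_size,
        partition_end=partition_end,
        partition_step=partition_step,
    )


# Example usage with values taken from the log above (column list shortened).
rendered_sql = render_prepare_cell_info(
    project="dsp-cell-annotation-service",
    dataset="cas_test_dataset",
    extract_table_prefix="test_extract_homo_sap",
    select_columns=["cas_cell_index", "cas_ingest_id", "cell_type"],
)
print(rendered_sql)
```

Keeping the SQL text in one template and passing table names, column lists, and bin sizes as parameters means each query has a single definition to review, which is presumably the motivation behind the template library; Mako additionally offers control structures such as loops and conditionals that this stand-in does not.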