chanzuckerberg / single-cell-data-portal

The data portal supporting the submission, exploration, and management of projects and datasets to cellxgene.
MIT License

Investigate performance of groupby operation for GE expression #6146

Closed prathapsridharan closed 8 months ago

prathapsridharan commented 10 months ago

Priority: P1
Feature: Apply Group By, 1 tissue expanded (blood, group by = disease)
Load Time: 42s
Desired Load Time: 5s
Minimally Acceptable Load Time: 10s

prathapsridharan commented 10 months ago

@atarashansky has reported new load times as of 11/03/2023 in the following doc: https://docs.google.com/document/d/1w-XUG8AmioSYKekWrA1cSjyoGsAPKzE5c6V4C5gCm_U/edit#heading=h.8a1xw6x4n17t

Reproduction steps

Timing

prathapsridharan commented 10 months ago

@atarashansky @joyceyan @kaloster @dsadgat

Suggestions To Improve TileDB Query Speeds

NOTE: Please view these as only suggestions. They are based on what was observed from the tests below, and it is very likely that more tests are needed to check the validity of these suggestions. As it pertains to TileDB, it is worth doing more profiling tests just for TileDB and perhaps asking the TileDB team for help in understanding its inner workings.

  1. Copy TileDB data files to the web server local disk for faster query results. This is perhaps the strongest suggestion for the short-to-medium term

The tests suggest that queries on TileDB may have a bottleneck in transporting large amounts of data from AWS S3 to the web server issuing the query. Even if TileDB fetches exactly the final result set from S3, traversing an object store like S3 is quite likely to be slower than a Linux filesystem. What seems likely is that TileDB fetches a larger block of data from S3 and then does further filtering on the web server that made the query.
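
As a rough illustration of this suggestion (the bucket, prefix, and local path below are hypothetical; the real snapshot layout may differ), the web server could sync the snapshot to local disk once and point TileDB at that copy:

import subprocess
import tiledb

# Hypothetical locations; the real snapshot bucket/prefix and mount point differ.
S3_SNAPSHOT = "s3://example-wmg-bucket/snapshots/latest/expression_summary"
LOCAL_SNAPSHOT = "/var/lib/wmg/expression_summary"

# One-time (or per-deploy) copy of the snapshot onto the web server's disk.
subprocess.run(["aws", "s3", "sync", S3_SNAPSHOT, LOCAL_SNAPSHOT], check=True)

# Subsequent reads come from the local filesystem instead of S3.
with tiledb.open(LOCAL_SNAPSHOT, mode="r") as expression_summary_array:
    # The application code then uses the same .df[] indexing as before,
    # now backed by local disk instead of S3.
    print(expression_summary_array.schema)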

Other Suggestions (Somewhat weaker and maybe even hand wavy)

  1. Allow WMG V2 to accept a list of tissues to filter by so that the frontend can send better constrained queries
  2. The data structures and their processing on the frontend could be organized such that query criteria sent to the backend fetch only the minimum of data. There is ample evidence that the frontend is fetching too much data, too often, inducing a high workload both for itself (deserializing 35MB+ of JSON data) and for the backend
  3. The backend API could be improved such that it allows for querying smaller slices of data. For this we would need to better study what the user wants to "view" at a particular point in time and map it to the tightest query criteria to serve that view.
  4. Better understand why we chose an embedded database model over a database server model and investigate whether one model is significantly better/worse for our use case. It is possible TileDB may offer a database server solution.

Embedded databases (other examples: SQLite, LevelDB, IndexedDB, etc.) are typically used for niche use cases like mobile applications, web browsers, and sharding high-volume writes (and perhaps even reads). Typical web applications use a database server where the data and the database server are on the same machine. Actions like moving the data to the data processor or sharding the data would rarely be considered in a database server model unless the workloads are truly large.

Analysis of Testing in STAGING, RDEV, and LOCAL MACHINE

Staging Testing

Test - Profile query time for expression_summary_cube

curl -X 'POST' \
  'https://api.cellxgene.staging.single-cell.czi.technology/wmg/v2/query' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"compare":"disease","filter":{"dataset_ids":[],"development_stage_ontology_term_ids":[],"disease_ontology_term_ids":[],"gene_ontology_term_ids":["ENSG00000188596","ENSG00000255794","ENSG00000130413","ENSG00000118997","ENSG00000165309","ENSG00000115423","ENSG00000197748","ENSG00000105877","ENSG00000186952","ENSG00000114670","ENSG00000186409","ENSG00000155761","ENSG00000152582","ENSG00000170959","ENSG00000157423","ENSG00000168038","ENSG00000174776","ENSG00000107249","ENSG00000253438","ENSG00000206530","ENSG00000133640","ENSG00000080298","ENSG00000162814","ENSG00000250305","ENSG00000156042","ENSG00000164692","ENSG00000168542","ENSG00000011465","ENSG00000108821","ENSG00000106809","ENSG00000139329","ENSG00000129009","ENSG00000113140","ENSG00000105894","ENSG00000248527","ENSG00000114115","ENSG00000100097","ENSG00000152661","ENSG00000234741","ENSG00000245910","ENSG00000087303","ENSG00000237550","ENSG00000175061","ENSG00000116774","ENSG00000189058","ENSG00000038427","ENSG00000198856","ENSG00000185222","ENSG00000203875","ENSG00000118418"],"organism_ontology_term_id":"NCBITaxon:9606","self_reported_ethnicity_ontology_term_ids":[],"sex_ontology_term_ids":[],"publication_citations":[]},"is_rollup":true}'

Observations

  1. The first few trials take about 24s to 32s as reported by Datadog, where _query for expression_summary takes between 12s and 19s and rollup takes between 6s and 8s
  2. In the later trials the total runtime drops to 16s to 19s and the expression_summary _query drops to 4s to 6s
  3. There was no change in the runtime of rollup, which consistently took 6s to 8s. Although rollup is not the focus of the investigation, the fact that the runtime for this computation doesn't change indicates that it is a compute-bound operation, and it makes up a smaller proportion of the overall runtime of the /query endpoint in the average case

Hypothesis

  1. The significant speed-up in the expression_summary _query in the later trials is due to local disk caching

The lower bound observed for the _query time of expression_summary is around 4s and the upper bound is around 19s. The lower bound seems too high for the caching to be entirely in memory and is therefore likely due to local disk caching. The upper bound of 19s seems to indicate that the time is primarily due to network I/O and data (de)serialization to/from AWS S3

RDEV Testing

Test - Compare query times for expression_summary_cube and expression_summary_default_cube

{"filter":{"dataset_ids":[],"development_stage_ontology_term_ids":[],"disease_ontology_term_ids":[],"gene_ontology_term_ids":["ENSG00000188596","ENSG00000255794","ENSG00000130413","ENSG00000118997","ENSG00000165309","ENSG00000115423","ENSG00000197748","ENSG00000105877","ENSG00000186952","ENSG00000114670","ENSG00000186409","ENSG00000155761","ENSG00000152582","ENSG00000170959","ENSG00000157423","ENSG00000168038","ENSG00000174776","ENSG00000107249","ENSG00000253438","ENSG00000206530","ENSG00000133640","ENSG00000080298","ENSG00000162814","ENSG00000250305","ENSG00000156042","ENSG00000164692","ENSG00000168542","ENSG00000011465","ENSG00000108821","ENSG00000106809","ENSG00000139329","ENSG00000129009","ENSG00000113140","ENSG00000105894","ENSG00000248527","ENSG00000114115","ENSG00000100097","ENSG00000152661","ENSG00000234741","ENSG00000245910","ENSG00000087303","ENSG00000237550","ENSG00000175061","ENSG00000116774","ENSG00000189058","ENSG00000038427","ENSG00000198856","ENSG00000185222","ENSG00000203875","ENSG00000118418"],"organism_ontology_term_id":"NCBITaxon:9606","self_reported_ethnicity_ontology_term_ids":[],"sex_ontology_term_ids":[],"publication_citations":[]},"is_rollup":true}

Observations

  1. _query time for the criteria on expression_summary_cube: 14s to 18s, with a result set size of 400,000 rows
  2. _query time for the criteria on expression_summary_default_cube: 6s to 7s, with a result set size of 77,000 rows
  3. _query time for the criteria on expression_summary_cube is 2-2.5X longer than on expression_summary_default_cube
  4. The result set produced by querying expression_summary_cube is more than 5X the size of the result set produced by querying expression_summary_default_cube

Hypothesis

  1. The same criteria query takes 2-2.5X more time on expression_summary_cube because there are vastly more rows that match the criteria in expression_summary_cube, which requires transporting large chunks of data from AWS S3 back to the web server. Since the only dimensions used for querying in both cubes are the indexed dimensions, the bulk of the time can be attributed to the payload size that needs to be transported over the network and (de)serialized.

LOCAL MACHINE Testing (Mac M1 chip)

Test - Compute average query time expression_summary_cube when data is resident on local disk

v2_query_50_genes_compare_disease = '{"compare":"disease","filter":{"dataset_ids":[],"development_stage_ontology_term_ids":[],"disease_ontology_term_ids":[],"gene_ontology_term_ids":["ENSG00000188596","ENSG00000255794","ENSG00000130413","ENSG00000118997","ENSG00000165309","ENSG00000115423","ENSG00000197748","ENSG00000105877","ENSG00000186952","ENSG00000114670","ENSG00000186409","ENSG00000155761","ENSG00000152582","ENSG00000170959","ENSG00000157423","ENSG00000168038","ENSG00000174776","ENSG00000107249","ENSG00000253438","ENSG00000206530","ENSG00000133640","ENSG00000080298","ENSG00000162814","ENSG00000250305","ENSG00000156042","ENSG00000164692","ENSG00000168542","ENSG00000011465","ENSG00000108821","ENSG00000106809","ENSG00000139329","ENSG00000129009","ENSG00000113140","ENSG00000105894","ENSG00000248527","ENSG00000114115","ENSG00000100097","ENSG00000152661","ENSG00000234741","ENSG00000245910","ENSG00000087303","ENSG00000237550","ENSG00000175061","ENSG00000116774","ENSG00000189058","ENSG00000038427","ENSG00000198856","ENSG00000185222","ENSG00000203875","ENSG00000118418"],"organism_ontology_term_id":"NCBITaxon:9606","self_reported_ethnicity_ontology_term_ids":[],"sex_ontology_term_ids":[],"publication_citations":[]},"is_rollup":true}'
import time  # needed for time.perf_counter() below

# Assumes `q` (the WMG query object), `criteria` (built from the JSON above),
# and `compare` (the compare dimension id) are already defined.
total_query_time = 0
n = 200
for _ in range(n):
    start = time.perf_counter()
    df = q.expression_summary(criteria, compare_dimension=compare)
    end = time.perf_counter()
    total_query_time += (end - start)

print(f"total: {total_query_time}, avg: {total_query_time/n}")

Observations

  1. query time for expression_summary_cube when the TileDB data is on local disk is 50-100X faster than runtimes seen on STAGING
  2. No difference in query times between including the compare dimension and not including the compare dimension. This makes sense because there is no additional filtering happening. The compare dimension is merely appended as a column to the result which doesn't change the number of rows in the result.

Hypothesis

  1. Having the TileDB array on local disk significantly speeds up queries

Even accounting for the fact that my local M1 might be significantly more powerful than the web server, CPU and memory advantages alone cannot fully account for a speed up of 2 orders of magnitude. Network I/O and (de)serialization are not bottlenecks on the local machine whereas those are likely to be bottlenecks on our AWS infra.

Test - Compare average query time for expression_summary_cube and expression_summary_default_cube when data is resident on local disk

v2_query_50_genes_no_compare = '{"filter":{"dataset_ids":[],"development_stage_ontology_term_ids":[],"disease_ontology_term_ids":[],"gene_ontology_term_ids":["ENSG00000188596","ENSG00000255794","ENSG00000130413","ENSG00000118997","ENSG00000165309","ENSG00000115423","ENSG00000197748","ENSG00000105877","ENSG00000186952","ENSG00000114670","ENSG00000186409","ENSG00000155761","ENSG00000152582","ENSG00000170959","ENSG00000157423","ENSG00000168038","ENSG00000174776","ENSG00000107249","ENSG00000253438","ENSG00000206530","ENSG00000133640","ENSG00000080298","ENSG00000162814","ENSG00000250305","ENSG00000156042","ENSG00000164692","ENSG00000168542","ENSG00000011465","ENSG00000108821","ENSG00000106809","ENSG00000139329","ENSG00000129009","ENSG00000113140","ENSG00000105894","ENSG00000248527","ENSG00000114115","ENSG00000100097","ENSG00000152661","ENSG00000234741","ENSG00000245910","ENSG00000087303","ENSG00000237550","ENSG00000175061","ENSG00000116774","ENSG00000189058","ENSG00000038427","ENSG00000198856","ENSG00000185222","ENSG00000203875","ENSG00000118418"],"organism_ontology_term_id":"NCBITaxon:9606","self_reported_ethnicity_ontology_term_ids":[],"sex_ontology_term_ids":[],"publication_citations":[]},"is_rollup":true}'
import time  # as in the previous snippet; assumes `q` and `criteria` are already defined

total_query_time = 0
n = 200
for _ in range(n):
    start = time.perf_counter()
    df = q.expression_summary_default(criteria)
    end = time.perf_counter()
    total_query_time += (end - start)

print(f"total: {total_query_time}, avg: {total_query_time/n}")

Observations

  1. query time for expression_summary_default_cube when the TileDB data is on local disk is 30-60X faster than runtimes seen on STAGING and RDEV
  2. query time for expression_summary_default_cube is 2X faster than the query time for expression_summary_cube for the same criteria. This is a similar query-time ratio to the one seen in the RDEV test.

Hypothesis

  1. Having the TileDB array on local disk significantly speeds up queries for the same reason given in the expression_summary_cube query test above

  2. Similar to the RDEV test, the same query criteria execute twice as fast when querying expression_summary_default_cube as when querying expression_summary_cube, because significantly fewer rows are scanned in expression_summary_default_cube

prathapsridharan commented 10 months ago

As a follow-up to the LOCAL MACHINE TESTING, I ran 200 trials of querying expression_summary_cube, generating a query of 50 random genes in each trial with grouping by disease. The average runtime is still about the same, ~0.20 seconds. I randomized the genes per query to mitigate effects of any optimizations due to making the same query in rapid succession:

import random
import time

# Assumes WmgQueryCriteriaV2, find_dimension_id_from_compare, `q`, and
# `expr_default_cube` are available from the WMG backend environment.
unique_genes = list(expr_default_cube.gene_ontology_term_id.unique())

def get_random_query_criteria():
    random_genes = random.sample(unique_genes, k=50)

    query = {
        "compare":"disease",
        "filter":{
            "dataset_ids":[],
            "development_stage_ontology_term_ids":[],
            "disease_ontology_term_ids":[],
            "gene_ontology_term_ids": random_genes,
            "organism_ontology_term_id":"NCBITaxon:9606",
            "self_reported_ethnicity_ontology_term_ids":[],
            "sex_ontology_term_ids":[],
            "publication_citations":[]
        },
        "is_rollup":True}

    is_rollup = query.get("is_rollup", True)
    compare = query.get("compare", None)

    if compare:
        compare = find_dimension_id_from_compare(compare)

    criteria = WmgQueryCriteriaV2(**query["filter"])
    return criteria

def run_trial_query(query_func, n=200):
    total_query_time = 0

    for _ in range(n):
        criteria = get_random_query_criteria()
        start = time.perf_counter()
        df = query_func(criteria)
        end = time.perf_counter()
        total_query_time += (end - start)

    return total_query_time, total_query_time/n

total, avg = run_trial_query(q.expression_summary, n=200)
print(f"total_time: {total} seconds, avg: {avg} seconds")
total_time: 39.24072681547841 seconds, avg: 0.19620363407739205 seconds
joyceyan commented 10 months ago

Another suggestion I just thought of: it could be better to not send off any backend requests until a user has selected all the filters they want and then clicks some "Submit" button. Right now, my understanding is that touching any of the filters will send off a new request to WMG. So if you want to select 5 different genes and then also group by disease, we'd send off six requests to WMG (one for each of the 5 genes, and another for the group by). If you could pick the filters and then click a "Submit" button before sending off a request to WMG, we'd only have to send one request to WMG. This type of product change obviously wouldn't result in any of our endpoints being faster, but it would help the problem of users spending a lot of time waiting for things to load.

prathapsridharan commented 10 months ago

@joyceyan -

I think collecting the criteria first and hitting some type of submit button is a good idea to consider. It is a product decision in terms of how the application feels when you do that, but assuming that is acceptable, something like that does reduce the total number of requests to the backend.

In the current case of picking genes though, if you pick 5 genes, the previous request-in-flight is cancelled (when possible) as you click each gene. So in most cases, picking 5 genes yields 1 request to the backend, but that is not always the case, since a delay between selections could mean the previous request was already sent and it is too late to cancel it. Moreover, there is scope for a race-condition bug here: a request with 4 genes gets sent to the backend, then adding 1 more gene sends a request with 5 genes, but the response to the 5-gene request comes back faster than the 4-gene response. The 4-gene response then clobbers the 5-gene result, creating a buggy experience.
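
One common guard against that kind of clobbering is to tag each request with a monotonically increasing id and drop responses from superseded requests. A language-neutral sketch of the pattern (written in Python for brevity; the real frontend is TypeScript, and all names here are hypothetical):

latest_request_id = 0

def on_gene_selection_changed(send_request, render):
    global latest_request_id
    latest_request_id += 1
    request_id = latest_request_id  # id of the request we are about to send

    def handle_response(response):
        # Ignore responses that belong to a request that has since been superseded.
        if request_id != latest_request_id:
            return
        render(response)

    send_request(on_done=handle_response)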

joyceyan commented 10 months ago

Oh I see - so it's not as bad as I thought, but sending more requests than we need (even if they get cancelled) is still strictly worse.

prathapsridharan commented 10 months ago

@atarashansky had a commit to copy the tiledb data files to the local fs but reverted it the very next day

I think it might be worth revisiting this.

prathapsridharan commented 10 months ago

In my rdev I performed a test of selecting 50 genes - adding all marker genes for stem cell and ciliated epithelial cell to the dot plot - with group-by disease, under cProfile. The TileDB runtime is 14.77s.

It seems 98% of the time is spent in the _run_query() method of TileDB's multirange_indexing.py when querying expression_summary_cube:

2999 function calls (2978 primitive calls) in 14.585 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   14.585   14.585 tracer.py:952(func_wrapper)
        1    0.000    0.000   14.585   14.585 query.py:87(expression_summary)
        1    0.004    0.004   14.585   14.585 query.py:121(_query)
        1    0.003    0.003   14.578   14.578 _decorators.py:323(wrapper)
        1    0.000    0.000   14.576   14.576 concat.py:146(concat)
        1    0.000    0.000   14.570   14.570 concat.py:389(__init__)
        2    0.000    0.000   14.570    7.285 multirange_indexing.py:288(__iter__)
        1   14.537   14.537   14.570   14.570 multirange_indexing.py:393(_run_query)
        1    0.000    0.000    0.033    0.033 {method 'to_pandas' of 'pyarrow.lib._PandasConvertible' objects}
        1    0.000    0.000    0.032    0.032 pandas_compat.py:797(table_to_blockmanager)
        1    0.000    0.000    0.031    0.031 pandas_compat.py:1163(_table_to_blocks)
        1    0.031    0.031    0.031    0.031 {pyarrow.lib.table_to_blocks}
        1    0.000    0.000    0.006    0.006 concat.py:565(get_result)
        1    0.000    0.000    0.006    0.006 concat.py:176(concatenate_managers)
        3    0.006    0.002    0.006    0.002 {method 'copy' of 'numpy.ndarray' objects}

And _run_query() is the central function in the DataFrameIndexer class that implements the .df[] indexing to directly return a dataframe. See here: https://github.com/TileDB-Inc/TileDB-Py/blob/24d5b3bb0869ef4d2f0d0c64c494bed037754949/tiledb/multirange_indexing.py#L394

After performing a TileDB query, we convert the result to a dataframe using .df[], and so it is this conversion to a dataframe that seems to be the problem, rather than the query speed of TileDB or network I/O as originally thought. Moreover, this is true both for querying expression_summary_default_cube and expression_summary_cube, and for other use cases like using the secondary filter to filter the dot plot by indexed or non-indexed dimensions of the cube.
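
A rough way to separate the TileDB read from the pandas conversion is to time the same selection through .multi_index[] (which returns NumPy arrays) and through .df[] (which returns a DataFrame). This is only a sketch: cube_uri and the selection values are placeholders, and the second read benefits from whatever caching the first read warmed up.

import time
import tiledb

cube_uri = "/path/to/expression_summary"            # placeholder: local path or s3:// URI of the cube
gene_ids = ["ENSG00000188596", "ENSG00000255794"]   # placeholder gene selection
organism_id = "NCBITaxon:9606"

with tiledb.open(cube_uri, mode="r") as arr:
    t0 = time.perf_counter()
    raw = arr.multi_index[gene_ids, organism_id]    # dict of NumPy arrays, no pandas conversion
    t1 = time.perf_counter()
    df = arr.df[gene_ids, organism_id]              # same selection, returned as a DataFrame
    t2 = time.perf_counter()

print(f"multi_index[]: {t1 - t0:.3f}s, df[]: {t2 - t1:.3f}s")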

prathapsridharan commented 10 months ago

TileDB stats collection done on RDEV

Here is an example of using tiledb.stats_enable() to profile querying the expression_summary cube with 50 genes and group-by disease:
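
Such stats can be gathered with roughly the following pattern (a sketch; it assumes `q`, `criteria`, and `compare` are set up as in the earlier snippets, and the exact stats_dump signature may vary across TileDB-Py versions):

import pprint
import tiledb

tiledb.stats_enable()
tiledb.stats_reset()

df = q.expression_summary(criteria, compare_dimension=compare)

# Return the collected stats instead of printing them to stdout.
stats = tiledb.stats_dump(print_out=False, json=True)
pprint.pprint(stats)
tiledb.stats_disable()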

{'counters': {'Context.StorageManager.Query.Reader.attr_fixed_num': 78,
              'Context.StorageManager.Query.Reader.attr_num': 110,
              'Context.StorageManager.Query.Reader.attr_var_num': 32,
              'Context.StorageManager.Query.Reader.cell_num': 1190000,
              'Context.StorageManager.Query.Reader.dim_num': 52,
              'Context.StorageManager.Query.Reader.dim_var_num': 52,
              'Context.StorageManager.Query.Reader.ignored_tiles': 2,
              'Context.StorageManager.Query.Reader.internal_loop_num': 26,
              'Context.StorageManager.Query.Reader.loop_num': 26,
              'Context.StorageManager.Query.Reader.num_tiles_read': 938,
              'Context.StorageManager.Query.Reader.read_unfiltered_byte_num': 96274068,
              'Context.StorageManager.Query.Reader.result_num': 501699,
              'Context.StorageManager.VFS.file_size_num': 1,
              'Context.StorageManager.VFS.read_byte_num': 726522977,
              'Context.StorageManager.VFS.read_ops_num': 272,
              'Context.StorageManager.read_array_meta_size': 0,
              'Context.StorageManager.read_array_schema_size': 960,
              'Context.StorageManager.read_frag_meta_size': 1584,
              'Context.StorageManager.read_rtree_size': 3171014,
              'Context.StorageManager.read_tile_offsets_size': 1865248,
              'Context.StorageManager.read_tile_var_offsets_size': 1356544,
              'Context.StorageManager.read_tile_var_sizes_size': 1356544,
              'Context.StorageManager.read_unfiltered_byte_num': 7750310},
 'python': '{"py.buffer_conversion_time": 0.0004088878631591797, '
           '"py.core_read_query_initial_submit_time": 13.619953776, '
           '"py.core_read_query_total_time": 13.62031456, "py.getitem_time": '
           '0.0007717609405517578, "py.getitem_time.add_ranges": '
           '0.0001685619354248047, "py.pandas_index_update_time": '
           '3.647804260253906e-05, "py.query_retries_count": 0.0}',
 'timers': {'Context.StorageManager.Query.Reader.apply_query_condition.avg': 0.000182243,
            'Context.StorageManager.Query.Reader.apply_query_condition.sum': 0.00182243,
            'Context.StorageManager.Query.Reader.compute_results_count_sparse.avg': 7.56013e-05,
            'Context.StorageManager.Query.Reader.compute_results_count_sparse.sum': 0.00884535,
            'Context.StorageManager.Query.Reader.compute_tile_bitmaps.avg': 0.000581931,
            'Context.StorageManager.Query.Reader.compute_tile_bitmaps.sum': 0.00581931,
            'Context.StorageManager.Query.Reader.copy_fixed_data_tiles.avg': 0.000327327,
            'Context.StorageManager.Query.Reader.copy_fixed_data_tiles.sum': 0.00981982,
            'Context.StorageManager.Query.Reader.copy_offsets_tiles.avg': 0.000786381,
            'Context.StorageManager.Query.Reader.copy_offsets_tiles.sum': 0.0275233,
            'Context.StorageManager.Query.Reader.copy_var_tiles.avg': 0.000592393,
            'Context.StorageManager.Query.Reader.copy_var_tiles.sum': 0.0207338,
            'Context.StorageManager.Query.Reader.create_result_tiles.avg': 1.50168e-05,
            'Context.StorageManager.Query.Reader.create_result_tiles.sum': 0.000390437,
            'Context.StorageManager.Query.Reader.dowork.avg': 0.70863,
            'Context.StorageManager.Query.Reader.dowork.sum': 18.4244,
            'Context.StorageManager.Query.Reader.load_initial_data.avg': 0.0130393,
            'Context.StorageManager.Query.Reader.load_initial_data.sum': 0.339021,
            'Context.StorageManager.Query.Reader.load_tile_offsets.avg': 0.00609423,
            'Context.StorageManager.Query.Reader.load_tile_offsets.sum': 0.47535,
            'Context.StorageManager.Query.Reader.load_tile_var_sizes.avg': 0.0106503,
            'Context.StorageManager.Query.Reader.load_tile_var_sizes.sum': 0.276908,
            'Context.StorageManager.Query.Reader.process_tiles.avg': 1.44423,
            'Context.StorageManager.Query.Reader.process_tiles.sum': 14.4423,
            'Context.StorageManager.Query.Reader.read_and_unfilter_attributes.avg': 1.43544,
            'Context.StorageManager.Query.Reader.read_and_unfilter_attributes.sum': 14.3544,
            'Context.StorageManager.Query.Reader.read_and_unfilter_coords.avg': 0.288153,
            'Context.StorageManager.Query.Reader.read_and_unfilter_coords.sum': 2.88153,
            'Context.StorageManager.Query.Reader.read_attribute_tiles.avg': 0.7477,
            'Context.StorageManager.Query.Reader.read_attribute_tiles.sum': 14.954,
            'Context.StorageManager.Query.Reader.read_coordinate_tiles.avg': 0.108101,
            'Context.StorageManager.Query.Reader.read_coordinate_tiles.sum': 2.16202,
            'Context.StorageManager.Query.Reader.read_tiles.avg': 0.427897,
            'Context.StorageManager.Query.Reader.read_tiles.sum': 17.1159,
            'Context.StorageManager.Query.Reader.resize_fixed_results_to_copy.avg': 5.1209e-06,
            'Context.StorageManager.Query.Reader.resize_fixed_results_to_copy.sum': 5.1209e-05,
            'Context.StorageManager.Query.Reader.switch_sizes_to_offsets.avg': 3.96864e-05,
            'Context.StorageManager.Query.Reader.switch_sizes_to_offsets.sum': 0.00138902,
            'Context.StorageManager.Query.Reader.tile_offset_sizes.avg': 3.68069e-06,
            'Context.StorageManager.Query.Reader.tile_offset_sizes.sum': 9.5698e-05,
            'Context.StorageManager.Query.Reader.unfilter_attr_tiles.avg': 0.00161202,
            'Context.StorageManager.Query.Reader.unfilter_attr_tiles.sum': 0.0822132,
            'Context.StorageManager.Query.Reader.unfilter_coord_tiles.avg': 0.000742896,
            'Context.StorageManager.Query.Reader.unfilter_coord_tiles.sum': 0.0371448,
            'Context.StorageManager.VFS.ArrayDirectory.list_array_meta_uris.avg': 0.011372,
            'Context.StorageManager.VFS.ArrayDirectory.list_array_meta_uris.sum': 0.011372,
            'Context.StorageManager.VFS.ArrayDirectory.list_array_schema_uris.avg': 0.0841105,
            'Context.StorageManager.VFS.ArrayDirectory.list_array_schema_uris.sum': 0.0841105,
            'Context.StorageManager.VFS.ArrayDirectory.list_commit_uris.avg': 0.0754825,
            'Context.StorageManager.VFS.ArrayDirectory.list_commit_uris.sum': 0.0754825,
            'Context.StorageManager.VFS.ArrayDirectory.list_fragment_meta_uris.avg': 0.0677259,
            'Context.StorageManager.VFS.ArrayDirectory.list_fragment_meta_uris.sum': 0.0677259,
            'Context.StorageManager.VFS.ArrayDirectory.list_root_uris.avg': 0.0158559,
            'Context.StorageManager.VFS.ArrayDirectory.list_root_uris.sum': 0.0158559,
            'Context.StorageManager.VFS.ArrayDirectory.load_consolidated_commit_uris.avg': 1.904e-06,
            'Context.StorageManager.VFS.ArrayDirectory.load_consolidated_commit_uris.sum': 1.904e-06,
            'Context.StorageManager.array_open_READ.avg': 0.443385,
            'Context.StorageManager.array_open_READ.sum': 0.443385,
            'Context.StorageManager.array_open_read_load_directory.avg': 0.0914446,
            'Context.StorageManager.array_open_read_load_directory.sum': 0.0914446,
            'Context.StorageManager.array_open_read_load_schemas_and_fragment_meta.avg': 0.35193,
            'Context.StorageManager.array_open_read_load_schemas_and_fragment_meta.sum': 0.35193,
            'Context.StorageManager.sm_load_all_array_schemas.avg': 0.0262434,
            'Context.StorageManager.sm_load_all_array_schemas.sum': 0.0262434,
            'Context.StorageManager.sm_load_array_metadata.avg': 5.957e-06,
            'Context.StorageManager.sm_load_array_metadata.sum': 5.957e-06,
            'Context.StorageManager.sm_load_array_schema_from_uri.avg': 0.0262303,
            'Context.StorageManager.sm_load_array_schema_from_uri.sum': 0.0262303,
            'Context.StorageManager.sm_load_array_schemas_and_fragment_metadata.avg': 0.351926,
            'Context.StorageManager.sm_load_array_schemas_and_fragment_metadata.sum': 0.351926,
            'Context.StorageManager.sm_load_filtered_fragment_uris.avg': 3.4826e-05,
            'Context.StorageManager.sm_load_filtered_fragment_uris.sum': 3.4826e-05,
            'Context.StorageManager.sm_load_fragment_metadata.avg': 0.325637,
            'Context.StorageManager.sm_load_fragment_metadata.sum': 0.325637}}
prathapsridharan commented 10 months ago

Non CPU Time profiling done on RDEV

I have also used cProfile to count non-CPU time (I/O, process pre-empted by the OS scheduler, memory swap, etc.) and found that about 60-80% of the time spent does not involve the CPU.

For example, in the test case of picking 50 genes and group-by disease operation that queries expression_summary cube, the total reported time is 13.50 seconds and the non-cpu time is about 10.50 seconds.

The functions I wrote to profile total process time or non-CPU time are here (cleaned up a little for readability):

import io
import logging
import os
from cProfile import Profile
from pstats import Stats

logger = logging.getLogger(__name__)

def not_cpu_time():
    times = os.times()
    return times.elapsed - (times.system + times.user)

def cprofile_query(*, q, criteria, compare=None, measure_io=False):
    profiler = Profile(not_cpu_time) if measure_io else Profile()

    expression_summary = profiler.runcall(q.expression_summary, criteria, compare_dimension=compare)
    s = io.StringIO()
    stats = Stats(profiler, stream=s)
    stats.strip_dirs()
    stats.sort_stats("cumulative")
    stats.print_stats()
    logger.info(f"Profiling expression_summary with measure_io flag set to: {measure_io}:\n{s.getvalue()}")

    return expression_summary

# to profile non-cpu-time call it with measure_io=True:
cprofile_query(q=q, criteria=criteria, compare=compare, measure_io=True)

# to profile full process time (cpu and non-cpu-time included), call with measure_io=False:
cprofile_query(q=q, criteria=criteria, compare=compare, measure_io=False)
prathapsridharan commented 10 months ago

Profiling of querying expression_summary_cube on EC2 instance

Profile results with tiledb data snapshot on ec2 local disk

  1. I profiled the _query function the web application calls and found that it takes about 1 second, which is drastically improved from the 10 to 13 seconds it takes on the production, staging, and rdev machines
  2. I further profiled the specific part of the code that simply sends the query to TileDB - that is, this profiling does not include the parts of the function in (1) that construct the query condition and convert the result from a DataFrameIndexer to a dataframe object. This is 1000X faster: 0.001 seconds!

Profile results with tiledb data resident on s3 but making the query from the ec2 instance

I could not get this test case to work because the context needed to set it up was quite a bit more complicated.
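
For reference, querying the S3-resident array roughly needs a TileDB context configured along these lines (a sketch; the config key is a standard TileDB option, but the bucket URI, region, and credential setup here are placeholders):

import tiledb

cfg = tiledb.Config({
    "vfs.s3.region": "us-west-2",  # placeholder region
    # Credentials are typically picked up from the instance profile or environment.
})
ctx = tiledb.Ctx(cfg)

with tiledb.open("s3://example-bucket/snapshot/expression_summary", ctx=ctx) as arr:
    print(arr.schema)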

I have a command line script here that does profiling in several ways

Steps:

Profiler Script Usage:

# simple wallclock profiling calling the function the web app would call to execute the query
$ python3 profile_expr_summary.py --data-location local  --profile-type simple-wallclock --profile-scope web-app-query

# simple wallclock profiling exclusively calling the tiledb query method
$ python3 profile_expr_summary.py --data-location local  --profile-type simple-wallclock --profile-scope raw-query

# Collect tiledb stats for web app query
$ python3 profile_expr_summary.py --data-location local  --profile-type tiledb-stats --profile-scope web-app-query

# Collect tiledb stats for raw query
$ python3 profile_expr_summary.py --data-location local  --profile-type tiledb-stats --profile-scope raw-query

# simple wallclock profiling of raw query by pulling snapshot from s3 - THIS CURRENTLY DOES NOT WORK
$ python3 profile_expr_summary.py --data-location s3  --profile-type tiledb-stats --profile-scope raw-query
prathapsridharan commented 10 months ago

Stats collected from tiledb.stats_dump() for a query of data on the local disk of an EC2 instance

# The command I ran
$ python3 profile_expr_summary.py --data-location local  --profile-type tiledb-stats --profile-scope web-app-query
{'counters': {'Context.StorageManager.Query.Reader.attr_fixed_num': 24,
              'Context.StorageManager.Query.Reader.attr_num': 33,
              'Context.StorageManager.Query.Reader.attr_var_num': 9,
              'Context.StorageManager.Query.Reader.cell_num': 950000,
              'Context.StorageManager.Query.Reader.dim_num': 23,
              'Context.StorageManager.Query.Reader.dim_var_num': 23,
              'Context.StorageManager.Query.Reader.ignored_tiles': 1,
              'Context.StorageManager.Query.Reader.internal_loop_num': 8,
              'Context.StorageManager.Query.Reader.loop_num': 8,
              'Context.StorageManager.Query.Reader.num_tiles_read': 57546,
              'Context.StorageManager.Query.Reader.read_unfiltered_byte_num': 5712173514,
              'Context.StorageManager.Query.Reader.result_num': 42915399,
              'Context.StorageManager.VFS.file_size_num': 2,
              'Context.StorageManager.VFS.read_byte_num': 1437612444,
              'Context.StorageManager.VFS.read_ops_num': 255,
              'Context.StorageManager.read_array_meta_size': 0,
              'Context.StorageManager.read_array_schema_size': 1480,
              'Context.StorageManager.read_frag_meta_size': 2552,
              'Context.StorageManager.read_rtree_size': 3171014,
              'Context.StorageManager.read_tile_offsets_size': 1594656,
              'Context.StorageManager.read_tile_var_offsets_size': 983904,
              'Context.StorageManager.read_tile_var_sizes_size': 983904,
              'Context.StorageManager.read_unfiltered_byte_num': 6734958,
              'Context.StorageManager.subSubarray.add_range_dim_0': 50,
              'Context.StorageManager.subSubarray.add_range_dim_2': 1},
 'python': '{"py.buffer_conversion_time": 0.0003116130828857422, '
           '"py.core_read_query_initial_submit_time": 0.593130085, '
           '"py.core_read_query_total_time": 0.593898498, "py.getitem_time": '
           '0.0007047653198242188, "py.getitem_time.add_ranges": '
           '0.00018072128295898438, "py.pandas_index_update_time": '
           '4.744529724121094e-05, "py.query_retries_count": 0.0}',
 'timers': {'Context.StorageManager.Query.Reader.apply_query_condition.avg': 1.5e-07,
            'Context.StorageManager.Query.Reader.apply_query_condition.sum': 3e-07,
            'Context.StorageManager.Query.Reader.compute_results_count_sparse.avg': 7.83403e-05,
            'Context.StorageManager.Query.Reader.compute_results_count_sparse.sum': 0.0072073,
            'Context.StorageManager.Query.Reader.compute_tile_bitmaps.avg': 0.00123238,
            'Context.StorageManager.Query.Reader.compute_tile_bitmaps.sum': 0.00246476,
            'Context.StorageManager.Query.Reader.copy_fixed_data_tiles.avg': 0.00319179,
            'Context.StorageManager.Query.Reader.copy_fixed_data_tiles.sum': 0.076603,
            'Context.StorageManager.Query.Reader.copy_offsets_tiles.avg': 0.0164023,
            'Context.StorageManager.Query.Reader.copy_offsets_tiles.sum': 0.524874,
            'Context.StorageManager.Query.Reader.copy_var_tiles.avg': 0.00946177,
            'Context.StorageManager.Query.Reader.copy_var_tiles.sum': 0.302777,
            'Context.StorageManager.Query.Reader.create_result_tiles.avg': 0.00479548,
            'Context.StorageManager.Query.Reader.create_result_tiles.sum': 0.00959097,
            'Context.StorageManager.Query.Reader.dowork.avg': 0.893305,
            'Context.StorageManager.Query.Reader.dowork.sum': 7.14644,
            'Context.StorageManager.Query.Reader.load_initial_data.avg': 0.00761377,
            'Context.StorageManager.Query.Reader.load_initial_data.sum': 0.0152275,
            'Context.StorageManager.Query.Reader.load_tile_offsets.avg': 0.0016087,
            'Context.StorageManager.Query.Reader.load_tile_offsets.sum': 0.00965217,
            'Context.StorageManager.Query.Reader.load_tile_var_sizes.avg': 0.00110676,
            'Context.StorageManager.Query.Reader.load_tile_var_sizes.sum': 0.00221352,
            'Context.StorageManager.Query.Reader.process_tiles.avg': 0.883828,
            'Context.StorageManager.Query.Reader.process_tiles.sum': 7.07062,
            'Context.StorageManager.Query.Reader.read_and_unfilter_attributes.avg': 0.627374,
            'Context.StorageManager.Query.Reader.read_and_unfilter_attributes.sum': 5.01899,
            'Context.StorageManager.Query.Reader.read_and_unfilter_coords.avg': 0.0150342,
            'Context.StorageManager.Query.Reader.read_and_unfilter_coords.sum': 0.0300684,
            'Context.StorageManager.Query.Reader.read_attribute_tiles.avg': 0.120051,
            'Context.StorageManager.Query.Reader.read_attribute_tiles.sum': 1.20051,
            'Context.StorageManager.Query.Reader.read_coordinate_tiles.avg': 0.00525141,
            'Context.StorageManager.Query.Reader.read_coordinate_tiles.sum': 0.0105028,
            'Context.StorageManager.Query.Reader.read_tiles.avg': 0.100915,
            'Context.StorageManager.Query.Reader.read_tiles.sum': 1.21098,
            'Context.StorageManager.Query.Reader.resize_fixed_results_to_copy.avg': 2.22929e-05,
            'Context.StorageManager.Query.Reader.resize_fixed_results_to_copy.sum': 0.000178343,
            'Context.StorageManager.Query.Reader.switch_sizes_to_offsets.avg': 0.00631311,
            'Context.StorageManager.Query.Reader.switch_sizes_to_offsets.sum': 0.202019,
            'Context.StorageManager.Query.Reader.tile_offset_sizes.avg': 7.6905e-06,
            'Context.StorageManager.Query.Reader.tile_offset_sizes.sum': 1.5381e-05,
            'Context.StorageManager.Query.Reader.unfilter_attr_tiles.avg': 0.0634444,
            'Context.StorageManager.Query.Reader.unfilter_attr_tiles.sum': 2.09366,
            'Context.StorageManager.Query.Reader.unfilter_coord_tiles.avg': 0.0639595,
            'Context.StorageManager.Query.Reader.unfilter_coord_tiles.sum': 1.72691,
            'Context.StorageManager.VFS.ArrayDirectory.list_array_meta_uris.avg': 9.882e-06,
            'Context.StorageManager.VFS.ArrayDirectory.list_array_meta_uris.sum': 1.9764e-05,
            'Context.StorageManager.VFS.ArrayDirectory.list_array_schema_uris.avg': 5.9662e-05,
            'Context.StorageManager.VFS.ArrayDirectory.list_array_schema_uris.sum': 0.000119324,
            'Context.StorageManager.VFS.ArrayDirectory.list_commit_uris.avg': 8.21e-05,
            'Context.StorageManager.VFS.ArrayDirectory.list_commit_uris.sum': 0.0001642,
            'Context.StorageManager.VFS.ArrayDirectory.list_fragment_meta_uris.avg': 1.25745e-05,
            'Context.StorageManager.VFS.ArrayDirectory.list_fragment_meta_uris.sum': 2.5149e-05,
            'Context.StorageManager.VFS.ArrayDirectory.list_root_uris.avg': 7.61865e-05,
            'Context.StorageManager.VFS.ArrayDirectory.list_root_uris.sum': 0.000152373,
            'Context.StorageManager.VFS.ArrayDirectory.load_consolidated_commit_uris.avg': 1.0425e-06,
            'Context.StorageManager.VFS.ArrayDirectory.load_consolidated_commit_uris.sum': 2.085e-06,
            'Context.StorageManager.array_open_READ.avg': 0.000473925,
            'Context.StorageManager.array_open_READ.sum': 0.00094785,
            'Context.StorageManager.array_open_read_load_directory.avg': 0.000242743,
            'Context.StorageManager.array_open_read_load_directory.sum': 0.000485486,
            'Context.StorageManager.array_open_read_load_schemas_and_fragment_meta.avg': 0.000222092,
            'Context.StorageManager.array_open_read_load_schemas_and_fragment_meta.sum': 0.000444183,
            'Context.StorageManager.sm_load_all_array_schemas.avg': 0.00012188,
            'Context.StorageManager.sm_load_all_array_schemas.sum': 0.000243761,
            'Context.StorageManager.sm_load_array_metadata.avg': 6.674e-06,
            'Context.StorageManager.sm_load_array_metadata.sum': 1.3348e-05,
            'Context.StorageManager.sm_load_array_schema_from_uri.avg': 0.000114239,
            'Context.StorageManager.sm_load_array_schema_from_uri.sum': 0.000228479,
            'Context.StorageManager.sm_load_array_schemas_and_fragment_metadata.avg': 0.000220344,
            'Context.StorageManager.sm_load_array_schemas_and_fragment_metadata.sum': 0.000440687,
            'Context.StorageManager.sm_load_filtered_fragment_uris.avg': 2.35015e-05,
            'Context.StorageManager.sm_load_filtered_fragment_uris.sum': 4.7003e-05,
            'Context.StorageManager.sm_load_fragment_metadata.avg': 6.96155e-05,
            'Context.StorageManager.sm_load_fragment_metadata.sum': 0.000139231,
            'Context.StorageManager.subSubarray.compute_relevant_frags.avg': 5.881e-05,
            'Context.StorageManager.subSubarray.compute_relevant_frags.sum': 5.881e-05,
            'Context.StorageManager.subSubarray.read_compute_simple_tile_overlap.avg': 0.0151931,
            'Context.StorageManager.subSubarray.read_compute_simple_tile_overlap.sum': 0.0151931,
            'Context.StorageManager.subSubarray.read_load_relevant_rtrees.avg': 0.0147723,
            'Context.StorageManager.subSubarray.read_load_relevant_rtrees.sum': 0.0147723,
            'Context.StorageManager.subSubarray.sort_ranges.avg': 9.5146e-05,
            'Context.StorageManager.subSubarray.sort_ranges.sum': 0.000190292}}
prathapsridharan commented 9 months ago

Running the following single tiledb query script, I noticed a couple of interesting things:

NOTE: This script runs a single query exactly once. The query is hardcoded. There is no randomization of any kind.

Running the single query on an EC2 m5.xlarge instance (4 vCPU and 16GB of RAM)

This is the closest approximation to the resource allocation of our prod environment. In prod, each container gets 4 vCPU and 30 GB of RAM. There are no instance types with exactly this spec, so I settled on the conservative side.

  1. Querying from S3 can be as much as 10X slower than querying from local disk (see the following points)

  2. The first 2 queries to S3 are extremely slow, on the order of the runtimes we see in the web application. But after that, the same query is pretty fast. This might support the hypothesis that I/O is the culprit for the slow runtimes, and that subsequent runs hit a cache and are therefore significantly faster. The caching might be done by the S3 service to serve up recently queried files:

(venv) ssm-user@ip-172-31-13-49:~/single-cell-data-portal$ python3 profile_single_query_s3.py --data-location s3 --profile-type simple-wallclock
Query Time: 9.871938740951009
(venv) ssm-user@ip-172-31-13-49:~/single-cell-data-portal$ python3 profile_single_query_s3.py --data-location s3 --profile-type simple-wallclock
Query Time: 6.401317582000047
(venv) ssm-user@ip-172-31-13-49:~/single-cell-data-portal$ python3 profile_single_query_s3.py --data-location s3 --profile-type simple-wallclock
Query Time: 3.399237139034085
(venv) ssm-user@ip-172-31-13-49:~/single-cell-data-portal$ python3 profile_single_query_s3.py --data-location s3 --profile-type simple-wallclock
Query Time: 2.3840295899426565
(venv) ssm-user@ip-172-31-13-49:~/single-cell-data-portal$ python3 profile_single_query_s3.py --data-location s3 --profile-type simple-wallclock
Query Time: 2.479845870984718
(venv) ssm-user@ip-172-31-13-49:~/single-cell-data-portal$ python3 profile_single_query_s3.py --data-location s3 --profile-type simple-wallclock
Query Time: 2.3942137280246243
  3. If you wait a few seconds and run the query against S3, the query time increases back to around 7-9 seconds. This seems to indicate that the cache is purged after a TTL
(venv) ssm-user@ip-172-31-13-49:~/single-cell-data-portal$ python3 profile_single_query_s3.py --data-location s3 --profile-type simple-wallclock
Query Time: 7.943302907049656
(venv) ssm-user@ip-172-31-13-49:~/single-cell-data-portal$ python3 profile_single_query_s3.py --data-location s3 --profile-type simple-wallclock
Query Time: 7.340425575966947
  4. Running against local disk, the first run is slow. It is hard to duplicate the conditions for this, but it might be an OS-level characteristic where the first access of a local disk block is slow. However, subsequent runs are very fast. More importantly, subsequent runs against local disk are blazing fast no matter how long you wait between runs, unlike the run times against S3. This seems to indicate that having the snapshot on local disk is a significant boost
(venv) ssm-user@ip-172-31-13-49:~/single-cell-data-portal$ python3 profile_single_query_s3.py --data-location local --profile-type simple-wallclock
Query Time: 4.589273056015372
(venv) ssm-user@ip-172-31-13-49:~/single-cell-data-portal$ python3 profile_single_query_s3.py --data-location local --profile-type simple-wallclock
Query Time: 0.6380285220220685
(venv) ssm-user@ip-172-31-13-49:~/single-cell-data-portal$ python3 profile_single_query_s3.py --data-location local --profile-type simple-wallclock
Query Time: 0.6346769869560376
  5. Here is a comparison of the query against S3 and local disk after waiting a few minutes. Notice that the S3 run times went up quite a bit, while the local disk run time remains fast.
(venv) ssm-user@ip-172-31-13-49:~/single-cell-data-portal$ python3 profile_single_query_s3.py --data-location s3 --profile-type simple-wallclock
Query Time: 5.885547672049142
(venv) ssm-user@ip-172-31-13-49:~/single-cell-data-portal$ python3 profile_single_query_s3.py --data-location local --profile-type simple-wallclock
Query Time: 0.6382372019579634

Running the single query on an EC2 m5.large instance (2 vCPU and 8GB of RAM)

This is the closest approximation to a resource allocation for our rdev environment. In rdev, each container gets 2 vCPU and 8 GB of RAM.

  1. The same run time patterns are exhibited on the smaller instance that approximates rdev. This is unsurprising. Moreover, the query time against S3 is quite a bit longer (~11-13 seconds) on the smaller m5.large EC2 instance than on the more powerful m5.xlarge EC2 instance (~7-9 seconds). But local disk query times remain fast across instances and don't vary much. Therefore, querying from S3 on less powerful machines makes the runtimes significantly slower. This has cost implications: by moving the data to local disk we give ourselves the option to run on less powerful machines, because tiledb queries will remain fast.
(venv) ssm-user@ip-172-31-24-15:~/single-cell-data-portal$ python3 profile_single_query_s3.py --data-location s3 --profile-type simple-wallclock
Query Time: 13.380075528000361
(venv) ssm-user@ip-172-31-24-15:~/single-cell-data-portal$ python3 profile_single_query_s3.py --data-location local --profile-type simple-wallclock
Query Time: 0.7477878340000643
atarashansky commented 9 months ago

@prathapsridharan Is the first query to the local snapshot always slow? Subsequent queries are fast if you use the same query as the first one, but what if you change the query each time? Is it slow each time? If that's the case, then I think a more relevant performance comparison would be looking at the first invocation of a unique query.

prathapsridharan commented 9 months ago

@atarashansky - The slowness on the first query of the local disk snapshot is likely due to the OS not yet having had the chance to warm up the filesystem cache. To illustrate that point, I ran 2 tests for the local disk scenario. I even included a sleep of 30 seconds between runs to check whether that might surface any deviant query times - it doesn't (a note on forcing a cold cache follows the test output below):

  1. 20 runs of querying the local disk snapshot, randomizing the 50 genes queried in each trial. Notice that the first-query slowness effect is not there, and the query times differ between runs (because the queries differ) but without any wild variation:
$ python3 profile_expr_summary.py --data-location local --profile-type simple-wallclock --profile-scope raw-query --random True --num-trials 20 --wait 30

#####Simple Wallclock Profiling RAW QUERY#####
num_trials: 20, num_genes_per_trial: 50, randomize_queries: True, wait_time: 30.0

trial 0: query_time: 0.3260967959649861
trial 1: query_time: 0.28244731796439737
trial 2: query_time: 0.2792491050204262
trial 3: query_time: 0.28724771400447935
trial 4: query_time: 0.2647852109512314
trial 5: query_time: 0.26117483305279166
trial 6: query_time: 0.3354923519073054
trial 7: query_time: 0.3096563830040395
trial 8: query_time: 0.2736107819946483
trial 9: query_time: 0.31849775300361216
trial 10: query_time: 0.17899379297159612
trial 11: query_time: 0.1869285280117765
trial 12: query_time: 0.3909081279998645
trial 13: query_time: 0.2754709960427135
trial 14: query_time: 0.24768701498396695
trial 15: query_time: 0.34317331202328205
trial 16: query_time: 0.3123967950232327
trial 17: query_time: 0.22704013204202056
trial 18: query_time: 0.23085670894943178
trial 19: query_time: 0.25487172696739435
  2. Running the same query repeatedly (i.e. no randomization) gives almost equal query times between runs when querying local disk. Here too there isn't any first-query slowness effect:
$ python3 profile_expr_summary.py --data-location local --profile-type simple-wallclock --profile-scope raw-query --num-trials 10 --wait 30

#####Simple Wallclock Profiling RAW QUERY#####
num_trials: 10, num_genes_per_trial: 50, randomize_queries: False, wait_time: 30.0
trial 0: query_time: 0.6219111330574378
trial 1: query_time: 0.597706374945119
trial 2: query_time: 0.6027412810362875
trial 3: query_time: 0.5878630029037595
trial 4: query_time: 0.6020356669323519
trial 5: query_time: 0.6042859819717705
trial 6: query_time: 0.5935844500781968
trial 7: query_time: 0.6033124219393358
trial 8: query_time: 0.6097885479684919
trial 9: query_time: 0.6087296219775453
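
As an aside on the first-query slowness mentioned above: one way to reproduce a truly cold first read on Linux is to flush the page cache between runs. This is only noted as a way to control for cache warm-up; it requires root and I did not do it for the runs above.

import subprocess

# Flush dirty pages, then drop the page cache, dentries, and inodes (Linux only, needs root).
subprocess.run(["sync"], check=True)
subprocess.run(
    ["sudo", "tee", "/proc/sys/vm/drop_caches"],
    input=b"3\n",
    stdout=subprocess.DEVNULL,
    check=True,
)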
prathapsridharan commented 9 months ago

TileDB stats for querying the TileDB array in S3

This test runs a single hardcoded query against TileDB data resident in S3 and collects tiledb stats to investigate why there is significant variation in query time between multiple runs of the same query that are about 30-60 seconds apart.

Note that when the tiledb array is on local disk, the same query does not have wildly varying query times over multiple runs when waiting several seconds or minutes between runs

First run takes 7.2 seconds

$ time python3 profile_single_query_s3.py --data-location s3 --profile-type tiledb-stats

real    0m7.246s
user    0m2.200s
sys 0m1.900s
{'counters': {'Context.StorageManager.Query.Reader.attr_fixed_num': 3,
              'Context.StorageManager.Query.Reader.attr_num': 5,
              'Context.StorageManager.Query.Reader.attr_var_num': 2,
              'Context.StorageManager.Query.Reader.cell_num': 910000,
              'Context.StorageManager.Query.Reader.dim_num': 2,
              'Context.StorageManager.Query.Reader.dim_var_num': 2,
              'Context.StorageManager.Query.Reader.ignored_tiles': 0,
              'Context.StorageManager.Query.Reader.internal_loop_num': 1,
              'Context.StorageManager.Query.Reader.loop_num': 1,
              'Context.StorageManager.Query.Reader.num_tiles_read': 728,
              'Context.StorageManager.Query.Reader.read_unfiltered_byte_num': 73981040,
              'Context.StorageManager.Query.Reader.result_num': 419298,
              'Context.StorageManager.VFS.file_size_num': 1,
              'Context.StorageManager.VFS.read_byte_num': 705308023,
              'Context.StorageManager.VFS.read_ops_num': 126,
              'Context.StorageManager.read_array_meta_size': 0,
              'Context.StorageManager.read_array_schema_size': 960,
              'Context.StorageManager.read_frag_meta_size': 1584,
              'Context.StorageManager.read_rtree_size': 3244564,
              'Context.StorageManager.read_tile_offsets_size': 1388224,
              'Context.StorageManager.read_tile_var_offsets_size': 867640,
              'Context.StorageManager.read_tile_var_sizes_size': 867640,
              'Context.StorageManager.read_unfiltered_byte_num': 6369028},
 'python': '{"py.buffer_conversion_time": 0.0004360675811767578, '
           '"py.core_read_query_initial_submit_time": 6.460057297, '
           '"py.core_read_query_total_time": 6.4605091, "py.getitem_time": '
           '0.0017042160034179688, "py.getitem_time.add_ranges": '
           '0.00015687942504882812, "py.pandas_index_update_time": '
           '5.269050598144531e-05, "py.query_retries_count": 0.0}',
 'timers': {'Context.StorageManager.Query.Reader.apply_query_condition.avg': 2.84e-07,
            'Context.StorageManager.Query.Reader.apply_query_condition.sum': 2.84e-07,
            'Context.StorageManager.Query.Reader.compute_results_count_sparse.avg': 7.83214e-05,
            'Context.StorageManager.Query.Reader.compute_results_count_sparse.sum': 0.00689228,
            'Context.StorageManager.Query.Reader.compute_tile_bitmaps.avg': 0.00227109,
            'Context.StorageManager.Query.Reader.compute_tile_bitmaps.sum': 0.00227109,
            'Context.StorageManager.Query.Reader.copy_fixed_data_tiles.avg': 0.00113583,
            'Context.StorageManager.Query.Reader.copy_fixed_data_tiles.sum': 0.00340748,
            'Context.StorageManager.Query.Reader.copy_offsets_tiles.avg': 0.00233254,
            'Context.StorageManager.Query.Reader.copy_offsets_tiles.sum': 0.00933016,
            'Context.StorageManager.Query.Reader.copy_var_tiles.avg': 0.0013499,
            'Context.StorageManager.Query.Reader.copy_var_tiles.sum': 0.00539961,
            'Context.StorageManager.Query.Reader.create_result_tiles.avg': 0.000163552,
            'Context.StorageManager.Query.Reader.create_result_tiles.sum': 0.000163552,
            'Context.StorageManager.Query.Reader.dowork.avg': 6.45981,
            'Context.StorageManager.Query.Reader.dowork.sum': 6.45981,
            'Context.StorageManager.Query.Reader.load_initial_data.avg': 0.0792599,
            'Context.StorageManager.Query.Reader.load_initial_data.sum': 0.0792599,
            'Context.StorageManager.Query.Reader.load_tile_offsets.avg': 0.0555098,
            'Context.StorageManager.Query.Reader.load_tile_offsets.sum': 0.166529,
            'Context.StorageManager.Query.Reader.load_tile_var_sizes.avg': 0.0515541,
            'Context.StorageManager.Query.Reader.load_tile_var_sizes.sum': 0.0515541,
            'Context.StorageManager.Query.Reader.process_tiles.avg': 5.89972,
            'Context.StorageManager.Query.Reader.process_tiles.sum': 5.89972,
            'Context.StorageManager.Query.Reader.read_and_unfilter_attributes.avg': 5.87673,
            'Context.StorageManager.Query.Reader.read_and_unfilter_attributes.sum': 5.87673,
            'Context.StorageManager.Query.Reader.read_and_unfilter_coords.avg': 0.259972,
            'Context.StorageManager.Query.Reader.read_and_unfilter_coords.sum': 0.259972,
            'Context.StorageManager.Query.Reader.read_attribute_tiles.avg': 2.90949,
            'Context.StorageManager.Query.Reader.read_attribute_tiles.sum': 5.81897,
            'Context.StorageManager.Query.Reader.read_coordinate_tiles.avg': 0.120042,
            'Context.StorageManager.Query.Reader.read_coordinate_tiles.sum': 0.240084,
            'Context.StorageManager.Query.Reader.read_tiles.avg': 1.51476,
            'Context.StorageManager.Query.Reader.read_tiles.sum': 6.05904,
            'Context.StorageManager.Query.Reader.resize_fixed_results_to_copy.avg': 3.407e-06,
            'Context.StorageManager.Query.Reader.resize_fixed_results_to_copy.sum': 3.407e-06,
            'Context.StorageManager.Query.Reader.switch_sizes_to_offsets.avg': 0.000255104,
            'Context.StorageManager.Query.Reader.switch_sizes_to_offsets.sum': 0.00102042,
            'Context.StorageManager.Query.Reader.tile_offset_sizes.avg': 7.313e-06,
            'Context.StorageManager.Query.Reader.tile_offset_sizes.sum': 7.313e-06,
            'Context.StorageManager.Query.Reader.unfilter_attr_tiles.avg': 0.00651393,
            'Context.StorageManager.Query.Reader.unfilter_attr_tiles.sum': 0.0325696,
            'Context.StorageManager.Query.Reader.unfilter_coord_tiles.avg': 0.00391671,
            'Context.StorageManager.Query.Reader.unfilter_coord_tiles.sum': 0.0195835,
            'Context.StorageManager.VFS.ArrayDirectory.list_array_meta_uris.avg': 0.0703353,
            'Context.StorageManager.VFS.ArrayDirectory.list_array_meta_uris.sum': 0.0703353,
            'Context.StorageManager.VFS.ArrayDirectory.list_array_schema_uris.avg': 0.077721,
            'Context.StorageManager.VFS.ArrayDirectory.list_array_schema_uris.sum': 0.077721,
            'Context.StorageManager.VFS.ArrayDirectory.list_commit_uris.avg': 0.0709293,
            'Context.StorageManager.VFS.ArrayDirectory.list_commit_uris.sum': 0.0709293,
            'Context.StorageManager.VFS.ArrayDirectory.list_fragment_meta_uris.avg': 0.0765407,
            'Context.StorageManager.VFS.ArrayDirectory.list_fragment_meta_uris.sum': 0.0765407,
            'Context.StorageManager.VFS.ArrayDirectory.list_root_uris.avg': 0.0717963,
            'Context.StorageManager.VFS.ArrayDirectory.list_root_uris.sum': 0.0717963,
            'Context.StorageManager.VFS.ArrayDirectory.load_consolidated_commit_uris.avg': 1.495e-06,
            'Context.StorageManager.VFS.ArrayDirectory.load_consolidated_commit_uris.sum': 1.495e-06,
            'Context.StorageManager.array_open_READ.avg': 0.243471,
            'Context.StorageManager.array_open_READ.sum': 0.243471,
            'Context.StorageManager.array_open_read_load_directory.avg': 0.0777874,
            'Context.StorageManager.array_open_read_load_directory.sum': 0.0777874,
            'Context.StorageManager.array_open_read_load_schemas_and_fragment_meta.avg': 0.165674,
            'Context.StorageManager.array_open_read_load_schemas_and_fragment_meta.sum': 0.165674,
            'Context.StorageManager.sm_load_all_array_schemas.avg': 0.0199367,
            'Context.StorageManager.sm_load_all_array_schemas.sum': 0.0199367,
            'Context.StorageManager.sm_load_array_metadata.avg': 7.046e-06,
            'Context.StorageManager.sm_load_array_metadata.sum': 7.046e-06,
            'Context.StorageManager.sm_load_array_schema_from_uri.avg': 0.0199263,
            'Context.StorageManager.sm_load_array_schema_from_uri.sum': 0.0199263,
            'Context.StorageManager.sm_load_array_schemas_and_fragment_metadata.avg': 0.165671,
            'Context.StorageManager.sm_load_array_schemas_and_fragment_metadata.sum': 0.165671,
            'Context.StorageManager.sm_load_filtered_fragment_uris.avg': 3.2699e-05,
            'Context.StorageManager.sm_load_filtered_fragment_uris.sum': 3.2699e-05,
            'Context.StorageManager.sm_load_fragment_metadata.avg': 0.14569,
            'Context.StorageManager.sm_load_fragment_metadata.sum': 0.14569}}

Second run takes 3.9 seconds

$ time python3 profile_single_query_s3.py --data-location s3 --profile-type tiledb-stats

real    0m3.958s
user    0m2.076s
sys 0m1.642s
{'counters': {'Context.StorageManager.Query.Reader.attr_fixed_num': 3,
              'Context.StorageManager.Query.Reader.attr_num': 5,
              'Context.StorageManager.Query.Reader.attr_var_num': 2,
              'Context.StorageManager.Query.Reader.cell_num': 910000,
              'Context.StorageManager.Query.Reader.dim_num': 2,
              'Context.StorageManager.Query.Reader.dim_var_num': 2,
              'Context.StorageManager.Query.Reader.ignored_tiles': 0,
              'Context.StorageManager.Query.Reader.internal_loop_num': 1,
              'Context.StorageManager.Query.Reader.loop_num': 1,
              'Context.StorageManager.Query.Reader.num_tiles_read': 728,
              'Context.StorageManager.Query.Reader.read_unfiltered_byte_num': 73981040,
              'Context.StorageManager.Query.Reader.result_num': 419298,
              'Context.StorageManager.VFS.file_size_num': 1,
              'Context.StorageManager.VFS.read_byte_num': 705308023,
              'Context.StorageManager.VFS.read_ops_num': 126,
              'Context.StorageManager.read_array_meta_size': 0,
              'Context.StorageManager.read_array_schema_size': 960,
              'Context.StorageManager.read_frag_meta_size': 1584,
              'Context.StorageManager.read_rtree_size': 3244564,
              'Context.StorageManager.read_tile_offsets_size': 1388224,
              'Context.StorageManager.read_tile_var_offsets_size': 867640,
              'Context.StorageManager.read_tile_var_sizes_size': 867640,
              'Context.StorageManager.read_unfiltered_byte_num': 6369028},
 'python': '{"py.buffer_conversion_time": 0.0004315376281738281, '
           '"py.core_read_query_initial_submit_time": 3.244939281, '
           '"py.core_read_query_total_time": 3.245374541, "py.getitem_time": '
           '0.001729726791381836, "py.getitem_time.add_ranges": '
           '0.00015234947204589844, "py.pandas_index_update_time": '
           '5.888938903808594e-05, "py.query_retries_count": 0.0}',
 'timers': {'Context.StorageManager.Query.Reader.apply_query_condition.avg': 1.73e-07,
            'Context.StorageManager.Query.Reader.apply_query_condition.sum': 1.73e-07,
            'Context.StorageManager.Query.Reader.compute_results_count_sparse.avg': 7.93653e-05,
            'Context.StorageManager.Query.Reader.compute_results_count_sparse.sum': 0.00698415,
            'Context.StorageManager.Query.Reader.compute_tile_bitmaps.avg': 0.00231364,
            'Context.StorageManager.Query.Reader.compute_tile_bitmaps.sum': 0.00231364,
            'Context.StorageManager.Query.Reader.copy_fixed_data_tiles.avg': 0.00116303,
            'Context.StorageManager.Query.Reader.copy_fixed_data_tiles.sum': 0.0034891,
            'Context.StorageManager.Query.Reader.copy_offsets_tiles.avg': 0.00237875,
            'Context.StorageManager.Query.Reader.copy_offsets_tiles.sum': 0.00951498,
            'Context.StorageManager.Query.Reader.copy_var_tiles.avg': 0.0013519,
            'Context.StorageManager.Query.Reader.copy_var_tiles.sum': 0.00540759,
            'Context.StorageManager.Query.Reader.create_result_tiles.avg': 0.00017221,
            'Context.StorageManager.Query.Reader.create_result_tiles.sum': 0.00017221,
            'Context.StorageManager.Query.Reader.dowork.avg': 3.24464,
            'Context.StorageManager.Query.Reader.dowork.sum': 3.24464,
            'Context.StorageManager.Query.Reader.load_initial_data.avg': 0.0445728,
            'Context.StorageManager.Query.Reader.load_initial_data.sum': 0.0445728,
            'Context.StorageManager.Query.Reader.load_tile_offsets.avg': 0.0597256,
            'Context.StorageManager.Query.Reader.load_tile_offsets.sum': 0.179177,
            'Context.StorageManager.Query.Reader.load_tile_var_sizes.avg': 0.0493179,
            'Context.StorageManager.Query.Reader.load_tile_var_sizes.sum': 0.0493179,
            'Context.StorageManager.Query.Reader.process_tiles.avg': 2.75611,
            'Context.StorageManager.Query.Reader.process_tiles.sum': 2.75611,
            'Context.StorageManager.Query.Reader.read_and_unfilter_attributes.avg': 2.73499,
            'Context.StorageManager.Query.Reader.read_and_unfilter_attributes.sum': 2.73499,
            'Context.StorageManager.Query.Reader.read_and_unfilter_coords.avg': 0.212649,
            'Context.StorageManager.Query.Reader.read_and_unfilter_coords.sum': 0.212649,
            'Context.StorageManager.Query.Reader.read_attribute_tiles.avg': 1.33839,
            'Context.StorageManager.Query.Reader.read_attribute_tiles.sum': 2.67678,
            'Context.StorageManager.Query.Reader.read_coordinate_tiles.avg': 0.096224,
            'Context.StorageManager.Query.Reader.read_coordinate_tiles.sum': 0.192448,
            'Context.StorageManager.Query.Reader.read_tiles.avg': 0.717303,
            'Context.StorageManager.Query.Reader.read_tiles.sum': 2.86921,
            'Context.StorageManager.Query.Reader.resize_fixed_results_to_copy.avg': 4.009e-06,
            'Context.StorageManager.Query.Reader.resize_fixed_results_to_copy.sum': 4.009e-06,
            'Context.StorageManager.Query.Reader.switch_sizes_to_offsets.avg': 0.000278483,
            'Context.StorageManager.Query.Reader.switch_sizes_to_offsets.sum': 0.00111393,
            'Context.StorageManager.Query.Reader.tile_offset_sizes.avg': 6.549e-06,
            'Context.StorageManager.Query.Reader.tile_offset_sizes.sum': 6.549e-06,
            'Context.StorageManager.Query.Reader.unfilter_attr_tiles.avg': 0.0065119,
            'Context.StorageManager.Query.Reader.unfilter_attr_tiles.sum': 0.0325595,
            'Context.StorageManager.Query.Reader.unfilter_coord_tiles.avg': 0.00397223,
            'Context.StorageManager.Query.Reader.unfilter_coord_tiles.sum': 0.0198612,
            'Context.StorageManager.VFS.ArrayDirectory.list_array_meta_uris.avg': 0.0840461,
            'Context.StorageManager.VFS.ArrayDirectory.list_array_meta_uris.sum': 0.0840461,
            'Context.StorageManager.VFS.ArrayDirectory.list_array_schema_uris.avg': 0.0862886,
            'Context.StorageManager.VFS.ArrayDirectory.list_array_schema_uris.sum': 0.0862886,
            'Context.StorageManager.VFS.ArrayDirectory.list_commit_uris.avg': 0.0854544,
            'Context.StorageManager.VFS.ArrayDirectory.list_commit_uris.sum': 0.0854544,
            'Context.StorageManager.VFS.ArrayDirectory.list_fragment_meta_uris.avg': 0.0844494,
            'Context.StorageManager.VFS.ArrayDirectory.list_fragment_meta_uris.sum': 0.0844494,
            'Context.StorageManager.VFS.ArrayDirectory.list_root_uris.avg': 0.0843779,
            'Context.StorageManager.VFS.ArrayDirectory.list_root_uris.sum': 0.0843779,
            'Context.StorageManager.VFS.ArrayDirectory.load_consolidated_commit_uris.avg': 1.519e-06,
            'Context.StorageManager.VFS.ArrayDirectory.load_consolidated_commit_uris.sum': 1.519e-06,
            'Context.StorageManager.array_open_READ.avg': 0.150774,
            'Context.StorageManager.array_open_READ.sum': 0.150774,
            'Context.StorageManager.array_open_read_load_directory.avg': 0.0864629,
            'Context.StorageManager.array_open_read_load_directory.sum': 0.0864629,
            'Context.StorageManager.array_open_read_load_schemas_and_fragment_meta.avg': 0.0643007,
            'Context.StorageManager.array_open_read_load_schemas_and_fragment_meta.sum': 0.0643007,
            'Context.StorageManager.sm_load_all_array_schemas.avg': 0.0209265,
            'Context.StorageManager.sm_load_all_array_schemas.sum': 0.0209265,
            'Context.StorageManager.sm_load_array_metadata.avg': 7.497e-06,
            'Context.StorageManager.sm_load_array_metadata.sum': 7.497e-06,
            'Context.StorageManager.sm_load_array_schema_from_uri.avg': 0.0209128,
            'Context.StorageManager.sm_load_array_schema_from_uri.sum': 0.0209128,
            'Context.StorageManager.sm_load_array_schemas_and_fragment_metadata.avg': 0.0642973,
            'Context.StorageManager.sm_load_array_schemas_and_fragment_metadata.sum': 0.0642973,
            'Context.StorageManager.sm_load_filtered_fragment_uris.avg': 4.1077e-05,
            'Context.StorageManager.sm_load_filtered_fragment_uris.sum': 4.1077e-05,
            'Context.StorageManager.sm_load_fragment_metadata.avg': 0.0433201,
            'Context.StorageManager.sm_load_fragment_metadata.sum': 0.0433201}}

Third run takes 4.8 seconds

$ time python3 profile_single_query_s3.py --data-location s3 --profile-type tiledb-stats

real    0m4.850s
user    0m2.000s
sys 0m1.420s
{'counters': {'Context.StorageManager.Query.Reader.attr_fixed_num': 3,
              'Context.StorageManager.Query.Reader.attr_num': 5,
              'Context.StorageManager.Query.Reader.attr_var_num': 2,
              'Context.StorageManager.Query.Reader.cell_num': 910000,
              'Context.StorageManager.Query.Reader.dim_num': 2,
              'Context.StorageManager.Query.Reader.dim_var_num': 2,
              'Context.StorageManager.Query.Reader.ignored_tiles': 0,
              'Context.StorageManager.Query.Reader.internal_loop_num': 1,
              'Context.StorageManager.Query.Reader.loop_num': 1,
              'Context.StorageManager.Query.Reader.num_tiles_read': 728,
              'Context.StorageManager.Query.Reader.read_unfiltered_byte_num': 73981040,
              'Context.StorageManager.Query.Reader.result_num': 419298,
              'Context.StorageManager.VFS.file_size_num': 1,
              'Context.StorageManager.VFS.read_byte_num': 705308023,
              'Context.StorageManager.VFS.read_ops_num': 126,
              'Context.StorageManager.read_array_meta_size': 0,
              'Context.StorageManager.read_array_schema_size': 960,
              'Context.StorageManager.read_frag_meta_size': 1584,
              'Context.StorageManager.read_rtree_size': 3244564,
              'Context.StorageManager.read_tile_offsets_size': 1388224,
              'Context.StorageManager.read_tile_var_offsets_size': 867640,
              'Context.StorageManager.read_tile_var_sizes_size': 867640,
              'Context.StorageManager.read_unfiltered_byte_num': 6369028},
 'python': '{"py.buffer_conversion_time": 0.00042366981506347656, '
           '"py.core_read_query_initial_submit_time": 4.141099672, '
           '"py.core_read_query_total_time": 4.141520283, "py.getitem_time": '
           '0.0017459392547607422, "py.getitem_time.add_ranges": '
           '0.00015592575073242188, "py.pandas_index_update_time": '
           '5.602836608886719e-05, "py.query_retries_count": 0.0}',
 'timers': {'Context.StorageManager.Query.Reader.apply_query_condition.avg': 2.18e-07,
            'Context.StorageManager.Query.Reader.apply_query_condition.sum': 2.18e-07,
            'Context.StorageManager.Query.Reader.compute_results_count_sparse.avg': 8.2695e-05,
            'Context.StorageManager.Query.Reader.compute_results_count_sparse.sum': 0.00727716,
            'Context.StorageManager.Query.Reader.compute_tile_bitmaps.avg': 0.00271688,
            'Context.StorageManager.Query.Reader.compute_tile_bitmaps.sum': 0.00271688,
            'Context.StorageManager.Query.Reader.copy_fixed_data_tiles.avg': 0.00113459,
            'Context.StorageManager.Query.Reader.copy_fixed_data_tiles.sum': 0.00340378,
            'Context.StorageManager.Query.Reader.copy_offsets_tiles.avg': 0.00235719,
            'Context.StorageManager.Query.Reader.copy_offsets_tiles.sum': 0.00942877,
            'Context.StorageManager.Query.Reader.copy_var_tiles.avg': 0.00130936,
            'Context.StorageManager.Query.Reader.copy_var_tiles.sum': 0.00523746,
            'Context.StorageManager.Query.Reader.create_result_tiles.avg': 0.000160009,
            'Context.StorageManager.Query.Reader.create_result_tiles.sum': 0.000160009,
            'Context.StorageManager.Query.Reader.dowork.avg': 4.14086,
            'Context.StorageManager.Query.Reader.dowork.sum': 4.14086,
            'Context.StorageManager.Query.Reader.load_initial_data.avg': 0.0516287,
            'Context.StorageManager.Query.Reader.load_initial_data.sum': 0.0516287,
            'Context.StorageManager.Query.Reader.load_tile_offsets.avg': 0.0648514,
            'Context.StorageManager.Query.Reader.load_tile_offsets.sum': 0.194554,
            'Context.StorageManager.Query.Reader.load_tile_var_sizes.avg': 0.0574084,
            'Context.StorageManager.Query.Reader.load_tile_var_sizes.sum': 0.0574084,
            'Context.StorageManager.Query.Reader.process_tiles.avg': 3.56619,
            'Context.StorageManager.Query.Reader.process_tiles.sum': 3.56619,
            'Context.StorageManager.Query.Reader.read_and_unfilter_attributes.avg': 3.54313,
            'Context.StorageManager.Query.Reader.read_and_unfilter_attributes.sum': 3.54313,
            'Context.StorageManager.Query.Reader.read_and_unfilter_coords.avg': 0.267875,
            'Context.StorageManager.Query.Reader.read_and_unfilter_coords.sum': 0.267875,
            'Context.StorageManager.Query.Reader.read_attribute_tiles.avg': 1.74258,
            'Context.StorageManager.Query.Reader.read_attribute_tiles.sum': 3.48515,
            'Context.StorageManager.Query.Reader.read_coordinate_tiles.avg': 0.123706,
            'Context.StorageManager.Query.Reader.read_coordinate_tiles.sum': 0.247412,
            'Context.StorageManager.Query.Reader.read_tiles.avg': 0.933138,
            'Context.StorageManager.Query.Reader.read_tiles.sum': 3.73255,
            'Context.StorageManager.Query.Reader.resize_fixed_results_to_copy.avg': 4.769e-06,
            'Context.StorageManager.Query.Reader.resize_fixed_results_to_copy.sum': 4.769e-06,
            'Context.StorageManager.Query.Reader.switch_sizes_to_offsets.avg': 0.000255998,
            'Context.StorageManager.Query.Reader.switch_sizes_to_offsets.sum': 0.00102399,
            'Context.StorageManager.Query.Reader.tile_offset_sizes.avg': 6.61e-06,
            'Context.StorageManager.Query.Reader.tile_offset_sizes.sum': 6.61e-06,
            'Context.StorageManager.Query.Reader.unfilter_attr_tiles.avg': 0.00649242,
            'Context.StorageManager.Query.Reader.unfilter_attr_tiles.sum': 0.0324621,
            'Context.StorageManager.Query.Reader.unfilter_coord_tiles.avg': 0.00402229,
            'Context.StorageManager.Query.Reader.unfilter_coord_tiles.sum': 0.0201115,
            'Context.StorageManager.VFS.ArrayDirectory.list_array_meta_uris.avg': 0.0708566,
            'Context.StorageManager.VFS.ArrayDirectory.list_array_meta_uris.sum': 0.0708566,
            'Context.StorageManager.VFS.ArrayDirectory.list_array_schema_uris.avg': 0.0766091,
            'Context.StorageManager.VFS.ArrayDirectory.list_array_schema_uris.sum': 0.0766091,
            'Context.StorageManager.VFS.ArrayDirectory.list_commit_uris.avg': 0.0768642,
            'Context.StorageManager.VFS.ArrayDirectory.list_commit_uris.sum': 0.0768642,
            'Context.StorageManager.VFS.ArrayDirectory.list_fragment_meta_uris.avg': 0.0742092,
            'Context.StorageManager.VFS.ArrayDirectory.list_fragment_meta_uris.sum': 0.0742092,
            'Context.StorageManager.VFS.ArrayDirectory.list_root_uris.avg': 0.0906188,
            'Context.StorageManager.VFS.ArrayDirectory.list_root_uris.sum': 0.0906188,
            'Context.StorageManager.VFS.ArrayDirectory.load_consolidated_commit_uris.avg': 2.198e-06,
            'Context.StorageManager.VFS.ArrayDirectory.load_consolidated_commit_uris.sum': 2.198e-06,
            'Context.StorageManager.array_open_READ.avg': 0.161641,
            'Context.StorageManager.array_open_READ.sum': 0.161641,
            'Context.StorageManager.array_open_read_load_directory.avg': 0.0908443,
            'Context.StorageManager.array_open_read_load_directory.sum': 0.0908443,
            'Context.StorageManager.array_open_read_load_schemas_and_fragment_meta.avg': 0.0707864,
            'Context.StorageManager.array_open_read_load_schemas_and_fragment_meta.sum': 0.0707864,
            'Context.StorageManager.sm_load_all_array_schemas.avg': 0.0227951,
            'Context.StorageManager.sm_load_all_array_schemas.sum': 0.0227951,
            'Context.StorageManager.sm_load_array_metadata.avg': 7.044e-06,
            'Context.StorageManager.sm_load_array_metadata.sum': 7.044e-06,
            'Context.StorageManager.sm_load_array_schema_from_uri.avg': 0.0227812,
            'Context.StorageManager.sm_load_array_schema_from_uri.sum': 0.0227812,
            'Context.StorageManager.sm_load_array_schemas_and_fragment_metadata.avg': 0.0707832,
            'Context.StorageManager.sm_load_array_schemas_and_fragment_metadata.sum': 0.0707832,
            'Context.StorageManager.sm_load_filtered_fragment_uris.avg': 4.8789e-05,
            'Context.StorageManager.sm_load_filtered_fragment_uris.sum': 4.8789e-05,
            'Context.StorageManager.sm_load_fragment_metadata.avg': 0.0479296,
            'Context.StorageManager.sm_load_fragment_metadata.sum': 0.0479296}}
prathapsridharan commented 9 months ago

@atarashansky @dsadgat @atolopko-czi

Based on the investigation above, we are reasonably confident that moving the snapshot to the web application's local disk during deployment/initialization will significantly reduce TileDB query times and give us sub-second latencies for our most expensive queries.

The reason is that S3 is not a dedicated resource the way the local disk is: S3 retrieval times depend on factors such as caching policy, load on the S3 service, and network conditions. The TileDB stats above indicate that the vast majority of the query time is spent on I/O (see the Context.StorageManager.Query.Reader.read_* timers), and this includes network I/O; for example, each profiled run reads on the order of 705 MB through the VFS layer (Context.StorageManager.VFS.read_byte_num) to produce roughly 419K result cells. We therefore have good reason to suspect that network I/O against S3 is the bottleneck, and that the bottleneck is significant: local disk reads are 5-10X faster than reads from S3.
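
As a concrete illustration, the stats above can be gathered with a small script along the lines of the sketch below. This is a minimal sketch only: the array URI, dimension layout, and gene IDs are placeholders, not the actual WMG cube schema.

import tiledb

ARRAY_URI = "s3://<wmg-bucket>/<snapshot>/expression_summary"  # placeholder URI

tiledb.stats_enable()
tiledb.stats_reset()

with tiledb.open(ARRAY_URI, mode="r") as arr:
    # Slice a handful of genes across all tissues; the real profiling script
    # constrains additional dimensions as well.
    df = arr.df[["ENSG00000000003", "ENSG00000000005"], :]

# Dumps the same counters/timers shown above, including the
# Context.StorageManager.Query.Reader.read_* timers and the
# Context.StorageManager.VFS.read_byte_num counter.
tiledb.stats_dump()
tiledb.stats_disable()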

It has also been suggested that we take a look at the high-performance S3 Express One Zone storage class (thanks @bkmartinjr), which purports to deliver "consistent single-digit millisecond data access for your most frequently accessed data and latency-sensitive applications. S3 Express One Zone can improve data access speeds by 10x and reduce request costs by 50% compared to S3 Standard and scales to process millions of requests per minute."

There is another option that seems even more promising: Mountpoint for Amazon S3, suggested by @atolopko-czi. Per the documentation: "Mountpoint for Amazon S3 can now cache data in Amazon EC2 instance storage, instance memory, or an Amazon EBS volume for repeated data access. This avoids redundant requests to Amazon S3 when you need to read the same data multiple times, which improves both the cost and performance of your application. For example, by using Mountpoint for Amazon S3 and caching data in Amazon EC2 instance storage, you can complete your machine learning training jobs up to 2x faster by avoiding redundant requests to Amazon S3."

If Mountpoint for S3 indeed brings query latencies to acceptable levels, it would likely be faster to implement and less error prone than copying the snapshot to local disk.
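
To make the idea concrete, the sketch below shows the shape of the change (the paths are assumptions, not our actual deployment layout): the bucket is mounted once at startup, e.g. with the mount-s3 ... --cache ... command used in the profiling runs further down, and the application then opens the cube through a local filesystem path instead of an s3:// URI.

import tiledb

S3_URI = "s3://cellxgene-wmg-staging/<snapshot>/expression_summary"   # direct S3 access
MOUNTED_PATH = "/mnt/wmg-cube/<snapshot>/expression_summary"          # via Mountpoint + cache

def open_expression_summary(use_mountpoint: bool = True) -> tiledb.Array:
    # Open the same cube either through the Mountpoint mount or directly from S3.
    uri = MOUNTED_PATH if use_mountpoint else S3_URI
    return tiledb.open(uri, mode="r")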

There is also a pending investigation into using Postgres as the data store for the WMG cube. Initial investigations by @atolopko-czi look promising for all of the GE query patterns; in particular, they show promise in reducing the rollup operation from 8-10 seconds to <1 second (a roughly 10X improvement). But we are a ways away from committing to it.

Next Action Items

  1. Investigate Mountpoint for Amazon S3

Decide whether Mountpoint obviates the need to copy the TileDB data files to local disk. The S3 Express One Zone storage class is not included in the list of candidate solutions to investigate because Mountpoint appears to be a logically better fit.

prathapsridharan commented 9 months ago

Analysis notes comparing TileDB query performance against TileDB data files in mount-s3, S3, and local disk

Single hardcoded query runtime comparison for LOCAL DISK, MOUNT-S3, and S3

NOTE: query is for 50 genes over ALL tissues

# S3
$ python profile_single_query_s3.py --data-location s3 --profile-type simple-wallclock
Query Time: 6.748166222999998

# MOUNT-S3
$ python profile_single_query_s3.py --data-location mount-s3 --profile-type simple-wallclock
Query Time: 1.5921803740000087

# LOCAL DISK
$ python profile_single_query_s3.py --data-location local --profile-type simple-wallclock
Query Time: 0.6546747279999181
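
For context, the "simple-wallclock" numbers here and below are plain end-to-end timings around a single cube read. The snippet below is a minimal sketch of what that measurement presumably looks like, with placeholder URIs and a placeholder gene list, since the internals of profile_single_query_s3.py are not shown here.

import time
import tiledb

URIS = {
    "s3": "s3://cellxgene-wmg-staging/<snapshot>/expression_summary",
    "mount-s3": "wmg-cube-mount-s3/<snapshot>/expression_summary",
    "local": "/data/wmg-snapshot/expression_summary",
}

def time_single_query(data_location: str, genes: list) -> float:
    # Wall-clock time for one read of the given genes across all tissues.
    start = time.perf_counter()
    with tiledb.open(URIS[data_location], mode="r") as arr:
        arr.df[genes, :]
    return time.perf_counter() - start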

100 runs of randomly generated queries against MOUNT-S3 tiledb data files

NOTE: query is for 50 genes over ALL tissues

Caching is enabled when mount-s3 is configured:

$ mount-s3 cellxgene-wmg-staging wmg-cube-mount-s3 --cache cache-wmg-cube-mount-s3 --metadata-ttl 300
$ python profile_expr_summary.py --data-location mount-s3 --profile-type simple-wallclock --profile-scope raw-query --num-trials 100 --random True

#####Simple Wallclock Profiling RAW QUERY#####
num_trials: 100, num_genes_per_trial: 50, randomize_queries: True, wait_time: 0.1

total_query_time: 84.93799073400032, avg_query_time: 0.8493799073400032

trial 0: query_time: 4.458109610000065
trial 1: query_time: 2.5819802759999675
trial 2: query_time: 1.654652904000045
trial 3: query_time: 1.3077535179999131
trial 4: query_time: 1.9769351569999571
trial 5: query_time: 1.1004632560000118
trial 6: query_time: 1.3345652950000613
trial 7: query_time: 1.7758604320000586
trial 8: query_time: 0.7463126810000631
trial 9: query_time: 1.1715542599999935
trial 10: query_time: 1.2971961010000541
trial 11: query_time: 1.499866001999976
trial 12: query_time: 0.7727857449999647
trial 13: query_time: 0.6348842930000274
trial 14: query_time: 1.1153419169999097
trial 15: query_time: 0.9048437559999911
trial 16: query_time: 0.7109619850000399
trial 17: query_time: 0.894831138000086
trial 18: query_time: 1.188690336000036
trial 19: query_time: 0.8177241539999613
trial 20: query_time: 0.6851584220000859
trial 21: query_time: 0.5988486710000416
trial 22: query_time: 0.5904493919999823
trial 23: query_time: 0.5490066790000583
trial 24: query_time: 1.0126316700000189
trial 25: query_time: 0.8298233699999855
trial 26: query_time: 0.5345813519999183
trial 27: query_time: 0.662106289999997
trial 28: query_time: 0.5923850259999881
trial 29: query_time: 0.6814443660000506
trial 30: query_time: 0.8294042369999488
trial 31: query_time: 0.5753452720000496
trial 32: query_time: 0.7977175329999682
trial 33: query_time: 0.6071753160000526
trial 34: query_time: 0.6226004379999495
trial 35: query_time: 0.6575300069999912
trial 36: query_time: 0.7002456540000139
trial 37: query_time: 1.285965970999996
trial 38: query_time: 1.1663920839999946
trial 39: query_time: 0.7513186669999641
trial 40: query_time: 0.656368059999977
trial 41: query_time: 0.7102322559999266
trial 42: query_time: 0.5516998939999667
trial 43: query_time: 0.830023180000012
trial 44: query_time: 0.8122672350000357
trial 45: query_time: 0.630903529999955
trial 46: query_time: 1.4243341730000338
trial 47: query_time: 1.1057274869999674
trial 48: query_time: 0.5985588050000388
trial 49: query_time: 0.9679796819999638
trial 50: query_time: 1.025151289000064
trial 51: query_time: 0.7352055889999747
trial 52: query_time: 0.6162365740000268
trial 53: query_time: 0.5753514220000397
trial 54: query_time: 0.7383069479999449
trial 55: query_time: 0.8410589269999491
trial 56: query_time: 0.9383649459999788
trial 57: query_time: 0.897521561000076
trial 58: query_time: 1.0693073559999675
trial 59: query_time: 0.5501075309999806
trial 60: query_time: 0.5846752400000241
trial 61: query_time: 0.7247286850000592
trial 62: query_time: 0.7725954569999658
trial 63: query_time: 0.7867966049999495
trial 64: query_time: 0.5920192209999868
trial 65: query_time: 0.8036754490000249
trial 66: query_time: 0.5323291160000281
trial 67: query_time: 0.6382877339999595
trial 68: query_time: 0.5746756229999619
trial 69: query_time: 0.6052720100000215
trial 70: query_time: 0.5027580239999452
trial 71: query_time: 0.711808700000006
trial 72: query_time: 0.7767928340000481
trial 73: query_time: 0.6472635080000373
trial 74: query_time: 0.9157147470000382
trial 75: query_time: 0.7538526939999883
trial 76: query_time: 0.688559142000031
trial 77: query_time: 0.6730260580000049
trial 78: query_time: 0.8670512950000102
trial 79: query_time: 0.522248191000017
trial 80: query_time: 0.6768851660000337
trial 81: query_time: 0.5359373399999185
trial 82: query_time: 0.621782244999963
trial 83: query_time: 0.628841972000032
trial 84: query_time: 0.6184606140000142
trial 85: query_time: 0.8015787379999892
trial 86: query_time: 0.587429942999961
trial 87: query_time: 0.5087339709999696
trial 88: query_time: 0.5376343399999541
trial 89: query_time: 0.5914901419999978
trial 90: query_time: 0.6029010980000749
trial 91: query_time: 0.6341135149999673
trial 92: query_time: 0.6610181510000075
trial 93: query_time: 0.4858367790000102
trial 94: query_time: 0.5385020109999914
trial 95: query_time: 0.5957841180000969
trial 96: query_time: 0.52943291400004
trial 97: query_time: 0.8288854480000509
trial 98: query_time: 0.7310777010000038
trial 99: query_time: 0.5973845170000232

100 runs of randomly generated queries against LOCAL DISK tiledb data files

NOTE: query is for 50 genes over ALL tissues

$ python profile_expr_summary.py --data-location local --profile-type simple-wallclock --profile-scope raw-query --num-trials 100 --random True

#####Simple Wallclock Profiling RAW QUERY#####
num_trials: 100, num_genes_per_trial: 50, randomize_queries: True, wait_time: 0.1

total_query_time: 32.65206927299994, avg_query_time: 0.3265206927299994

trial 0: query_time: 1.4072613220000108
trial 1: query_time: 2.3759740790000023
trial 2: query_time: 0.5368550270000014
trial 3: query_time: 0.9578235669999913
trial 4: query_time: 0.28350593100000765
trial 5: query_time: 0.30751738200001455
trial 6: query_time: 0.31380537399999753
trial 7: query_time: 0.32840677999999457
trial 8: query_time: 0.29934287099999324
trial 9: query_time: 0.22024661400001833
trial 10: query_time: 0.31282072999999855
trial 11: query_time: 0.29767627600000424
trial 12: query_time: 0.2902254990000017
trial 13: query_time: 0.35103022499998815
trial 14: query_time: 0.3035185159999969
trial 15: query_time: 0.31603130199999896
trial 16: query_time: 0.24937240499997415
trial 17: query_time: 0.26602102399999694
trial 18: query_time: 0.3001532619999807
trial 19: query_time: 0.3459727100000123
trial 20: query_time: 0.21078897999998958
trial 21: query_time: 0.3082457399999896
trial 22: query_time: 0.28248107899997876
trial 23: query_time: 0.3546332549999818
trial 24: query_time: 0.279284964999988
trial 25: query_time: 0.17006857799998443
trial 26: query_time: 0.23051406100000804
trial 27: query_time: 0.32190842399998587
trial 28: query_time: 0.2723682710000048
trial 29: query_time: 0.283731854999985
trial 30: query_time: 0.3090172530000075
trial 31: query_time: 0.31138136100000224
trial 32: query_time: 0.2748879780000095
trial 33: query_time: 0.32576917999998045
trial 34: query_time: 0.18598574300000337
trial 35: query_time: 0.29140421600001787
trial 36: query_time: 0.23635345899998583
trial 37: query_time: 0.37989791000001105
trial 38: query_time: 0.28009902400000897
trial 39: query_time: 0.26255842000000484
trial 40: query_time: 0.27622637199999645
trial 41: query_time: 0.22819967400002383
trial 42: query_time: 0.31207555200001025
trial 43: query_time: 0.2586563610000212
trial 44: query_time: 0.3080421359999832
trial 45: query_time: 0.2868817690000185
trial 46: query_time: 0.2384845359999872
trial 47: query_time: 0.2630379590000018
trial 48: query_time: 0.2690308609999761
trial 49: query_time: 0.20165372400001047
trial 50: query_time: 0.28473853200000576
trial 51: query_time: 0.25942350999997643
trial 52: query_time: 0.3189702759999875
trial 53: query_time: 0.2464530340000124
trial 54: query_time: 0.2984945339999854
trial 55: query_time: 0.3433219669999801
trial 56: query_time: 0.30183743500001015
trial 57: query_time: 0.22690623099998675
trial 58: query_time: 0.34041376700000114
trial 59: query_time: 0.22151536000001215
trial 60: query_time: 0.2960845069999891
trial 61: query_time: 0.22988944299999048
trial 62: query_time: 0.25595664900001225
trial 63: query_time: 0.3659249149999937
trial 64: query_time: 0.2777073929999858
trial 65: query_time: 0.30865272500000174
trial 66: query_time: 0.37089284099999986
trial 67: query_time: 0.2557718740000041
trial 68: query_time: 0.27712515899997925
trial 69: query_time: 0.26933869500001606
trial 70: query_time: 0.21371085300000914
trial 71: query_time: 0.2876549640000121
trial 72: query_time: 0.3173794589999943
trial 73: query_time: 0.2855394560000093
trial 74: query_time: 0.2950028239999938
trial 75: query_time: 0.2466388489999929
trial 76: query_time: 0.2891999789999886
trial 77: query_time: 0.35536709699999847
trial 78: query_time: 0.32608064799998715
trial 79: query_time: 0.37119952800000533
trial 80: query_time: 0.25508051799999976
trial 81: query_time: 0.3064294440000026
trial 82: query_time: 0.2668080850000081
trial 83: query_time: 0.2993090690000031
trial 84: query_time: 0.24298014100000387
trial 85: query_time: 0.25985804500001564
trial 86: query_time: 0.2653288660000044
trial 87: query_time: 0.3573647000000051
trial 88: query_time: 0.26351055300000326
trial 89: query_time: 0.36487833000001046
trial 90: query_time: 0.3459658890000128
trial 91: query_time: 0.22377544799999782
trial 92: query_time: 0.2584887149999986
trial 93: query_time: 0.33219296099997564
trial 94: query_time: 0.25037413499998706
trial 95: query_time: 0.2965343940000196
trial 96: query_time: 0.25898273100000324
trial 97: query_time: 0.25781055499999184
trial 98: query_time: 0.25261749200001304
trial 99: query_time: 0.2793571060000204

100 runs of randomly generated queries against S3 tiledb data files

NOTE: query is for 50 genes over ALL tissues

$ python profile_expr_summary.py --data-location s3 --profile-type simple-wallclock --profile-scope raw-query --num-trials 100 --random True

#####Simple Wallclock Profiling RAW QUERY#####
num_trials: 100, num_genes_per_trial: 50, randomize_queries: True, wait_time: 0.1

total_query_time: 135.46871183299987, avg_query_time: 1.3546871183299987

trial 0: query_time: 4.003985972999999
trial 1: query_time: 3.573740692000001
trial 2: query_time: 2.509278213000016
trial 3: query_time: 2.9792625430000044
trial 4: query_time: 3.0536960640000075
trial 5: query_time: 1.9944726909999986
trial 6: query_time: 1.0862587259999827
trial 7: query_time: 1.3573313630000143
trial 8: query_time: 1.9742172940000273
trial 9: query_time: 1.6433746779999865
trial 10: query_time: 1.4304261359999941
trial 11: query_time: 1.9637260859999515
trial 12: query_time: 1.6831383580000079
trial 13: query_time: 1.606910973999959
trial 14: query_time: 0.9548125299999697
trial 15: query_time: 0.9560846659999811
trial 16: query_time: 1.2880271770000036
trial 17: query_time: 1.0503580980000038
trial 18: query_time: 1.1362863399999696
trial 19: query_time: 1.3571949529999756
trial 20: query_time: 1.871374163999974
trial 21: query_time: 1.2705959780000171
trial 22: query_time: 1.1581272490000174
trial 23: query_time: 1.5232975200000283
trial 24: query_time: 1.7971981979999896
trial 25: query_time: 1.6522054189999835
trial 26: query_time: 1.819237438000016
trial 27: query_time: 1.572304264999957
trial 28: query_time: 1.9447415980000073
trial 29: query_time: 1.2144866610000236
trial 30: query_time: 0.9088127830000303
trial 31: query_time: 1.4932521650000012
trial 32: query_time: 0.9466949410000325
trial 33: query_time: 1.5894347000000266
trial 34: query_time: 1.1557056170000237
trial 35: query_time: 1.0891182670000035
trial 36: query_time: 1.1767906669999775
trial 37: query_time: 1.1259856920000288
trial 38: query_time: 1.6331904759999816
trial 39: query_time: 1.028848326000002
trial 40: query_time: 1.074094936999984
trial 41: query_time: 0.8739587820000452
trial 42: query_time: 1.6000312780000172
trial 43: query_time: 1.3146131400000058
trial 44: query_time: 0.9708979579999664
trial 45: query_time: 1.0968067150000138
trial 46: query_time: 1.0503926040000238
trial 47: query_time: 1.2663111580000077
trial 48: query_time: 1.4521793879999905
trial 49: query_time: 1.004449396000041
trial 50: query_time: 1.231613901000003
trial 51: query_time: 0.8825964750000139
trial 52: query_time: 1.1230223580000143
trial 53: query_time: 1.026162338000006
trial 54: query_time: 1.2339305140000079
trial 55: query_time: 1.4468387879999796
trial 56: query_time: 0.8095109519999824
trial 57: query_time: 1.0504910480000262
trial 58: query_time: 1.5855452669999863
trial 59: query_time: 0.9767604419999998
trial 60: query_time: 0.9815348839999842
trial 61: query_time: 1.4171188699999675
trial 62: query_time: 1.158444870999972
trial 63: query_time: 1.2652696480000145
trial 64: query_time: 1.035514126999999
trial 65: query_time: 1.1793643579999298
trial 66: query_time: 1.2436490709999362
trial 67: query_time: 1.214567666999983
trial 68: query_time: 1.2286974169999212
trial 69: query_time: 0.9482812449999756
trial 70: query_time: 1.3685508480000408
trial 71: query_time: 0.9726102689999152
trial 72: query_time: 1.245883790999983
trial 73: query_time: 1.278040676000046
trial 74: query_time: 1.204819663999956
trial 75: query_time: 1.6147211539999944
trial 76: query_time: 1.137744022999982
trial 77: query_time: 1.1147740330000033
trial 78: query_time: 1.299879447999956
trial 79: query_time: 1.057496297000057
trial 80: query_time: 1.2484915649999948
trial 81: query_time: 1.0604711710000174
trial 82: query_time: 1.1205998530000443
trial 83: query_time: 1.4988737979999769
trial 84: query_time: 1.2238189800000328
trial 85: query_time: 1.2590955040000154
trial 86: query_time: 1.3017927930000042
trial 87: query_time: 1.212496326000064
trial 88: query_time: 1.2545615830000543
trial 89: query_time: 1.1321148170000015
trial 90: query_time: 1.0944500770000332
trial 91: query_time: 1.2451703630000566
trial 92: query_time: 1.2014130669999759
trial 93: query_time: 1.136929820999967
trial 94: query_time: 0.9320932310000671
trial 95: query_time: 0.8415176459999429
trial 96: query_time: 0.9394534210000529
trial 97: query_time: 0.9467821939999794
trial 98: query_time: 1.2511406579999402
trial 99: query_time: 0.956289485999946