dchaley / deepcell-imaging

Tools & guidance to scale DeepCell imaging on Google Cloud Batch

Move benchmark data to bucket + add final step to record it #238

Closed: dchaley closed this issue 3 weeks ago

dchaley commented 1 month ago

We used to be able to record benchmarking measurements easily because everything was happening in one process.

Since #222 split the pipeline into separate processes, which are independent by design, this no longer works.

Instead, write benchmarking results to Cloud Storage as each step runs, and add a final step that reads them back and records them in the benchmarking table.
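
A minimal sketch of this pattern, assuming each phase writes one JSON file of its measurements under a shared job prefix and the final step reads them back and flattens them into one row. To keep it runnable offline, a local directory stands in for the Cloud Storage bucket. The function names, the `<phase>.json` layout, the `<phase>_<column>` prefixing, and the rule for job-level `success` are illustrative assumptions, not the project's actual code.

```python
import json
import tempfile
from pathlib import Path

# Assumption: each phase writes <job_dir>/<phase>.json. A local directory
# stands in for the Cloud Storage prefix so the sketch runs offline.


def write_phase_result(job_dir: Path, phase: str, metrics: dict) -> Path:
    """Run at the end of a phase's own process: persist its measurements."""
    job_dir.mkdir(parents=True, exist_ok=True)
    path = job_dir / f"{phase}.json"
    path.write_text(json.dumps(metrics))
    return path


def collect_benchmark_row(job_dir: Path, job_columns: dict, phases: list) -> dict:
    """Final step: merge job-level columns with each phase's prefixed columns."""
    row = dict(job_columns)
    for phase in phases:
        path = job_dir / f"{phase}.json"
        if not path.exists():
            # A phase that wrote nothing is recorded as unsuccessful.
            row[f"{phase}_success"] = False
            continue
        for column, value in json.loads(path.read_text()).items():
            row[f"{phase}_{column}"] = value
    # Assumption: the job succeeds only if every phase reported success.
    row["success"] = all(row.get(f"{phase}_success") is True for phase in phases)
    return row


def demo() -> dict:
    """Simulate two finished phases (made-up values) and one missing phase."""
    with tempfile.TemporaryDirectory() as tmp:
        job_dir = Path(tmp) / "job"
        write_phase_result(job_dir, "preprocessing", {"success": True, "time_s": 1.5})
        write_phase_result(job_dir, "prediction", {"success": True, "batch_size": 16})
        return collect_benchmark_row(
            job_dir,
            {"input_file_id": "example"},
            ["preprocessing", "prediction", "postprocessing"],
        )
```

In a real deployment the two file operations would go through a Cloud Storage client instead; the merge logic would stay the same, because each phase only needs to know its own name and the shared job prefix.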

dchaley commented 1 month ago

Dividing the benchmarking columns by phase:

Columns for the whole job

"input_file_id"
"numpy_size_mb"
"pixels_m"
"compartment"
"benchmark_datetime_utc"
"success"
"deepcell_tf_version"

Columns for each phase

"instance_type"
"gpu_type"
"num_gpus"
"success"
"peak_memory_gb"
"provisioning_model"
"input_load_time_s"
"time_s"
"output_write_time_s"

Columns for preprocessing (none beyond the per-phase columns)

Columns for prediction

"model_load_time_s"
"batch_size"

Columns for postprocessing (none beyond the per-phase columns)
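
One way to represent the per-phase columns listed above is a small record type that each phase fills in and serializes. This is a sketch only: the class names `PhaseBenchmark` and `PredictionBenchmark` are hypothetical, the field names follow the lists above, and the prediction-only columns live in a subclass so the other phases never carry them.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class PhaseBenchmark:
    """Columns recorded by every phase (hypothetical class; names follow the list above)."""
    instance_type: str
    gpu_type: Optional[str]  # None when the phase runs without a GPU
    num_gpus: int
    success: bool
    peak_memory_gb: float
    provisioning_model: str
    input_load_time_s: float
    time_s: float
    output_write_time_s: float

    def to_json(self) -> str:
        """Serialize all fields, including any subclass fields, to a JSON object."""
        return json.dumps(asdict(self))


@dataclass
class PredictionBenchmark(PhaseBenchmark):
    """Prediction adds the two prediction-only columns."""
    model_load_time_s: Optional[float] = None
    batch_size: Optional[int] = None
```

Because `asdict` walks every dataclass field, including inherited ones, a `PredictionBenchmark` serializes to all eleven prediction columns while preprocessing and postprocessing records stay at nine.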

dchaley commented 1 month ago

The resulting BigQuery schema:

input_file_id:STRING,
numpy_size_mb:FLOAT,
pixels_m:INTEGER,
compartment:STRING,
benchmark_datetime_utc:DATETIME,
success:BOOLEAN,
cloud_region:STRING,

preprocessing_instance_type:STRING,
preprocessing_gpu_type:STRING,
preprocessing_num_gpus:INTEGER,
preprocessing_success:BOOLEAN,
preprocessing_peak_memory_gb:FLOAT,
preprocessing_is_preemptible:BOOLEAN,
preprocessing_input_load_time_s:FLOAT,
preprocessing_time_s:FLOAT,
preprocessing_output_write_time_s:FLOAT,

prediction_instance_type:STRING,
prediction_gpu_type:STRING,
prediction_num_gpus:INTEGER,
prediction_success:BOOLEAN,
prediction_peak_memory_gb:FLOAT,
prediction_is_preemptible:BOOLEAN,
prediction_input_load_time_s:FLOAT,
prediction_time_s:FLOAT,
prediction_output_write_time_s:FLOAT,
prediction_model_load_time_s:FLOAT,
prediction_batch_size:INTEGER,

postprocessing_instance_type:STRING,
postprocessing_gpu_type:STRING,
postprocessing_num_gpus:INTEGER,
postprocessing_success:BOOLEAN,
postprocessing_peak_memory_gb:FLOAT,
postprocessing_is_preemptible:BOOLEAN,
postprocessing_input_load_time_s:FLOAT,
postprocessing_time_s:FLOAT,
postprocessing_output_write_time_s:FLOAT
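
The schema follows a regular pattern: job-level columns, then the same per-phase columns prefixed with each phase name, with two extra columns for prediction. A sketch that regenerates the comma-separated `name:TYPE` form from that pattern, so the three phases cannot drift apart; the constant and function names are illustrative, not taken from the project.

```python
# Job-level columns, in the order used in the schema.
JOB_COLUMNS = [
    ("input_file_id", "STRING"),
    ("numpy_size_mb", "FLOAT"),
    ("pixels_m", "INTEGER"),
    ("compartment", "STRING"),
    ("benchmark_datetime_utc", "DATETIME"),
    ("success", "BOOLEAN"),
    ("cloud_region", "STRING"),
]

# Columns every phase records; each gets a "<phase>_" prefix.
PHASE_COLUMNS = [
    ("instance_type", "STRING"),
    ("gpu_type", "STRING"),
    ("num_gpus", "INTEGER"),
    ("success", "BOOLEAN"),
    ("peak_memory_gb", "FLOAT"),
    ("is_preemptible", "BOOLEAN"),
    ("input_load_time_s", "FLOAT"),
    ("time_s", "FLOAT"),
    ("output_write_time_s", "FLOAT"),
]

# Extra columns that only the prediction phase records.
PHASE_EXTRAS = {
    "prediction": [("model_load_time_s", "FLOAT"), ("batch_size", "INTEGER")],
}

PHASES = ["preprocessing", "prediction", "postprocessing"]


def schema_columns() -> list:
    """Return (name, type) pairs for the full flattened schema."""
    columns = list(JOB_COLUMNS)
    for phase in PHASES:
        for name, bq_type in PHASE_COLUMNS + PHASE_EXTRAS.get(phase, []):
            columns.append((f"{phase}_{name}", bq_type))
    return columns


def inline_schema() -> str:
    """Render the schema as comma-separated name:TYPE pairs."""
    return ",".join(f"{name}:{bq_type}" for name, bq_type in schema_columns())
```

Adding a column to `PHASE_COLUMNS` then adds it to all three phases at once, which matches how the per-phase list above is shared.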

dchaley commented 3 weeks ago

This is done. We ran the steps independently and gathered their results for upload to BigQuery.
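
For the upload itself, one option is to write the gathered rows as newline-delimited JSON, a format BigQuery load jobs accept. A sketch assuming each gathered row is a flat dict keyed by schema column name; the function names are hypothetical. `None` becomes JSON `null`, which loads as NULL into the NULLABLE columns, and `datetime` values are written as naive UTC timestamps for the DATETIME column.

```python
import json
from datetime import datetime, timezone
from typing import Any, Iterable


def _to_json_value(value: Any) -> Any:
    """Convert values json.dumps can't encode; DATETIME columns get naive UTC."""
    if isinstance(value, datetime):
        if value.tzinfo is not None:
            # Normalize aware timestamps to UTC, then drop the offset,
            # since a DATETIME column carries no time zone.
            value = value.astimezone(timezone.utc).replace(tzinfo=None)
        return value.isoformat()
    return value


def to_ndjson(rows: Iterable[dict]) -> str:
    """Serialize rows as one JSON object per line, each line newline-terminated."""
    return "".join(
        json.dumps({k: _to_json_value(v) for k, v in row.items()}, sort_keys=True) + "\n"
        for row in rows
    )
```

Sorting keys is not required by BigQuery; it only makes the file diff-friendly when comparing runs.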

dchaley commented 2 weeks ago

The final schema that we used:

[{"name":"input_file_id","type":"STRING","mode":"NULLABLE"},
 {"name":"numpy_size_mb","type":"FLOAT","mode":"NULLABLE"},
 {"name":"pixels_m","type":"INTEGER","mode":"NULLABLE"},
 {"name":"compartment","type":"STRING","mode":"NULLABLE"},
 {"name":"benchmark_datetime_utc","type":"DATETIME","mode":"NULLABLE"},
 {"name":"success","type":"BOOLEAN","mode":"NULLABLE"},
 {"name":"cloud_region","type":"STRING","mode":"NULLABLE"},
 {"name":"preprocessing_instance_type","type":"STRING","mode":"NULLABLE"},
 {"name":"preprocessing_gpu_type","type":"STRING","mode":"NULLABLE"},
 {"name":"preprocessing_num_gpus","type":"INTEGER","mode":"NULLABLE"},
 {"name":"preprocessing_success","type":"BOOLEAN","mode":"NULLABLE"},
 {"name":"preprocessing_peak_memory_gb","type":"FLOAT","mode":"NULLABLE"},
 {"name":"preprocessing_is_preemptible","type":"BOOLEAN","mode":"NULLABLE"},
 {"name":"preprocessing_input_load_time_s","type":"FLOAT","mode":"NULLABLE"},
 {"name":"preprocessing_time_s","type":"FLOAT","mode":"NULLABLE"},
 {"name":"preprocessing_output_write_time_s","type":"FLOAT","mode":"NULLABLE"},
 {"name":"prediction_instance_type","type":"STRING","mode":"NULLABLE"},
 {"name":"prediction_gpu_type","type":"STRING","mode":"NULLABLE"},
 {"name":"prediction_num_gpus","type":"INTEGER","mode":"NULLABLE"},
 {"name":"prediction_success","type":"BOOLEAN","mode":"NULLABLE"},
 {"name":"prediction_peak_memory_gb","type":"FLOAT","mode":"NULLABLE"},
 {"name":"prediction_is_preemptible","type":"BOOLEAN","mode":"NULLABLE"},
 {"name":"prediction_input_load_time_s","type":"FLOAT","mode":"NULLABLE"},
 {"name":"prediction_time_s","type":"FLOAT","mode":"NULLABLE"},
 {"name":"prediction_output_write_time_s","type":"FLOAT","mode":"NULLABLE"},
 {"name":"prediction_model_load_time_s","type":"FLOAT","mode":"NULLABLE"},
 {"name":"prediction_batch_size","type":"INTEGER","mode":"NULLABLE"},
 {"name":"postprocessing_instance_type","type":"STRING","mode":"NULLABLE"},
 {"name":"postprocessing_gpu_type","type":"STRING","mode":"NULLABLE"},
 {"name":"postprocessing_num_gpus","type":"INTEGER","mode":"NULLABLE"},
 {"name":"postprocessing_success","type":"BOOLEAN","mode":"NULLABLE"},
 {"name":"postprocessing_peak_memory_gb","type":"FLOAT","mode":"NULLABLE"},
 {"name":"postprocessing_is_preemptible","type":"BOOLEAN","mode":"NULLABLE"},
 {"name":"postprocessing_input_load_time_s","type":"FLOAT","mode":"NULLABLE"},
 {"name":"postprocessing_time_s","type":"FLOAT","mode":"NULLABLE"},
 {"name":"postprocessing_output_write_time_s","type":"FLOAT","mode":"NULLABLE"}]
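
Because the schema is available as JSON, the final step could check each row against it before uploading, catching misnamed columns or wrong types early. A sketch assuming rows are flat dicts; `load_schema` and `validate_row` are hypothetical names, and the type checks cover only the five BigQuery types this schema uses.

```python
import json
from datetime import datetime


def _parses_as_datetime(text: str) -> bool:
    """True if the string is an ISO-style timestamp such as 2024-01-02T03:04:05."""
    try:
        datetime.fromisoformat(text)
        return True
    except ValueError:
        return False


# Python values accepted for each BigQuery type used in this schema.
# bool is a subclass of int, so it is excluded from the numeric checks.
_CHECKS = {
    "STRING": lambda v: isinstance(v, str),
    "INTEGER": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "FLOAT": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "BOOLEAN": lambda v: isinstance(v, bool),
    "DATETIME": lambda v: isinstance(v, datetime)
    or (isinstance(v, str) and _parses_as_datetime(v)),
}


def load_schema(text: str) -> list:
    """Parse the JSON schema (a list of name/type/mode objects)."""
    return json.loads(text)


def validate_row(row: dict, schema: list) -> list:
    """Return a list of problems; an empty list means the row fits the schema."""
    fields = {f["name"]: f for f in schema}
    problems = [f"unknown column: {name}" for name in row if name not in fields]
    for name, field in fields.items():
        value = row.get(name)
        if value is None:
            # Missing values are fine for NULLABLE columns.
            if field.get("mode") == "REQUIRED":
                problems.append(f"missing required column: {name}")
            continue
        check = _CHECKS.get(field["type"])
        if check is None:
            problems.append(f"unsupported type {field['type']} for {name}")
        elif not check(value):
            problems.append(f"{name}: expected {field['type']}, got {type(value).__name__}")
    return problems
```

Since every column in this schema is NULLABLE, the check mostly guards against typos in column names and against values such as `True` landing in an INTEGER column.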

cc @WeihaoGe1009