achave11-ucsc closed 3 months ago
Assignee to provide complete reproduction in the description of this issue.
Spike for estimate and optionally design.
The actual counting of subgraphs occurs in the RepositoryPlugin.list_partitions method. The current implementation of this method is egregiously inefficient due to a JOIN between every bundle in the snapshot and every possible prefix. For the failing snapshot, that's a total of 20,314,783,744 string comparisons.
This appears to be more than the BQ servers can handle. I couldn't find any more evidence as to the underlying cause of the error, but I re-implemented list_partitions to avoid this JOIN and the error now appears to be resolved.
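To illustrate why dropping the JOIN helps, here is a minimal Python sketch of the two counting strategies. It is not the actual list_partitions code or its BigQuery query: the function names, the in-memory lists and the sample IDs are all hypothetical, and it assumes bundle IDs are lowercase hexadecimal strings. As a side note, 20,314,783,744 equals 309,979 × 16^4, which would be consistent with roughly 310k bundles joined against every four-character hex prefix, though that reading is an inference from the arithmetic only.

```python
from collections import Counter
from itertools import product

HEX_DIGITS = '0123456789abcdef'


def all_prefixes(length):
    # Every possible hex prefix of the given length (16 ** length of them).
    return [''.join(digits) for digits in product(HEX_DIGITS, repeat=length)]


def count_by_join(bundle_ids, prefix_length):
    # Hypothetical stand-in for the JOIN: every bundle ID is compared against
    # every possible prefix, so the work is len(bundle_ids) * 16 ** prefix_length.
    counts = {}
    comparisons = 0
    for prefix in all_prefixes(prefix_length):
        matches = 0
        for bundle_id in bundle_ids:
            comparisons += 1
            if bundle_id.startswith(prefix):
                matches += 1
        if matches:
            counts[prefix] = matches
    return counts, comparisons


def count_by_grouping(bundle_ids, prefix_length):
    # Hypothetical stand-in for a JOIN-free approach: derive each bundle's
    # prefix directly and group on it, touching every bundle exactly once.
    return dict(Counter(bundle_id[:prefix_length] for bundle_id in bundle_ids))


if __name__ == '__main__':
    sample_ids = ['00aa1111', '00aa2222', '00ab3333', 'ff004444']  # made-up IDs
    joined, comparisons = count_by_join(sample_ids, prefix_length=4)
    grouped = count_by_grouping(sample_ids, prefix_length=4)
    print(joined == grouped, comparisons)
```

Both functions produce the same per-prefix counts, but the grouping variant scales with the number of bundles alone rather than with the product of bundles and prefixes.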
The estimate for the implementation is covered by the spike.
I've confirmed that the script's output has not changed as a result of the re-implementation.
For demo, attempt to reproduce.
… when the hammerbox deployment is selected.

In order to reproduce, apply the following patch to add the ANVIL_T2T_CHRY_20240301_ANV5_202403040508 dataset…

… then run the following command with hammerbox selected:

The following is an excerpt of what that failed execution looks like (omitting all successful graph counts except the last, for reference):