cowprotocol / dune-bridge

[Dune Sync] Missing Values Handler #47

Closed bh2smith closed 1 year ago

bh2smith commented 1 year ago

This PR adds a class RecordHandler which, given dune_results and missing_values, processes and filters these records appropriately.

The two steps (handling the new Dune results and retrying the previously missing values) are implemented as private methods internal to the record handler, and the public method is responsible for calling them in the correct order.

Once the GIVE_UP_THRESHOLD is reached we assume no content exists and write an empty dict.
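As a rough illustration of the structure described above, here is a minimal sketch, not the code from this PR. The class name RecordHandlerSketch, the fetch callable, the attempt-counter layout of the missing-values mapping, and the threshold value of 12 (taken from the "after 12 attempts" log lines) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Assumed value; the logs in this PR mention "No content found after 12 attempts".
GIVE_UP_THRESHOLD = 12

# Hypothetical fetcher: returns the content for an app hash, or None if not found.
Fetcher = Callable[[str], Optional[dict]]


@dataclass
class RecordHandlerSketch:
    """Illustrative stand-in for the RecordHandler described in this PR."""

    fetch: Fetcher
    new_hashes: List[str]
    missing: Dict[str, int]  # app hash -> failed attempts so far (assumed layout)
    found: Dict[str, dict] = field(default_factory=dict)
    still_missing: Dict[str, int] = field(default_factory=dict)

    def _handle_new_records(self) -> None:
        # Step 1: try to fetch content for hashes seen for the first time.
        for app_hash in self.new_hashes:
            content = self.fetch(app_hash)
            if content is not None:
                self.found[app_hash] = content
            else:
                self.still_missing[app_hash] = 1

    def _handle_missing_records(self) -> None:
        # Step 2: retry hashes that were missing in earlier runs.
        for app_hash, attempts in self.missing.items():
            content = self.fetch(app_hash)
            if content is not None:
                self.found[app_hash] = content
            elif attempts + 1 >= GIVE_UP_THRESHOLD:
                # Give up: assume no content exists and record an empty dict.
                self.found[app_hash] = {}
            else:
                self.still_missing[app_hash] = attempts + 1

    def process(self) -> Tuple[Dict[str, dict], Dict[str, int]]:
        # Public entry point: runs the two private steps in order.
        self._handle_new_records()
        self._handle_missing_records()
        return self.found, self.still_missing
```

In this sketch, a hash that has already failed 11 times and fails once more is written out with an empty dict and dropped from the missing set, while one with fewer failures simply has its counter incremented.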

I have run this twice now (based on yesterday's run).

Note that in the second run we attempted to process 39 missing records and 2 new records. The entire run took 2m9s.

Below is the output and logs of the run.

Before: data.before.zip

After: data.after.zip

Logs From Second Run

```
2022-11-18 16:16:04,408 DEBUG pysrc.fetch.dune Executing Query(query_id=1615490, name='Latest Possible App Hash Block', params=None)
2022-11-18 16:16:04,685 INFO dune_client.base_client waiting for query execution 01GJ5MZA9S61F5211B490BX0XA to complete: ExecutionState.EXECUTING
2022-11-18 16:16:14,838 INFO dune_client.base_client waiting for query execution 01GJ5MZA9S61F5211B490BX0XA to complete: ExecutionState.EXECUTING
2022-11-18 16:16:24,985 INFO dune_client.base_client waiting for query execution 01GJ5MZA9S61F5211B490BX0XA to complete: ExecutionState.EXECUTING
2022-11-18 16:16:35,280 DEBUG pysrc.fetch.dune Got 1 results for execution 01GJ5MZA9S61F5211B490BX0XA
2022-11-18 16:16:35,280 DEBUG pysrc.fetch.dune Executing Query(query_id=1610025, name='Unique App Hashes', params=[Parameter(name=BlockFrom, value=15989823, type=number), Parameter(name=BlockTo, value=15997719, type=number)])
2022-11-18 16:16:35,553 INFO dune_client.base_client waiting for query execution 01GJ5N08E6QHDQFYW4HS14NSXM to complete: ExecutionState.EXECUTING
2022-11-18 16:16:45,832 DEBUG pysrc.fetch.dune Got 2 results for execution 01GJ5N08E6QHDQFYW4HS14NSXM
2022-11-18 16:16:47,888 DEBUG __main__ Found content for 0x70c5886eefadc643c15bc1bf38ebfa22c5b709ec80c46cc92935243ffb3828e7 at CID bafybeidqyweg535nyzb4cw6bx44ox6rcyw3qt3eayrwmskjveq77wobi44
2022-11-18 16:16:49,733 DEBUG __main__ Found content for 0x8e05f6e1c16ae6a9fff42fe036c926b3e6078e053911c65a8b3aa5063fd81df8 at CID bafybeieoax3odqlk42u775bp4a3msjvt4ydy4bjzchdfvcz2uudd7wa57a
2022-11-18 16:16:49,733 INFO __main__ Attempting to recover missing 39 records from previous run
2022-11-18 16:16:57,688 DEBUG __main__ Found previously missing content hash 0x2a277872392e331c5a646ea9e53f7cc15ef35af292cb460817bc75376cd8b8e5 at CID bafybeibke54heojogmofuzdovhst67gbl3zvv4uszndaqf54ou3wzwfy4u
2022-11-18 16:17:05,501 DEBUG __main__ Found previously missing content hash 0x3ccbd83bd785e95fbc5954b9ca8b3d2234c77c178025f52f80d1e0bba0eee1f8 at CID bafybeib4zpmdxv4f5fp3ywkuxhfiwpjcgtdxyf4aex2s7agr4c52b3xb7a
2022-11-18 16:17:16,456 DEBUG __main__ Found previously missing content hash 0xbd09f849e4d886470d0d81eb3b1779a99379eb2068f1cbb3f6377101beea7806 at CID bafybeif5bh4etzgyqzdq2dmb5m5ro6njsn46widi6hf3h5rxoea352tyay
2022-11-18 16:17:20,525 DEBUG __main__ Found previously missing content hash 0xb87622cac089bf4125a906eb1622325fb825909fd002532af0545c447f6a3ab8 at CID bafybeifyoyrmvqejx5aslkig5mlcems7xaszbh6qajjsv4culrch62r2xa
2022-11-18 16:17:22,027 DEBUG __main__ Found previously missing content hash 0x397c66430225e770c6b7256fec6630116e385cccdd98dcb951ec106c754004a6 at CID bafybeibzprtegarf45ymnnzfn7wgmmarny4fztg5tdolsupmcbwhkqaeuy
2022-11-18 16:17:26,083 DEBUG __main__ Found previously missing content hash 0x6f0cca62338e841404f3f680ecab8cf51f18f24e0d64dacc5b8e37abe606a6c4 at CID bafybeidpbtfgem4oqqkaj47wqdwkxdhvd4mpetqnmtnmyw4og6v6mbvgyq
2022-11-18 16:17:27,110 DEBUG __main__ Found previously missing content hash 0x7499768d1601195574f1e6bb7daeb14da16b991e8f8c635ece536d7e72b2b015 at CID bafybeidutf3i2fqbdfkxj4pgxn625mknufvzshuprrrv5tstnv7hfmvqcu
2022-11-18 16:17:37,349 DEBUG __main__ Found previously missing content hash 0x9c34ae83741575a546562837ca47626ba4841dfd140f48725d20d16f31a531c7 at CID bafybeie4gsxig5avowsumvrig7feoytluscb37iub5ehexja2fxtdjjry4
2022-11-18 16:17:38,172 DEBUG __main__ Found previously missing content hash 0x741d4b41799d83a62dcbf0ad3020689bdd656a026a03ef1628564bf920c62875 at CID bafybeidudvfuc6m5qotc3s7qvuyca2e33vswuatkapxrmkcwjp4sbrriou
2022-11-18 16:17:45,338 DEBUG __main__ Found previously missing content hash 0xd5caa7afe5351511fb491c567793c4a45fa4c087298073d1fe350606cb3941f3 at CID bafybeigvzkt27zjvcui7wsi4kz3zhrfel6smbbzjqbz5d7rvaydmwokb6m
2022-11-18 16:17:53,220 DEBUG __main__ Found previously missing content hash 0xa076a100c2535dc6047c4c9940ae647d7deaac1729745117d19d4a63bc2f4d30 at CID bafybeifao2qqbqstlxdai7cmtfak4zd5pxvkyfzjorirpum5jjr3yl2nga
2022-11-18 16:17:57,213 DEBUG __main__ Found previously missing content hash 0x19d30e274fedcc74158b0f2dc69719aad27332fb6f5397363cb75a780eaa6368 at CID bafybeiaz2mhcot7nzr2blcypfxdjognk2jztf63pkoltmpfxlj4a5ktdna
2022-11-18 16:17:57,859 DEBUG __main__ Found previously missing content hash 0x55623ea920bf8e7ba60cfe8bfaa14b71947a7c26e443952ca7d87a8d1b0273f5 at CID bafybeicvmi7ksif7rz52mdh6rp5kcs3rsr5hyjxeiokszj6ypkgrwatt6u
2022-11-18 16:17:59,875 DEBUG __main__ Found previously missing content hash 0x348acd73d137eeafd01ea0e053290402775b32b8cbb171816261618dd0f5f809 at CID bafybeiburlgxhujx52x5ahva4bjssbaco5ntfoglwfyycytbmgg5b5pybe
2022-11-18 16:18:00,900 DEBUG __main__ Found previously missing content hash 0x66375cf21271bdb5dbe30ed291f05da274e29d1936afc08d2d1f85d8d51061d6 at CID bafybeidgg5opeetrxw25xyyo2ki7axncotrj2gjwv7ai2li7qxmnkedb2y
2022-11-18 16:18:08,740 DEBUG __main__ Found previously missing content hash 0xaa05d19d8fd8ca61bd6ec52f25bdfb1d8a1ee874117c0cc388a49e117d1d2832 at CID bafybeifkaxiz3d6yzjq323wff4s336y5ripoq5arpqgmhcfetyix2hjigi
2022-11-18 16:18:12,476 DEBUG __main__ Found previously missing content hash 0xe9b5916b8feb5056e7dc6224c4c49464057bb2aa520156f2d39614b5118587ee at CID bafybeihjwwiwxd7lkblopxdcetcmjfdeav53fkssaflpfu4wcs2rdbmh5y
2022-11-18 16:18:13,088 DEBUG __main__ Found previously missing content hash 0x1487f733547707805dc8bb4b738f97b098bd793d8065934efdf9eb45de5f45e1 at CID bafybeiauq73tgvdxa6af3sf3jnzy7f5qtc6xspmamwju57pz5nc54x2f4e
```

Logs from 4th run

```
2022-11-18 16:28:19,704 DEBUG pysrc.fetch.dune Executing Query(query_id=1615490, name='Latest Possible App Hash Block', params=None)
2022-11-18 16:28:19,993 INFO dune_client.base_client waiting for query execution 01GJ5NNRCXMYF2SZMFS6WCEDN7 to complete: ExecutionState.EXECUTING
2022-11-18 16:28:30,306 DEBUG pysrc.fetch.dune Got 1 results for execution 01GJ5NNRCXMYF2SZMFS6WCEDN7
2022-11-18 16:28:30,307 DEBUG pysrc.fetch.dune Executing Query(query_id=1610025, name='Unique App Hashes', params=[Parameter(name=BlockFrom, value=15997760, type=number), Parameter(name=BlockTo, value=15997775, type=number)])
2022-11-18 16:28:30,612 INFO dune_client.base_client waiting for query execution 01GJ5NP2R7B1H60AKDNVDMH46X to complete: ExecutionState.EXECUTING
2022-11-18 16:28:40,908 DEBUG pysrc.fetch.dune Got 0 results for execution 01GJ5NP2R7B1H60AKDNVDMH46X
2022-11-18 16:28:40,908 INFO __main__ Attempting to recover missing 21 records from previous run
2022-11-18 16:28:44,078 DEBUG __main__ No content found after 12 attempts for 0x0000000000000000000000000000000000000000000000000000000000000000 assuming NULL.
2022-11-18 16:28:47,220 DEBUG __main__ No content found after 12 attempts for 0x707066dbd9c12b5e8fe57ba98601b6a2ac5fedd2038f6f49ede880e5854c48a1 assuming NULL.
2022-11-18 16:28:50,355 DEBUG __main__ No content found after 12 attempts for 0x487b02c558d72800000000000000000000000000000000000000000000000000 assuming NULL.
2022-11-18 16:28:53,512 DEBUG __main__ No content found after 12 attempts for 0x0000000000000000000000000000000000000000000000000000000000000abc assuming NULL.
2022-11-18 16:28:56,646 DEBUG __main__ No content found after 12 attempts for 0x00000000000000000000000055662e225a3376759c24331a9aed764f8f0c9fbb assuming NULL.
2022-11-18 16:28:59,814 DEBUG __main__ No content found after 12 attempts for 0xcdeaeef337259e372caa2206b40f50e5c3270b4e530aaf4a6f868dd9e9eb99ee assuming NULL.
2022-11-18 16:29:02,978 DEBUG __main__ No content found after 12 attempts for 0x70ec76f8410a4ce59693a5edb071f61add73d86aafee6449434832c011e5f62a assuming NULL.
2022-11-18 16:29:06,116 DEBUG __main__ No content found after 12 attempts for 0x4de339ce6a64d7c807b68dc79df5ea2a1608c6b0c577722c389ef43bedc63f97 assuming NULL.
2022-11-18 16:29:09,245 DEBUG __main__ No content found after 12 attempts for 0xdadada0000000000000000000000000000000000000000000000000000000ccc assuming NULL.
2022-11-18 16:29:12,391 DEBUG __main__ No content found after 12 attempts for 0x0000000000000000000000000000000000000000000000000000000123000000 assuming NULL.
2022-11-18 16:29:15,548 DEBUG __main__ No content found after 12 attempts for 0xf6a005bde820da47fdbb19bc07e56782b9ccec403a6899484cf502090627af8a assuming NULL.
2022-11-18 16:29:18,674 DEBUG __main__ No content found after 12 attempts for 0x000000000000000000000000000000000000000000000000000000000000ca1f assuming NULL.
2022-11-18 16:29:21,826 DEBUG __main__ No content found after 12 attempts for 0x0000000000000000000000000000000000000000000000000000000000000001 assuming NULL.
2022-11-18 16:29:25,004 DEBUG __main__ No content found after 12 attempts for 0xc83088a854cc99e6d47c6d0a1e62d4aad2ed70bf7ad786e9434b67e33bf3212f assuming NULL.
2022-11-18 16:29:28,160 DEBUG __main__ No content found after 12 attempts for 0x2947be33ebfa25686ec204857135dd1c676f35d6c252eb066fffaf9b493a01b4 assuming NULL.
2022-11-18 16:29:31,346 DEBUG __main__ No content found after 12 attempts for 0x0000000000000000000000000000000000000000000000000000000000000002 assuming NULL.
2022-11-18 16:29:34,524 DEBUG __main__ No content found after 12 attempts for 0x0000000000000000000000000000000000000000000000000000000000000063 assuming NULL.
2022-11-18 16:29:37,663 DEBUG __main__ No content found after 12 attempts for 0xe0b7067c7ae666fecbfe5780c62fa58cea3c6daa8968015baf11d0ab4c568662 assuming NULL.
2022-11-18 16:29:40,807 DEBUG __main__ No content found after 12 attempts for 0xf785fae7a7c5abc49f3cd6a61f6df1ff26433392b066ee9ff2240ff1eb7ab6e4 assuming NULL.
2022-11-18 16:29:43,981 DEBUG __main__ No content found after 12 attempts for 0x0000000000000000000000000000000000000000000000000000000000000ccc assuming NULL.
2022-11-18 16:29:47,120 DEBUG __main__ No content found after 12 attempts for 0x8df6b31c0801d8f28ba631fe97ac650e3ddf08d2a0b47455b57c03e4131e3eb4 assuming NULL.
2022-11-18 16:29:47,122 INFO dune_client.file Nothing to write to missing_app_hashes... skipping
```

OUTPUT FROM 4th run.

data.run4.zip

I actually discovered a bug (write always skips on empty files) so the last missing records were not wiped from the missing values file! WILL FIX ASAP (we need to refactor the FileIO anyway).
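
To illustrate the failure mode, here is a small self-contained sketch; it is not the dune_client writer and not the eventual fix. The function write_records, its skip_empty flag, and the JSON file format are all hypothetical and exist only for this example.

```python
import json


def write_records(records: list, path: str, skip_empty: bool = True) -> bool:
    """Write records to path as JSON. Returns True if the file was written.

    Hypothetical helper: with skip_empty=True it mirrors the behaviour
    reported above, where an empty result is never written.
    """
    if skip_empty and not records:
        # Nothing is written, so whatever an earlier run left at `path`
        # stays on disk (the stale missing-values file from the bug report).
        return False
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle)
    return True
```

With skip_empty=True, a run that resolves every missing record leaves the old file untouched, so the same records would be retried next time. Passing skip_empty=False for the missing-values file is one possible way to make an empty result overwrite the stale contents.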