ParabolInc / parabol

Free online agile retrospective meeting tool
https://www.parabol.co/

Pre-Deploy does not fail when a Cloudflare error occurs #9911

Closed rafaelromcar-parabol closed 4 days ago

rafaelromcar-parabol commented 5 days ago

Issue - Bug

Pre-Deploy completed successfully in Staging, but the logs show a Cloudflare error:

ServerID 157                                                                                                                                                                                                       
🚀 Predeploy Started v7.37.8 sha:bae8a667bc0d1d52205a67c1e66beebf57c96cff                                                                                                                                          
👴 RethinkDB Migration Started                                                                                                                                                                                     
[migrate-rethinkdb] No new migrations                                                                                                                                                                              
👴 RethinkDB Migration Complete                                                                                                                                                                                    
🐘 Postgres Migration Started                                                                                                                                                                                      
🔩 Postgres Extension Checks Started                                                                                                                                                                               
   pgvector                                                                                                                                                                                                        
🔩 Postgres Extension Checks Completed                                                                                                                                                                             
PostgreSQL - No migrations to run!                                                                                                                                                                                 
🐘 Postgres Migration Complete                                                                                                                                                                                     
🔗 QueryMap Persistence Started                                                                                                                                                                                    
⛓️ Prime Integrationgs Started                                                                                                                                                                                      
⛅️ Push to CDN Started                                                                                                                                                                                             
⛓️ Prime Integrations Complete                                                                                                                                                                                      
ReqlDriverError: The connection was closed before the query could be completed                                                                                                                                     
🔗 QueryMap Persistence Complete: 0 records added                                                                                                                                                                  
Cloudflare error {                                                                                                                                                                                                 
  status: 524,                                                                                                                                                                                                     
  date: '2024-07-01T15:15:27.000Z',                                                                                                                                                                                
  path: '/staging/build/Insights_b6abc5de45ac3fc9c90b.js?x-id=PutObject'                                                                                                                                           
}                                                                                                                                                                                                                  
⛅️ Uploaded 219 client assets to CDN                                                                                                                                                                               
⛅️ Server upload complete. Pushed 0 assets to CDN                                                                                                                                                                  
⛅️ Push to CDN Complete                                                                                                                                                                                            
🚀 Predeploy Complete                                                                                                                                                                                              
Stream closed EOF for parabol/parabol-predeploy-8glj6 (parabol-predeploy)

This should have failed. If Pre-Deploy does not exit with a non-zero code, the release continues and deploys the new Web Servers and GQL Executors, which could cause downtime.

Acceptance Criteria (optional)

Pre-Deploy fails if there is an error. Triage is performed to find the root cause of the bug, timeboxed to ~1 hour.

Estimated effort: 1 hour to triage. More if root cause is already identified.

Dschoordsch commented 5 days ago

@rafaelromcar-parabol The output is misleading. We log the error for debugging purposes, but we retry all 520-530 errors. As you can see from the line

⛅️ Push to CDN Complete                                                                                                                                                                                            

the push did not fail.
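
For readers following the thread, here is a minimal sketch of the retry behavior described above. It is not the actual Parabol implementation: `uploadWithRetry`, `isRetryableCloudflareStatus`, the `UploadResult` type, and the default of 3 attempts are all hypothetical names and values chosen for illustration. The sketch assumes the uploader resolves with an HTTP status code, retries only statuses 520-530, and labels the log line as a retry so it is not mistaken for a fatal error.

```typescript
// Hypothetical sketch; names and defaults are illustrative, not Parabol's code.
type UploadResult = {status: number}

// Cloudflare-specific statuses in the range mentioned in the thread (520-530).
const isRetryableCloudflareStatus = (status: number): boolean =>
  status >= 520 && status <= 530

// Resolves with the number of attempts used, or rejects if the upload fails.
async function uploadWithRetry(
  upload: () => Promise<UploadResult>,
  path: string,
  maxAttempts = 3
): Promise<number> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const {status} = await upload()
    if (status < 400) return attempt

    const willRetry = isRetryableCloudflareStatus(status) && attempt < maxAttempts
    if (!willRetry) {
      // Non-retryable status, or retries exhausted: surface a real failure.
      throw new Error(`Upload of ${path} failed with status ${status} on attempt ${attempt}`)
    }
    // Say explicitly that this is a retry, not a fatal error.
    console.warn(
      `Cloudflare error ${status} on ${path}; retrying (attempt ${attempt} of ${maxAttempts} failed)`
    )
  }
  // Only reachable when maxAttempts is less than 1.
  throw new Error(`No upload attempts were made for ${path}`)
}
```

Under these assumptions, a caller that lets the rejection propagate (or catches it and exits with a non-zero code) would fail Pre-Deploy only when the retries are exhausted, while a transient 524 would produce a warning that is clearly marked as a retry.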