conda-forge / numba-feedstock

A conda-smithy repository for numba.
BSD 3-Clause "New" or "Revised" License

DroneIO hangs randomly #74

Closed henryiii closed 3 years ago

henryiii commented 3 years ago

As noticed in #73, DroneIO sometimes times out at 1 hour instead of finishing in 20 minutes. Which jobs fail seems to be quite random (and Drone only has a rerun-all option, AFAICT).

__Warning log__
Warning (cuda): CUDA driver library cannot be found or no CUDA enabled devices are present.
Exception class: <class 'numba.cuda.cudadrv.error.CudaSupportError'>
Warning (roc): Error initialising ROC: No ROC toolchains found.
Warning (roc): No HSA Agents found, encountered exception when searching: Error at driver init:
NUMBA_HSA_DRIVER /opt/rocm/lib/libhsa-runtime64.so is not a valid file path.  Note it must be a filepath of the .so/.dll/.dylib or the driver:
Warning (psutil): psutil cannot be imported. For more accuracy, consider installing it.
--------------------------------------------------------------------------------
If requested, please copy and paste the information between
the dashed (----) lines, or from a given specific section as
appropriate.

=============================================================
IMPORTANT: Please ensure that you are happy with sharing the
contents of the information present, any information that you
wish to keep private you should remove before sharing.
=============================================================

....s.....
----------------------------------------------------------------------
Ran 10 tests in 267.252s

OK (skipped=1)
Running only a slice of tests
skipped CUDA tests
skipped CUDA tests
skipped HSA tests
skipped HSA tests
Parallel: 1718. Serial: 125
test_dict
aligned_size(index_size * size) = 8
d 0xaaaadfc5b090
d->usable = 5
d[0] 104
d[1] 128
ix = -1
got_value 1234567
Dict dump
   key_size = 4
   val_size = 8
  key=62 65 66 00  hash=48879 value=37 36 35 34 33 32 31 00
  key=62 65 67 00  hash=48879 value=31 32 33 34 35 36 38 00
  key=62 65 68 00  hash=51966 value=31 32 33 34 35 36 39 00
  key=62 65 69 00  hash=51966 value=30 5f 30 5f 30 5f 31 00
  key=62 65 6a 00  hash=51966 value=30 5f 30 5f 30 5f 32 00
  key=62 65 6b 00  hash=51966 value=30 5f 30 5f 30 5f 33 00
j = 6; n = 6
ix = 2
test_list

jakirkham commented 3 years ago

Not all nodes on Drone have the same resources (https://github.com/conda-forge/numpy-feedstock/pull/191#issuecomment-640130792). It's possible these 1hr builds are being scheduled on machines with fewer resources and so take longer than the allowed time to complete.
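
One way to check this hypothesis would be to print the node's resources at the start of each job and compare the slow runs against the fast ones. The snippet below is only a sketch of that idea and is not part of the feedstock: `describe_node` is a hypothetical helper, and the `sysconf` names it reads are assumed to be available on the Linux build machines.

```python
import os
import platform


def describe_node():
    """Collect basic resource facts about the machine running this job."""
    info = {
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),
    }
    # Total physical memory via sysconf. These names exist on Linux but not
    # on every platform, so fall back to None when they are unavailable.
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        phys_pages = os.sysconf("SC_PHYS_PAGES")
        info["mem_total_gib"] = round(page_size * phys_pages / 2**30, 1)
    except (AttributeError, ValueError, OSError):
        info["mem_total_gib"] = None
    return info


if __name__ == "__main__":
    # Print one "key: value" line per fact so it is easy to find in the log.
    for key, value in describe_node().items():
        print(f"{key}: {value}")
```

If the jobs that hit the 1 hour limit consistently report fewer cores or less memory than the ones that finish, that would support the explanation above; if not, the hang is more likely elsewhere.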