charmplusplus / charm

The Charm++ parallel programming system. Visit https://charmplusplus.org/ for more information.
Apache License 2.0

CkIO Error Getting Lustre Stripe Size on Blue Waters #2456

Closed rbuch closed 5 years ago

rbuch commented 5 years ago

@trquinn reported the following:

I'm getting the following abort when using CkIO on a Lustre file system.

This is for charm version Converse/Charm++ Commit ID: v6.9.0-300-g99cda7a11

Any ideas to fix this?

CkIO llapi error -61 on h516.cosmo25cmb.3072g1wsbHK1.002080.rung
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: [CkIO] llapi_file_get_stripe error
[0] Stack Traceback:
[0:0] _Z14CmiAbortHelperPKcS0_S0_ii+0x5e [0x817a83]
[0:1] [0x817af3]
[0:2] CkGetFileStripeSize+0x59 [0x722faf]
[0:3] _ZN2Ck2IO4impl8Director8openFileENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE10CkCallbackNS0_7OptionsE+0x4b3 [0x7212fd]
[0:4] _ZN2Ck2IO4impl16CkIndex_Director24_call_openFile_marshall2EPvS3_+0x1e6 [0x717034]

rbuch commented 5 years ago

I wasn't able to reproduce this on Stampede 2, and my Blue Waters account is currently inaccessible, so I haven't verified that the fix in #2457 actually works. I'm fairly confident it will, though.
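
For anyone hitting the same abort, here is a minimal sketch of one way to tolerate a failed stripe query instead of aborting. It is not the change in #2457, whose contents are not shown in this thread. It assumes the logged value (-61) is a negated errno, which on x86-64 Linux would be ENODATA ("No data available") and may mean the file carries no striping information. The helper names `describe_llapi_error` and `stripe_size_or_default` are hypothetical and are not part of Charm++ or the Lustre API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not a Charm++ or Lustre function): converts a
// negated-errno return code, such as the -61 in the log, into text.
// Assumption: the code follows the usual "-errno" convention.
std::string describe_llapi_error(int rc) {
    const int err = rc < 0 ? -rc : rc;
    return std::string(std::strerror(err));
}

// Hypothetical fallback policy: trust the reported stripe size only when
// the query succeeded (rc == 0) and returned a non-zero size; otherwise
// use a caller-supplied default rather than aborting the run.
std::size_t stripe_size_or_default(int rc, std::size_t reported,
                                   std::size_t fallback) {
    assert(fallback > 0 && "fallback stripe size must be positive");
    if (rc == 0 && reported > 0) {
        return reported;
    }
    return fallback;
}
```

With a policy like this, a caller would pass the return code and the size it got from the stripe query, plus a default such as 1 MiB, and could log `describe_llapi_error(rc)` as a warning when the fallback is used.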