Open-EO / openeo-geotrellis-kubernetes

Contains scripts to run the openEO GeoTrellis backend on a Kubernetes cluster on DIAS.
Apache License 2.0

Goofys S3 mounts crash #9

Closed: jdries closed this issue 10 months ago

jdries commented 1 year ago

Goofys mounts of S3 are used for applications that require POSIX access to data in object storage.

Goofys performs better than s3fs, but we see an issue that occurs with goofys and not with s3fs. It seems to happen when listing directories. Without debug logging enabled, the problem shows up as hanging mounts, which in turn cause whole executors to hang.
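Because a hung mount blocks every process that lists it, one defensive option is to bound how long an executor waits for a directory listing. This is a minimal sketch under that assumption, not something used in this project: `list_dir_with_timeout` is a hypothetical helper that runs `os.listdir` in a daemon thread and stops waiting after a timeout. It cannot unblock the stuck thread; it only lets the caller fail fast instead of hanging.

```python
import os
import threading


def list_dir_with_timeout(path, timeout_s=30.0):
    """List `path` like os.listdir, but stop waiting after `timeout_s` seconds.

    Returns the entry names sorted. Raises TimeoutError if the listing has not
    finished after the timeout (for example on a hanging FUSE mount), and
    re-raises any exception from os.listdir itself (for example OSError/EIO).
    """
    outcome = {}

    def worker():
        try:
            outcome["entries"] = os.listdir(path)
        except Exception as exc:  # hand the error back to the caller
            outcome["error"] = exc

    # Daemon thread: if it stays blocked in the kernel, it does not keep the
    # interpreter alive at shutdown. It is abandoned, not cancelled.
    thread = threading.Thread(target=worker, name=f"listdir:{path}", daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        raise TimeoutError(f"listing {path!r} did not finish within {timeout_s}s")
    if "error" in outcome:
        raise outcome["error"]
    return sorted(outcome["entries"])
```

For example, `list_dir_with_timeout("/mnt/s3", timeout_s=60)` against a hypothetical mount point `/mnt/s3` would raise `TimeoutError` after a minute instead of blocking the executor indefinitely.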


Nov 18 09:01:38 kube-node-dev-4.novalocal /usr/sbin/goofys[1554562]: buffer.DEBUG read 131072 from buffer
Nov 18 09:01:38 kube-node-dev-4.novalocal /usr/sbin/goofys[1554562]: fuse.DEBUG < readFile 95 Sentinel-1/SAR/GRD/2021/07/05/S1A_IW_GRDH_1SDV_20210705T053509_20210705T053534_038638_048F2B_76B1.SAFE/measurement/s1a-iw-grd-vv-20210705t053509-20210705t053534-038638-048f2b-001.tiff [131072 <nil>]
Nov 18 09:01:38 kube-node-dev-4.novalocal /usr/sbin/goofys[1554562]: fuse.DEBUG < ReadFile 95 Sentinel-1/SAR/GRD/2021/07/05/S1A_IW_GRDH_1SDV_20210705T053509_20210705T053534_038638_048F2B_76B1.SAFE/measurement/s1a-iw-grd-vv-20210705t053509-20210705t053534-038638-048f2b-001.tiff [131072 <nil>]
Nov 18 09:01:38 kube-node-dev-4.novalocal /usr/sbin/goofys[1554562]: fuse.DEBUG Op 0x00005ac0        connection.go:491] -> OK ()
Nov 18 09:01:38 kube-node-dev-4.novalocal /usr/sbin/goofys[1554562]: fuse.DEBUG Op 0x00005ac2        connection.go:408] <- ReadFile (inode 95, handle 51, offset 354975744, 131072 bytes)
Nov 18 09:01:44 kube-node-dev-4.novalocal systemd[1]: run-docker-runtime\x2drunc-moby-771e80d2e58581039a54032eae83da4face1959c03c31ef8ac25ccb3053b2cdb-runc.BpbQ2L.mount: Succeeded.
Nov 18 09:01:56 kube-node-dev-4.novalocal /usr/sbin/goofys[1554562]: main.ERROR stacktrace from panic: runtime error: slice bounds out of range [103:102] 
                                                                     goroutine 18563 [running]:
                                                                     runtime/debug.Stack(0xc0003b9a68, 0xca7b20, 0xc02e0c2fc0)
                                                                             /usr/local/go/src/runtime/debug/stack.go:24 +0x9d
                                                                     github.com/kahing/goofys/api/common.LogPanic(0xc0003b9f18)
                                                                             /home/kahing/go/src/github.com/kahing/goofys/api/common/panic_logger.go:32 +0x7c
                                                                     panic(0xca7b20, 0xc02e0c2fc0)
                                                                             /usr/local/go/src/runtime/panic.go:679 +0x1b2
                                                                     github.com/kahing/goofys/internal.(*DirHandle).ReadDir(0xc02e031c40, 0x0, 0x0, 0x0, 0x0)
                                                                             /home/kahing/go/src/github.com/kahing/goofys/internal/dir.go:483 +0xed9
                                                                     github.com/kahing/goofys/internal.(*Goofys).ReadDir(0xc000230000, 0xeb5b00, 0xc02e11b3e0, 0xc02df6db00, 0x0, 0x0)
                                                                             /home/kahing/go/src/github.com/kahing/goofys/internal/goofys.go:834 +0x235
                                                                     github.com/kahing/goofys/api/common.FusePanicLogger.ReadDir(0xec7540, 0xc000230000, 0xeb5b00, 0xc02e11b3e0, 0xc02df6db00, 0x0, 0x0)
                                                                             /home/kahing/go/src/github.com/kahing/goofys/api/common/panic_logger.go:101 +0xa0
                                                                     github.com/kahing/goofys/vendor/github.com/jacobsa/fuse/fuseutil.(*fileSystemServer).handleOp(0xc000206860, 0xc0001ead00, 0xeb5b00, 0xc02e11b3e0, 0xbb6c40, 0xc02df6db00)
                                                                             /home/kahing/go/src/github.com/kahing/goofys/vendor/github.com/jacobsa/fuse/fuseutil/file_system.go:182 +0xe16
                                                                     created by github.com/kahing/goofys/vendor/github.com/jacobsa/fuse/fuseutil.(*fileSystemServer).ServeOps
                                                                             /home/kahing/go/src/github.com/kahing/goofys/vendor/github.com/jacobsa/fuse/fuseutil/file_system.go:122 +0x1a0
Nov 18 09:01:56 kube-node-dev-4.novalocal /usr/sbin/goofys[1554562]: fuse.ERROR *fuseops.ReadDirOp error: input/output error
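The panic message `slice bounds out of range [103:102]` is Go's report for a slice expression whose low index exceeds its high index, here inside `DirHandle.ReadDir` (`dir.go:483`); the next log line shows the same ReadDir request failing with `input/output error`. To spot this failure on other nodes, the journal can be scanned for those two goofys messages. A minimal sketch, assuming the lines are exported as plain text in the format shown above; `find_goofys_failures` and the marker strings are hypothetical, taken from this log rather than from any goofys documentation.

```python
import re

# Journald prefix as it appears in the excerpt above:
# "<Mon> <day> <HH:MM:SS> <host> <program>[<pid>]: <message>"
_JOURNAL_LINE = re.compile(
    r"^(?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<prog>[^\s\[]+)\[(?P<pid>\d+)\]: "
    r"(?P<msg>.*)$"
)

# Hypothetical markers, copied from the goofys messages in this issue.
_PANIC_MARKER = "main.ERROR stacktrace from panic:"
_READDIR_ERROR_MARKER = "fuse.ERROR *fuseops.ReadDirOp error:"


def find_goofys_failures(lines):
    """Yield (timestamp, pid, kind, message) for goofys panics and ReadDir errors.

    Indented stack-trace continuation lines do not match the journal prefix
    and are skipped, as are lines from other programs.
    """
    for line in lines:
        m = _JOURNAL_LINE.match(line)
        if not m or not m.group("prog").endswith("goofys"):
            continue
        msg = m.group("msg").strip()
        if msg.startswith(_PANIC_MARKER):
            kind = "panic"
        elif msg.startswith(_READDIR_ERROR_MARKER):
            kind = "readdir_error"
        else:
            continue
        yield m.group("ts"), int(m.group("pid")), kind, msg
```

Matching on the program name and message prefix keeps the check independent of the node name and PID, which differ from node to node.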

This is the stack trace of the Orfeo Toolbox (OTB) process that dumped core a few seconds later:

Nov 18 09:02:02 kube-node-dev-4.novalocal systemd-coredump[1564687]: Process 1564663 (python3) of user 18585 dumped core.

                                                                     Stack trace of thread 104:
                                                                     #0  0x00007f3e6892fabf raise (libpthread.so.0)
                                                                     #1  0x00007f3e6892fc20 __restore_rt (libpthread.so.0)
                                                                     #2  0x00007f3e67e0537f raise (libc.so.6)
                                                                     #3  0x00007f3e67defdb5 abort (libc.so.6)
                                                                     #4  0x00007f3e5e93b09b _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold.1 (libstdc++.so.6)
                                                                     #5  0x00007f3e5e94153c _ZN10__cxxabiv111__terminateEPFvvE (libstdc++.so.6)
                                                                     #6  0x00007f3e5e940559 __cxa_call_terminate (libstdc++.so.6)
                                                                     #7  0x00007f3e5e940ed8 __gxx_personality_v0 (libstdc++.so.6)
                                                                     #8  0x00007f3e61027b03 _Unwind_RaiseException_Phase2 (libgcc_s.so.1)
                                                                     #9  0x00007f3e6102841d _Unwind_Resume (libgcc_s.so.1)
                                                                     #10 0x00007f3e2ea14b11 _ZN5boost10filesystem6detail28directory_iterator_incrementERNS0_18directory_iteratorEPNS_6system10error_codeE.cold.87 (/usr/lib64/libboost_filesystem.so.1.66.0)
                                                                     #11 0x00007f3e2ea18c95 _ZN5boost10filesystem6detail28directory_iterator_constructERNS0_18directory_iteratorERKNS0_4pathEPNS_6system10error_codeE (/usr/lib64/libboost_filesystem.so.1.66.0)
                                                                     #12 0x00007f3e320f3f9c _ZN3otb15ExtractXMLFiles22GetXMLFilesInDirectoryERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEb (/usr/lib64/libOTBMetadata-8.1.so.1)
                                                                     #13 0x00007f3e320f454f _ZN3otb15ExtractXMLFiles15GetResourceFileERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES6_b (/usr/lib64/libOTBMetadata-8.1.so.1)
                                                                     #14 0x00007f3e320f977c _ZN3otb34TerraSarXSarImageMetadataInterface9ParseGdalERNS_13ImageMetadataE (/usr/lib64/libOTBMetadata-8.1.so.1)
                                                                     #15 0x00007f3e320f271e _ZN3otb34TerraSarXSarImageMetadataInterface5ParseERNS_13ImageMetadataE (/usr/lib64/libOTBMetadata-8.1.so.1)
                                                                     #16 0x00007f3e320bce16 _ZN3otb29ImageMetadataInterfaceFactory9CreateIMIERNS_13ImageMetadataERKNS_25MetadataSupplierInterfaceE (/usr/lib64/libOTBMetadata-8.1.so.1)
                                                                     #17 0x00007f3e34c25821 _ZN3otb15ImageFileReaderINS_5ImageISt7complexIfELj2EEENS_25DefaultConvertPixelTraitsIS3_EEE25GenerateOutputInformationEv (/usr/lib64/libOTBImageIO-8.1.so.1)
                                                                     #18 0x00007f3e36757845 n/a (/usr/lib64/libITKCommon-4.13.so.1)
                                                                     #19 0x00007f3e35a942fc n/a (/usr/lib64/libOTBApplicationEngine-8.1.so.1)
                                                                     #20 0x00007f3e1431a67c _ZN3otb7Wrapper14SARCalibration9DoExecuteEv (/usr/lib/otb/applications/otbapp_SARCalibration.so)
                                                                     #21 0x00007f3e35b81989 n/a (/usr/lib64/libOTBApplicationEngine-8.1.so.1)
                                                                     #22 0x00007f3e35b810d2 n/a (/usr/lib64/libOTBApplicationEngine-8.1.so.1)
                                                                     #23 0x00007f3e3614fde2 _wrap_Application_Execute (/usr/lib64/python3.8/site-packages/_otbApplication.so)
                                                                     #24 0x00007f3e68eb1f3a PyCFunction_Call (/usr/lib64/libpython3.8.so.1.0)
                                                                     #25 0x00007f3e68e919e2 _PyObject_MakeTpCall (/usr/lib64/libpython3.8.so.1.0)
                                                                     #26 0x00007f3e68f4cb54 _PyEval_EvalFrameDefault (/usr/lib64/libpython3.8.so.1.0)
                                                                     #27 0x00007f3e68f107ea method_vectorcall (/usr/lib64/libpython3.8.so.1.0)
                                                                     #28 0x00007f3e68f4c5a7 _PyEval_EvalFrameDefault (/usr/lib64/libpython3.8.so.1.0)
                                                                     #29 0x00007f3e68f0ebe7 _PyEval_EvalCodeWithName (/usr/lib64/libpython3.8.so.1.0)
                                                                     #30 0x00007f3e68f10232 _PyFunction_Vectorcall (/usr/lib64/libpython3.8.so.1.0)
                                                                     #31 0x00007f3e68e9c73f PyObject_Call (/usr/lib64/libpython3.8.so.1.0)
                                                                     #32 0x00007f3e68f494b6 _PyEval_EvalFrameDefault (/usr/lib64/libpython3.8.so.1.0)
                                                                     #33 0x00007f3e68f0ffcf _PyFunction_Vectorcall (/usr/lib64/libpython3.8.so.1.0)
                                                                     #34 0x00007f3e68f479ad _PyEval_EvalFrameDefault (/usr/lib64/libpython3.8.so.1.0)
                                                                     #35 0x00007f3e68f0ebe7 _PyEval_EvalCodeWithName (/usr/lib64/libpython3.8.so.1.0)
                                                                     #36 0x00007f3e68f108dc method_vectorcall (/usr/lib64/libpython3.8.so.1.0)
                                                                     #37 0x00007f3e68f4853b _PyEval_EvalFrameDefault (/usr/lib64/libpython3.8.so.1.0)
                                                                     #38 0x00007f3e68f0ffcf _PyFunction_Vectorcall (/usr/lib64/libpython3.8.so.1.0)
                                                                     #39 0x00007f3e68f479ad _PyEval_EvalFrameDefault (/usr/lib64/libpython3.8.so.1.0)
                                                                     #40 0x00007f3e68f0ffcf _PyFunction_Vectorcall (/usr/lib64/libpython3.8.so.1.0)
                                                                     #41 0x00007f3e68f34666 slot_tp_init (/usr/lib64/libpython3.8.so.1.0)
                                                                     #42 0x00007f3e68e9187a _PyObject_MakeTpCall (/usr/lib64/libpython3.8.so.1.0)
                                                                     #43 0x00007f3e68f4c427 _PyEval_EvalFrameDefault (/usr/lib64/libpython3.8.so.1.0)
                                                                     #44 0x00007f3e68f0ffcf _PyFunction_Vectorcall (/usr/lib64/libpython3.8.so.1.0)
                                                                     #45 0x00007f3e68f4c5a7 _PyEval_EvalFrameDefault (/usr/lib64/libpython3.8.so.1.0)
                                                                     #46 0x00007f3e68f0ffcf _PyFunction_Vectorcall (/usr/lib64/libpython3.8.so.1.0)
                                                                     #47 0x00007f3e68f4c5a7 _PyEval_EvalFrameDefault (/usr/lib64/libpython3.8.so.1.0)
                                                                     #48 0x00007f3e68f0ffcf _PyFunction_Vectorcall (/usr/lib64/libpython3.8.so.1.0)
                                                                     #49 0x00007f3e68f479ad _PyEval_EvalFrameDefault (/usr/lib64/libpython3.8.so.1.0)
                                                                     #50 0x00007f3e68f0ebe7 _PyEval_EvalCodeWithName (/usr/lib64/libpython3.8.so.1.0)
                                                                     #51 0x00007f3e68f10232 _PyFunction_Vectorcall (/usr/lib64/libpython3.8.so.1.0)
                                                                     #52 0x00007f3e68f4853b _PyEval_EvalFrameDefault (/usr/lib64/libpython3.8.so.1.0)
                                                                     #53 0x00007f3e68f0ebe7 _PyEval_EvalCodeWithName (/usr/lib64/libpython3.8.so.1.0)
                                                                     #54 0x00007f3e68f10232 _PyFunction_Vectorcall (/usr/lib64/libpython3.8.so.1.0)
                                                                     #55 0x00007f3e68e9c73f PyObject_Call (/usr/lib64/libpython3.8.so.1.0)
                                                                     #56 0x00007f3e68f494b6 _PyEval_EvalFrameDefault (/usr/lib64/libpython3.8.so.1.0)
                                                                     #57 0x00007f3e68f0ebe7 _PyEval_EvalCodeWithName (/usr/lib64/libpython3.8.so.1.0)
                                                                     #58 0x00007f3e68f10232 _PyFunction_Vectorcall (/usr/lib64/libpython3.8.so.1.0)
                                                                     #59 0x00007f3e68e9c73f PyObject_Call (/usr/lib64/libpython3.8.so.1.0)
                                                                     #60 0x00007f3e68f494b6 _PyEval_EvalFrameDefault (/usr/lib64/libpython3.8.so.1.0)
                                                                     #61 0x00007f3e68f0ebe7 _PyEval_EvalCodeWithName (/usr/lib64/libpython3.8.so.1.0)
                                                                     #62 0x00007f3e68f10232 _PyFunction_Vectorcall (/usr/lib64/libpython3.8.so.1.0)
                                                                     #63 0x00007f3e68e9c73f PyObject_Call (/usr/lib64/libpython3.8.so.1.0)
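Since the abort happens in native code (boost::filesystem inside OTB), the whole `python3` process dies with it. One way to contain that is to run the OTB step in a child process, so a crash or hang only takes down the child and the parent can report it. This is a sketch under that assumption, not the project's approach; `run_isolated` is a hypothetical helper built on the standard `multiprocessing` module with the Linux-only `fork` start method.

```python
import multiprocessing
import signal


def run_isolated(target, *args, timeout_s=None):
    """Run target(*args) in a forked child process and check how it ended.

    A native abort() in the child (like the one in the OTB trace) kills only
    the child. The parent raises RuntimeError for a non-zero exit status or a
    fatal signal, and TimeoutError if the child is still running after
    `timeout_s` seconds. Returns None on a clean exit.
    """
    ctx = multiprocessing.get_context("fork")  # "fork" is available on Linux only
    proc = ctx.Process(target=target, args=args)
    proc.start()
    proc.join(timeout_s)
    if proc.is_alive():
        # The child may be blocked inside a FUSE call; SIGKILL can take effect
        # late in that case, so only wait a bounded time for it to go away.
        proc.kill()
        proc.join(10)
        raise TimeoutError(f"isolated step did not finish within {timeout_s}s")
    code = proc.exitcode
    if code < 0:
        # Negative exit codes are the number of the signal that killed the child.
        raise RuntimeError(f"isolated step killed by {signal.Signals(-code).name}")
    if code != 0:
        raise RuntimeError(f"isolated step exited with status {code}")
```

A call could look like `run_isolated(run_sar_calibration, safe_dir, out_path, timeout_s=600)`, where `run_sar_calibration` is a hypothetical wrapper around the OTB SARCalibration application that writes its result to disk; the child's return value is not passed back, so outputs have to go through files.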

Goofys also has open issues that point in a similar direction: https://github.com/kahing/goofys/issues/342

There are also forks that were apparently created to deal with issues in goofys, for example: https://github.com/yandex-cloud/geesefs

jdries commented 10 months ago

Closing this issue: we found a way to avoid using mounted S3 for the backscatter computation, so the current s3fs option is workable.