TritonDataCenter / manta-thoth

Thoth is a Manta-based system for core and crash dump management

thoth should avoid holding open cores dataset #174

Open bahamat opened 5 years ago

bahamat commented 5 years ago

We've had some occurrences where a manta reprovision failed because thoth was in the process of uploading a core for the instance.

Maybe we should move cores out to a different directory before uploading?

jclulow commented 5 years ago

To get them from the cores dataset to another file system, they'll have to be copied first; you can't atomically rename across a file system boundary. Some cores are large, and the system may be under heavy I/O load, so the copy may itself take a considerable time and hold the dataset open in the meantime.
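To illustrate the rename limitation, here is a minimal Python sketch (not thoth's actual code; `move_core` and the `.partial` suffix are hypothetical names chosen for the example). A rename across file systems fails with `EXDEV`, so the only fallback is a full copy, during which the source file stays open:

```python
import errno
import os
import shutil


def move_core(src, dst):
    """Move src to dst; fall back to a copy if they are on different file systems."""
    try:
        os.rename(src, dst)  # atomic, but only within a single file system
        return "renamed"
    except OSError as err:
        if err.errno != errno.EXDEV:
            raise
    # Cross-device case: copy to a temporary name beside dst, then rename into place.
    tmp = dst + ".partial"     # hypothetical suffix, for illustration only
    shutil.copyfile(src, tmp)  # the source dataset is held open for the whole copy
    os.replace(tmp, dst)       # atomic within the destination file system
    os.unlink(src)
    return "copied"
```

For a large core on a busy system, the `shutil.copyfile` step is where the time goes, so moving the file first does not shorten the window in which the dataset is busy.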

I think we probably need to assume that something may always be holding the cores dataset open, and deal with that in the reprovision process: by waiting and retrying the unmount, by forcing the unmount, or by restructuring some part of the file system layout or the reprovision process so that the unmount isn't necessary at all.
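As a rough sketch of the first two options (wait and retry, then force), something like the following could wrap the unmount step. This is an assumption about how it might look, not the actual reprovision code: `unmount_with_retry` and its parameters are hypothetical, and the dataset name is whatever the caller passes in. The only external commands used are `zfs unmount` and `zfs unmount -f`.

```python
import subprocess
import time


def unmount_with_retry(dataset, attempts=5, delay=2.0, force=True,
                       run=subprocess.run, sleep=time.sleep):
    """Try a normal unmount several times, then optionally force it."""
    for attempt in range(attempts):
        result = run(["zfs", "unmount", dataset])
        if result.returncode == 0:
            return "unmounted"
        if attempt < attempts - 1:
            sleep(delay)  # give the holder (e.g. an in-flight upload) time to finish
    if force:
        # Last resort: a forced unmount can break whatever still has files open.
        result = run(["zfs", "unmount", "-f", dataset])
        if result.returncode == 0:
            return "forced"
    raise RuntimeError("could not unmount " + dataset)
```

The `run` and `sleep` parameters exist only so the logic can be exercised without touching a real pool. Whether a forced unmount is acceptable depends on what is holding the dataset; an interrupted upload would need to be retried afterwards.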