Open bahamat opened 5 years ago
To get the cores from the cores dataset to another file system, they'll have to be copied first; you can't atomically rename across a file system boundary. Some cores are large, and the system may be under heavy I/O load, so the copy may itself take a considerable amount of time, and it will hold the dataset open in the meantime.
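To illustrate the point about renames, here is a rough Python sketch of what "moving" a core to another file system involves. It is not what thoth does; `move_core` and the `.incoming-` temp prefix are names made up for this example. The rename is tried first, and only when it fails with `EXDEV` (source and destination on different file systems) does it fall back to a full copy, during which the source file stays open.

```python
import contextlib
import errno
import os
import shutil
import tempfile


def move_core(src, dst_dir):
    """Move src into dst_dir; copy only if the rename crosses file systems.

    Hypothetical helper for illustration, not part of thoth.
    """
    dst = os.path.join(dst_dir, os.path.basename(src))
    try:
        # Atomic and fast when src and dst are on the same file system.
        os.rename(src, dst)
        return dst
    except OSError as err:
        if err.errno != errno.EXDEV:
            raise

    # Different file systems: copy into a temp file next to the final name,
    # then rename it into place so readers never see a partial core.
    fd, tmp = tempfile.mkstemp(dir=dst_dir, prefix=".incoming-")
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
            # The source (and so its dataset) is held open for the whole copy.
            shutil.copyfileobj(inp, out)
            out.flush()
            os.fsync(out.fileno())
        os.rename(tmp, dst)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise

    # Only remove the original once the copy is safely in place.
    os.unlink(src)
    return dst
```

The copy branch is the slow part: for a large core under heavy I/O load, the window in which the source dataset is busy is the duration of `copyfileobj`, not the duration of a rename.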
I think we probably need to assume that there may always be something holding the cores dataset open, and deal with that in the reprovision process: either by waiting and retrying the unmount, by forcing the unmount, or by restructuring some part of the file system layout or the reprovision process so that the unmount isn't necessary at all.
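A minimal sketch of the first two options combined (retry with backoff, then force as a last resort), in Python for illustration only. This is not how the reprovision code is written today; `unmount_with_retry`, `zfs_unmount`, the attempt count, and the delays are all assumptions. The only real command relied on is `zfs unmount`, with `-f` to force.

```python
import subprocess
import time


def zfs_unmount(dataset, force=False):
    """Run `zfs unmount [-f] dataset`; return True if it succeeded."""
    cmd = ["zfs", "unmount"]
    if force:
        cmd.append("-f")
    cmd.append(dataset)
    return subprocess.run(cmd, capture_output=True).returncode == 0


def unmount_with_retry(try_unmount, attempts=5, delay=1.0,
                       force_last=True, sleep=time.sleep):
    """Call try_unmount(force) until it succeeds or attempts run out.

    Returns "clean", "forced", or "failed". All defaults are illustrative.
    """
    for attempt in range(attempts):
        if try_unmount(False):
            return "clean"
        if attempt < attempts - 1:
            # Give whatever holds the dataset open (e.g. an upload) time to finish.
            sleep(delay)
            delay *= 2
    # Last resort: force the unmount, if the caller allows it.
    if force_last and try_unmount(True):
        return "forced"
    return "failed"
```

Usage would look something like `unmount_with_retry(lambda force: zfs_unmount("zones/cores/EXAMPLE", force))`, where the dataset name is a placeholder. Forcing the unmount could interrupt an in-flight upload, so whether `force_last` should default to true is a judgment call.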
We've had some occurrences where a manta reprovision failed because thoth was in the process of uploading a core for the instance.
Maybe we should move cores out to a different directory before uploading?