Open tjgalvin opened 1 year ago
The error is coming from https://github.com/casacore/casacore/blob/5a8df94738bdc36be27e695d7b14fe949a1cc2df/casa/IO/FiledesIO.cc#L100-L104. This is a simple write(2) call which in principle shouldn't result in an E2BIG errno value. I suspect the underlying filesystem of /scratch3 (which one is it, do you know?) is complaining about something during the write, resulting in that non-standard error value for write.
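For readers connecting the two symptoms: the message "Argument list too long" is the standard text attached to the E2BIG errno value, so it can show up for any failing system call that returns that code, not only when a shell command is involved. The snippet below is a minimal sketch using only the Python standard library; the helper name describe_errno is hypothetical and is not part of casacore or python-casacore.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return the symbolic name, number and OS message for an errno value."""
    # errno.errorcode maps numeric codes to names such as "E2BIG".
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's message for that code.
    return f"{name} ({code}): {os.strerror(code)}"


if __name__ == "__main__":
    # On Linux this is expected to report the "Argument list too long" text.
    print(describe_errno(errno.E2BIG))
```

The same helper can be applied to the errno attribute of an OSError caught in Python, which makes it easier to quote the symbolic name when reporting the problem.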
Thanks for the quick response!
It is a Lustre-backed file system, so all bets are off in understanding what is going on with it. I might raise it with the tech staff for this HPC then.
If you are satisfied that nothing funny is going on in casacore, feel free to close this issue.
Huge thanks again!
On Fri, 11 Aug 2023, 5:42 pm, rtobar wrote:
The error is coming from https://github.com/casacore/casacore/blob/5a8df94738bdc36be27e695d7b14fe949a1cc2df/casa/IO/FiledesIO.cc#L100-L104. This is a simple write(2) call which in principle shouldn't result in an E2BIG errno value. I suspect the underlying filesystem of /scratch3 (which one is it, do you know?) is complaining about something during the write, resulting in that non-standard error value for write.
Hi all,
A slightly strange issue popped up that has left me scratching my head.
I was processing a collection of measurement sets in a pipeline. There is a stage early on that iterates over rows in the data table of a single measurement set, applies a rotation correction to the visibilities, and writes them back out. This happens in chunks. The code is available here: https://github.com/AlecThomson/FixMS/blob/main/fixms/fix_ms_corrs.py#L264
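To make the chunked read-modify-write pattern concrete, here is a minimal sketch of how row ranges might be generated. It is not the FixMS implementation; the function name chunk_ranges and its parameters are hypothetical and chosen only for illustration.

```python
from typing import Iterator, Tuple


def chunk_ranges(nrows: int, chunksize: int) -> Iterator[Tuple[int, int]]:
    """Yield (startrow, nrow) pairs that cover rows 0..nrows-1 in order."""
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    for start in range(0, nrows, chunksize):
        # The final chunk may be shorter than chunksize.
        yield start, min(chunksize, nrows - start)


if __name__ == "__main__":
    for startrow, nrow in chunk_ranges(10, 4):
        print(startrow, nrow)
```

With python-casacore, each pair could then be passed as the startrow and nrow arguments of the table getcol and putcol methods, so that only one chunk of the column is held in memory at a time.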
Recently I was running a hefty series of jobs and stumbled on this error:
I am unsure what to make of this. I have rerun my pipeline on a smaller dataset that included this measurement set and found no issue. The specific error, Argument list too long, reads like there was some interaction with a shell when trying to flush the buffers to disk, as if a large cp or rm command were being executed.
Would you happen to have any insight into this and the underlying behavior of the close and flush of a casacore table? Is there a series of temporary files stored, say, in /dev/shm or the current working directory, that are examined? I am at a total loss as to where else to look, and it is not clear to me whether this is actually a python-casacore issue, a casacore issue, or something else.
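One way to narrow down where else to look is to exercise the file system directly, without casacore in the picture. The following is a rough sketch that writes a scratch file with plain os.write calls and hands back any OSError so its errno can be inspected. The function name probe_write and its default sizes are assumptions made for illustration, and a single clean run does not rule out an intermittent file system fault.

```python
import os
import tempfile
from typing import Optional


def probe_write(directory: str, nbytes: int = 1 << 20,
                chunk: int = 1 << 16) -> Optional[OSError]:
    """Write nbytes of zeros into a temporary file inside directory.

    Returns None if every write and the final fsync succeed, otherwise
    the OSError that was raised (its errno attribute holds the code).
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        buf = bytes(chunk)
        remaining = nbytes
        while remaining > 0:
            # os.write may write fewer bytes than requested.
            written = os.write(fd, buf[:min(chunk, remaining)])
            remaining -= written
        os.fsync(fd)  # push the data through to the file system
        return None
    except OSError as exc:
        return exc
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling probe_write("/scratch3") from a job on the affected node and reporting the returned errno, if any, would give the HPC staff something concrete to investigate.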