Open martin-ueding opened 4 years ago
For some reason this went through for two configurations this time:
$ ls resolved_-10-1_B2_*
resolved_-10-1_B2_2496.js resolved_-10-1_B2_5328.js
This is very strange. My first instinct would be to guess that it's related to having too many HDF5 files open at the same time (I could imagine that these are internally opened using mmap), but this would suggest that things would also fail elsewhere.
I guess in the original description you mean (-1, 0, -1) rather than (-1, 0, 1), correct?
I really don't get it either. There are not too many HDF5 files open, since I start a new R process for every configuration and every irrep. It just crashes. And since it worked on two configurations, there cannot be something completely wrong with the program or the files.
I meant globally. When there are O(30) projection jobs running, the number of memory-mapped files will be rather large, and this might be problematic for Lustre. What if you run a projection for a single config on QBIG?
> What if you run a projection for a single config on QBIG?
After all the projections were done, I did try that to see what the issue was. It seems that even with a single irrep running on the whole cluster there is a problem.
I will find out how the other ensembles fare with that; perhaps it is always this irrep, or just that irrep on cA2.60.32.
Hah, we figured it out in the end. @matfischer observed the same problem and it was solved by reinstalling rhdf5 :)
not so fast, apparently...
I am re-running the projections on cA2.60.32 and they work just fine for almost all irreps in every configuration. There is just one exception, namely P = (-1, 0, -1) in the B₂ irrep, and it fails for every configuration. It is always this output:

I have tried to restart these jobs, but that did not help either. We had some random segfaults before, but this is consistent. It seems that it has something to do with the actual files. And it happens on all of the nodes that I have tried.
The only difference in input is the prescription file, and that does not differ from the other ensembles. The ones related by a global rotation are just fine.
In the meantime I will just skip that B₂ irrep at P² = 2, but it feels very peculiar and I still have no idea what happens there.