Closed gridley closed 4 months ago
Thanks so much for fixing all these tests -- these are super useful fixes! Would be awesome to have most of the python tests working again. At first glance the fixes in the PR generally look good, though all the HPC systems at ANL are down this week, so I won't be able to do any testing with this until next week.
I did have a few questions though on the new Named
template class:
std::string
, and assuming they are not copied correctly?std::string name_
fields to something like Name name_
.name_
fields are already not trivially copyable, it could make sense to make this field dynamically sized and copy it in the same manner as other dynamic fields, as this would be one less macro to worry about.If we don't need to access them, is there any harm in just leaving them as
std::string
, and assuming they are not copied correctly?
Yes. std::string causes segfaults when its destructor runs if we have expanded an array of objects which contain std::string's has their destructors run. I'm not opposed to leaving pointers to names rather than statically sized names, but simply removing std::string and not having to worry about running its copy constructor when resizing arrays is pretty nice.
3. If we do need to access the names on device, is there any reason why the various classes that have a name field would need very different static name lengths? Why introduce an inheritance operation just to add a single field to a bunch of classes that would be ok sharing the same static length of that field? It seems like it would be more straightforward to just convert the
std::string name_
fields to something likeName name_
.
Firstly, its nice to have getters and setters that protect us in copying to a static array, so that we can either warn about truncation or change the underlying implementation of how names are stored, e.g. switch off to pointers to elsewhere.
To answer the first question, elements and isotopes can be expected to have at most 8 bytes of data, which keeps alignment of the struct as if we're adding one double precision number. It's nice to minimize how much extra space we take, so it's templated. On the other hand, thermal scattering data names are a bit longer so it makes sense to allow 16 characters. Lastly, cells and lattices are given longer names sometimes in the test suite, hence the choice of 32 characters.
Another reason that inheritance solves this problem cleanly is that we often require std::string to be used as our interchange format for character data, e.g. in the HDF5 interface. This way, if we switch off to a char array, the calls to HDF5 stuff, and std::string::operator+ still work as expected, since the name()
method builds an std::string off the underlying array.
Lastly, inheritance in this case is nothing more than adding a data member. It's just more semantically meaningful.
4. s most of the classes in OpenMC that have
name_
fields are already not trivially copyable
Now that std::string is removed from them, they are trivially copyable. This is why std::realloc now works with the various classes that inherit from Named
now.
As for accessing the names on device, no, it's definitely not necessary. Just a plus for debugging down the road. We could replace these with a vector
Ah, when I say trivially copyable, I was thinking of host -> device copying, rather than host -> host copying. In host -> device copying, we still have to manually copy the vector data individually, so most objects are not trivially copyable in that context. At some point, we should switch over to using OpenMP custom mappers so that the host -> device copying becomes trivial, but we do not utilize that feature yet.
To your last point -- why not just make the name_
field an openmc::vector<char>
? As an optimization, this would also push the unused name_
field data out of the actual contiguous storage regions of the class objects, which could improve locality of accesses to their member fields.
Yeah, I agree with you on that. It's a nice micro optimization to have. I'll revise this PR to use that technique instead!
Still, vector
Sounds good! Yeah, I can see the utility of defining a Name
class as you do now and just having it be a normal Name name_;
member field in the classes that use it.
So you definitely want to remove the inheritance approach? If we're going to refactor it like this, I'll probably name it TriviallyCopyableString or something like that.
Hey John, stuff should be good-to-go now! I've left it as a mixin class though, since we already were using member functions like set_name, it makes sense to do this. Otherwise I'll just be copying and pasting repeated code in various places. Inheritance in this case is nothing more than adding a data member and adding some methods.
Sounds reasonable -- I'll take another look and will do some testing on this branch. Thanks again for putting these fixes together!
I tried compiling with LLVM with A100 as a target, I'm getting compile time error:
clang++: error: unknown argument: '-fext-numeric-literals'
When I remove this argument from the CMakeLists.txt and try again, I get a link-time error:
[ 81%] Linking CXX shared library lib/libopenmc.so
nvlink error : Undefined reference to 'strlen' in '/tmp/unity_0_cxx.cxx-nvptx64-nvidia-cuda-sm_80-b4e0c5.cubin'
nvlink error : Undefined reference to 'isatty' in '/tmp/unity_0_cxx.cxx-nvptx64-nvidia-cuda-sm_80-b4e0c5.cubin'
nvlink error : Undefined reference to 'exit' in '/tmp/unity_0_cxx.cxx-nvptx64-nvidia-cuda-sm_80-b4e0c5.cubin'
nvlink error : Undefined reference to '_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l' in '/tmp/unity_0_cxx.cxx-nvptx64-nvidia-cuda-sm_80-b4e0c5.cubin'
nvlink error : Undefined reference to '_ZSt24__throw_out_of_range_fmtPKcz' in '/tmp/unity_0_cxx.cxx-nvptx64-nvidia-cuda-sm_80-b4e0c5.cubin'
nvlink error : Undefined reference to '_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_createERmm' in '/tmp/unity_0_cxx.cxx-nvptx64-nvidia-cuda-sm_80-b4e0c5.cubin'
nvlink error : Undefined reference to '_ZNSo3putEc' in '/tmp/unity_0_cxx.cxx-nvptx64-nvidia-cuda-sm_80-b4e0c5.cubin'
nvlink error : Undefined reference to '_ZNSo5flushEv' in '/tmp/unity_0_cxx.cxx-nvptx64-nvidia-cuda-sm_80-b4e0c5.cubin'
nvlink error : Undefined reference to '_ZSt16__throw_bad_castv' in '/tmp/unity_0_cxx.cxx-nvptx64-nvidia-cuda-sm_80-b4e0c5.cubin'
nvlink error : Undefined reference to '_ZNKSt5ctypeIcE13_M_widen_initEv' in '/tmp/unity_0_cxx.cxx-nvptx64-nvidia-cuda-sm_80-b4e0c5.cubin'
nvlink error : Undefined reference to '_ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE5rfindEcm' in '/tmp/unity_0_cxx.cxx-nvptx64-nvidia-cuda-sm_80-b4e0c5.cubin'
nvlink error : Undefined reference to '_ZSt4cerr' in '/tmp/unity_0_cxx.cxx-nvptx64-nvidia-cuda-sm_80-b4e0c5.cubin'
clang: error: fatbinary command failed with exit code 255 (use -v to see invocation)
clang version 17.0.0 (https://github.com/llvm/llvm-project.git 091bfa76db64fbe96d0e53d99b2068cc05f6aa16)
Target: nvptx64-nvidia-cuda
Thread model: posix
InstalledDir: /soft/compilers/llvm/main-20230628/bin
clang: note: diagnostic msg: Error generating preprocessed source(s).
/soft/compilers/llvm/master-nightly/bin/clang-linker-wrapper: error: 'clang' failed
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [CMakeFiles/libopenmc.dir/build.make:106: lib/libopenmc.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:226: CMakeFiles/libopenmc.dir/all] Error 2
make: *** [Makefile:136: all] Error 2
Oh yeah, fext-numeric-literals is a gcc thing. Lets you write complex literals very easily, but we should move to the more portable approach of using namespace std::complex_literals;
. Will include a commit for that.
On the linking, looks like gcc transitively is including a system header that LLVM is not. Thanks for pointing this out, will also test with LLVM. Or maybe it's trying to generate device code containing strlen
. Hm. Will look into it.
I re-ran without the unity build to perhaps narrow down the problem, it gives:
[ 8%] Linking CXX shared library lib/libopenmc.so
nvlink error : Undefined reference to '_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_createERmm' in '/tmp/mesh.cpp-nvptx64-nvidia-cuda-sm_80-1850e9.cubin'
nvlink error : Undefined reference to '_ZN6openmc11fatal_errorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEi' in '/tmp/mesh.cpp-nvptx64-nvidia-cuda-sm_80-1850e9.cubin'
nvlink error : Undefined reference to 'strlen' in '/tmp/mesh.cpp-nvptx64-nvidia-cuda-sm_80-1850e9.cubin'
clang: error: fatbinary command failed with exit code 255 (use -v to see invocation)
clang version 17.0.0 (https://github.com/llvm/llvm-project.git 091bfa76db64fbe96d0e53d99b2068cc05f6aa16)
Just want to say: I have not forgotten about this but have just had trouble debugging what's going on, will hopefully get to this soon.
Many, but not all, of the tests work now. I've also marked a few of the tests as things we should skip for now. I'm submitting this as intermediate work on the way to getting pytest usable here again.
Also, we can now run depletion here, which is nice. Previously stuff would crash and burn on depletion problems.
Closes #26 and #32