Unfortunately I have no idea how to add the required dependency to Project.toml. Any hint is welcome!
Looks good. From this point the wrappers will be straightforward; I will have a look into this.
We need a way to set the kernel parameters directly in order to make meaningful comparisons.
Unfortunately I have no idea how to add the required dependency to Project.toml. Any hint is welcome!
see https://github.com/JuliaMath/NFFT.jl/commit/5032cb2d5204c74d1cc4dc0472cce326a486aece
Thanks! My problem was that I could not figure out which hash number needs to be given in the extras section. I realize now that this is listed in the Registry.toml file.
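For reference, a minimal sketch (not necessarily the exact steps used here) of letting Pkg resolve the UUID from the registry instead of copying the hash out of Registry.toml by hand:

```julia
# Hypothetical workflow: add ducc0_jll to the active project and let Pkg look
# up the correct UUID in the General registry. Moving the entry from [deps]
# to [extras]/[targets] for test-only use is then a manual edit of Project.toml.
using Pkg
Pkg.activate(".")        # the NFFT.jl (or Wrappers) environment
Pkg.add("ducc0_jll")     # writes the name = "UUID" pair into Project.toml
```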
We need a way to set the kernel parameters directly in order to make meaningful comparisons.
Sorry, but there is no way to request a specific kernel support (or any other kernel parameter, for that matter). The kernel and oversampling factor are selected automatically, based on the requested accuracy, the given grid size, and the number of nonuniform points. The user shouldn't have to worry about these technicalities.
I can add the limits for the minimum and maximum oversampling factor to the interface, but that still only provides indirect control over the kernel support.
I'll try to add the "planned" interface tomorrow, but I need a bit of help there, because I do not know how to make sure the plan's destructor is called when appropriate.
Base: 88.62% // Head: 88.62% // No change to project coverage :thumbsup:
Coverage data is based on head (fc800a1) compared to base (bbda1e3). Patch has no changes to coverable lines.
I already found the min/max oversampling factor and added it to the wrapper. It seems to work, but unfortunately I get an assertion that no kernel is available for high accuracy. I can run the benchmarks only partially for this reason (limiting m to 6).
What's interesting is that the 1D accuracy is shifted compared to 2D/3D.
I'll try to add the "planned" interface tomorrow, but I need a bit of help there, because I do not know how to make sure the plan's destructor is called when appropriate.
I added a stub where the destructor needs to be called:
https://github.com/JuliaMath/NFFT.jl/blob/ducc0/Wrappers/DUCC0.jl#L62
Of course one needs a pointer to the plan, which is created beforehand and stored in the DUCC0Plan. See https://github.com/NFFT/NFFT3.jl/blob/main/src/NFFT.jl#L64 for an example.
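A rough sketch of the finalizer pattern this is asking for; the struct layout, function names, and C symbols below are placeholders, not the actual DUCC0.jl or ducc0 interface:

```julia
# Keep the C-side plan as an opaque pointer and let a finalizer release it
# when the Julia object is garbage collected. All names here are illustrative.
mutable struct CPlanHandle
    handle::Ptr{Cvoid}          # opaque pointer returned by the C "make plan" call
end

function free_plan!(p::CPlanHandle)
    if p.handle != C_NULL
        # hypothetical destructor symbol; finalizers must not throw or allocate
        ccall((:nufft_delete_plan, "libducc_julia"), Cvoid, (Ptr{Cvoid},), p.handle)
        p.handle = C_NULL
    end
    return nothing
end

function wrap_plan(handle::Ptr{Cvoid})
    p = CPlanHandle(handle)
    finalizer(free_plan!, p)    # called automatically once p becomes unreachable
    return p
end
```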
I have merged your changes to my branch and added the changes to switch to the planned interface. Things seem to work, including the destructor calls.
Concerning accuracy, my observation is that it is impossible to reach an accuracy below roughly 4e-13 with double precision, no matter which kernel is used. If you fix the oversampling factor to roughly 2, this limit will most likely be even higher.
One thing that surprises me is the weird timing results reported by ducc. When switching verbose output on, a typical output during runtests.jl is:
U2nu:
nthreads=1, grid=(14x12x10), oversampled grid=(28x24x20), supp=11, eps=1e-09
npoints=1680
memory overhead: 6.25849e-06GB (index) + 0.000200272GB (oversampled grid)
Total wall clock time for u2nu: 0.2934s
|
+- nu2u proper : 0.38% (0.0011s)
| |
| +- spreading : 89.07% (0.0010s)
| +- FFT : 7.94% (0.0001s)
| +- zeroing grid : 1.71% (0.0000s)
| +- grid correction : 0.59% (0.0000s)
| +- allocating grid : 0.30% (0.0000s)
| +- <unaccounted> : 0.38% (0.0000s)
|
+- u2nu proper : 0.32% (0.0009s)
| |
| +- interpolation : 87.89% (0.0008s)
| +- FFT : 8.59% (0.0001s)
| +- zeroing grid : 1.90% (0.0000s)
| +- grid correction : 0.81% (0.0000s)
| +- allocating grid : 0.36% (0.0000s)
| +- <unaccounted> : 0.46% (0.0000s)
|
+- building index : 0.02% (0.0001s)
+- parameter calculation: 0.00% (0.0000s)
+- correction factors : 0.00% (0.0000s)
+- sorting coords : 0.00% (0.0000s)
+- <unaccounted> : 99.28% (0.2913s)
The timings for the individual sections seem realistic, but when called from C++ or Python, I never see the last <unaccounted> line go above one percent! So it seems that the C++ code is interrupted at some point and the CPU is doing something completely different for over 99 percent of the time. It could be garbage collection, for example. Is this a known issue when calling from Julia into compiled code?
Sorry, please ignore the comments about timing for the moment! I think I know what might be going wrong...
Is this a known issue when calling from Julia into compiled code?
ccall has zero overhead, and GC will not run during a call.
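For completeness, one self-contained way to check how much of the wall time Julia attributes to GC around a call (`work` below is only a stand-in for the wrapped ducc transform, not the real wrapper function):

```julia
# Compare total wall time with GC time; if garbage collection were responsible
# for the large "unaccounted" fraction above, it would show up in gctime.
work(x) = sum(abs2, x)          # placeholder for the ccall-based transform

x = rand(ComplexF64, 1680)
stats = @timed work(x)
println("total: ", stats.time, " s, GC: ", stats.gctime, " s")
```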
I rebased your changes into my branch since I made some follow-up commits.
I kept the nodes in the plan for the moment. We already have a nodes! setter, and I likely will add a nodes getter. And unless I can get the nodes via a ccall, it is better to store the array, which is actually just a pointer (I don't know if you copy the nodes internally). m and σ need to be kept for the moment. In the future we might generalize this, but right now this is the easiest way to unify different NFFT libraries (which is the purpose of AbstractNFFTs).
I kept the nodes in the plan for the moment. We already have a nodes! setter and I likely will add a nodes getter.
OK, then we need to make sure that the internal plan is destroyed and rebuilt whenever the nodes are set.
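A rough sketch of what that could look like; ToyPlan, create_plan_c, free_plan!, and set_nodes! are all illustrative stand-ins, not the actual wrapper code:

```julia
# When new nodes are set, the old C-side plan must be dropped and a fresh one
# built, since ducc keeps its own re-ordered copy of the nodes internally.
mutable struct ToyPlan
    handle::Ptr{Cvoid}
    N::NTuple{3,Int}
    nodes::Matrix{Float64}
end

create_plan_c(k, N)     = C_NULL              # placeholder for the real ccall-based constructor
free_plan!(p::ToyPlan)  = (p.handle = C_NULL) # placeholder for the real destructor call

function set_nodes!(p::ToyPlan, k::Matrix{Float64})
    free_plan!(p)                      # destroy the old C plan first
    p.handle = create_plan_c(k, p.N)   # rebuild with the new nodes
    p.nodes  = k                       # keep a Julia-side reference for a getter
    return p
end
```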
And unless I can get the nodes via a ccall it is better to store the array, which is actually just a pointer (don't know if you copy the nodes internally).
I'm storing a genuine copy of the nodes, but it is already re-arranged in the order in which it will be needed during the transforms (this gives a nice speedup), so I cannot really pass it back via a ccall.
m and σ need to be kept for the moment. In the future we might generalize this, but right now this is the easiest so unify different NFFT libraries (which is the purpose of AbstractNFFTs).
Fine!
One thing I noticed: your types for f and fHat are AbstractArray and StridedArray, whereas the C code currently expects something that is stored in memory like a DenseArray. I can extend the code to deal with StridedArray objects, but passing a pointer to an AbstractArray to the C code sounds very dangerous to me. We may have to constrain the types a bit more.
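A small, self-contained illustration of why the distinction matters (independent of the wrapper code):

```julia
# A DenseArray is contiguous; a StridedArray has a valid pointer but possibly
# non-unit strides the C code must honour; a generic AbstractArray may have no
# memory buffer at all, so taking a pointer is not even defined.
A = rand(ComplexF64, 8, 8)

dense   = A                       # DenseArray: contiguous, safe to pass via pointer
strided = @view A[1:2:end, :]     # StridedArray: pointer is valid, but strides ≠ (1, ...)
lazy    = 1:64                    # AbstractArray with no backing buffer

pointer(dense)                       # fine
pointer(strided), strides(strided)   # works, but the callee must respect the strides
# pointer(lazy)                      # errors: conversion to pointer not defined for UnitRange
```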
One thing I noticed: your types for f and fHat are AbstractArray and StridedArray, whereas the C code currently expects something that is stored in memory like a DenseArray. I can extend the code to deal with StridedArray objects, but passing a pointer to an AbstractArray to the C code sounds very dangerous to me. We may have to constrain the types a bit more.
Yes indeed, will change before merging.
Have merged my branch. Thanks for the contribution!
Here is a plot of a local benchmark run:
This is on x86_64.
Which CPU do you have?
On M1Max it looks like this:
On the reference machine (AMD EPYC 7702 @ 2.0 GHz) it's somewhere in between.
Which CPU do you have?
This is a laptop with a Ryzen 7 4800H CPU.
The ducc curves in your plot almost look like there's some constant overhead, but I have no idea where it could come from. You mentioned at some point that there might be CPU emulation going on; could this be the reason?
It might be interesting to look at the verbose output from one of your 1D runs, just to see where the time is spent.
This shows how the ducc NUFFT library could be accessed from NFFT.jl. ducc0_jll has been merged into Yggdrasil today; it is still in a very early stage. I thought it may be good to demonstrate the package as soon as possible to get helpful feedback ... I'm still extremely new to Julia and will make lots of beginner mistakes :-)
Please let me know if you consider this kind of interface useful in principle, and if so, how it could be improved.