Open zhanglin-1993 opened 7 years ago
Here is the related code, including the MATLAB code (gradientpnorm.m) and the four other CUDA files. minp.zip
Hi @ilanzhang. I will try to have a look at this, but it is very likely I will not be able to until January.
If you want to debug it to find where the NaNs appear, the best way would be to print some values of the image after each step in the CUDA code and see exactly which line produces the NaNs. That will help pinpoint the source of the errors.
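As a rough illustration of that kind of step-by-step check, the sketch below scans a host buffer for NaN/Inf values and reports the first offending index. The names `findNonFinite` and `reportStep` are hypothetical helpers, not part of TIGRE, and the sketch assumes the image has already been copied back to host memory (for example with `cudaMemcpy`) after the step being inspected.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>

// Hypothetical debugging helper (not part of TIGRE): summary of the
// non-finite (NaN or Inf) entries found in a host buffer.
struct NonFiniteReport {
    std::size_t count;       // number of NaN/Inf entries
    std::size_t firstIndex;  // index of the first one, or n if there are none
};

// Scans n floats and counts the entries that are NaN or Inf.
NonFiniteReport findNonFinite(const float* data, std::size_t n) {
    NonFiniteReport r{0, n};
    for (std::size_t i = 0; i < n; ++i) {
        if (!std::isfinite(data[i])) {
            if (r.count == 0) r.firstIndex = i;
            ++r.count;
        }
    }
    return r;
}

// Prints a one-line summary tagged with the name of the step just executed.
void reportStep(const char* step, const float* data, std::size_t n) {
    NonFiniteReport r = findNonFinite(data, n);
    if (r.count == 0)
        std::printf("%s: all %zu values finite\n", step, n);
    else
        std::printf("%s: %zu non-finite values, first at index %zu\n",
                    step, r.count, r.firstIndex);
}
```

Calling `reportStep("after gradient kernel", hostImg, n)` after each kernel narrows the search to the first step whose output contains non-finite values; inside a MEX file, `mexPrintf` would be the natural replacement for `std::printf`.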
Thanks for your reply. I'll try it after finishing my job.
@ilanzhang any updates on this?
@AnderBiguri I have been using the MATLAB version. The CUDA version hasn't been fixed, and I don't know where the error is.
@ilanzhang hmm, it may be helpful if you show me the maths, if you still need help. TV norm minimization is ultimately defined by a derivation of the discrete gradient of the TV norm. Do you have a generic solution for the discrete gradient of the p-norm?
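To make the question concrete, here is a minimal 1D sketch of one possible smoothed p-norm penalty on the finite differences of a signal, together with its analytic gradient and a finite-difference check. The penalty R(f) = sum_i (d_i^2 + eps)^(p/2) with d_i = f[i+1] - f[i] is an assumption chosen for illustration, not necessarily the formulation used in the MATLAB code, and all function names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed penalty: R(f) = sum_i (d_i^2 + eps)^(p/2), d_i = f[i+1] - f[i].
// p = 1 gives a smoothed 1D total variation.
double pnormPenalty(const std::vector<double>& f, double p, double eps) {
    double r = 0.0;
    for (std::size_t i = 0; i + 1 < f.size(); ++i) {
        double d = f[i + 1] - f[i];
        r += std::pow(d * d + eps, 0.5 * p);
    }
    return r;
}

// Analytic gradient dR/df. Each difference d_i contributes
//   w_i = p * (d_i^2 + eps)^(p/2 - 1) * d_i
// with +w_i added to entry i+1 and -w_i added to entry i.
std::vector<double> pnormGradient(const std::vector<double>& f, double p, double eps) {
    std::vector<double> g(f.size(), 0.0);
    for (std::size_t i = 0; i + 1 < f.size(); ++i) {
        double d = f[i + 1] - f[i];
        double w = p * std::pow(d * d + eps, 0.5 * p - 1.0) * d;
        g[i + 1] += w;
        g[i] -= w;
    }
    return g;
}

// Central finite-difference gradient, used only to check the analytic one.
std::vector<double> numericGradient(std::vector<double> f, double p, double eps,
                                    double h = 1e-6) {
    std::vector<double> g(f.size());
    for (std::size_t k = 0; k < f.size(); ++k) {
        double saved = f[k];
        f[k] = saved + h;
        double up = pnormPenalty(f, p, eps);
        f[k] = saved - h;
        double down = pnormPenalty(f, p, eps);
        f[k] = saved;
        g[k] = (up - down) / (2.0 * h);
    }
    return g;
}
```

With eps = 0 and p < 2, a flat region (d_i = 0) evaluates 0^(p/2 - 1) * 0, which is inf * 0 = NaN in floating point, so the placement of eps inside the power is one place where NaNs can originate. Comparing `pnormGradient` against `numericGradient` on a small signal is a cheap way to validate a derivation before porting it to a kernel.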
@AnderBiguri Here is the file. Thanks for your help! J20170702_p-norm calculation - to AnderBiguri.docx
Hi @ilanzhang,
Thanks for this! Interesting approach. You mention that you have submitted a paper based on this? If that is the case, then the best would be to hopefully get that accepted, then we can have a look at how to include it in TIGRE or properly make the CUDA work, if that is OK with you!
@AnderBiguri I'm very glad to say that the paper has been accepted. If you have time, could you add the above code and make it work in CUDA?
@AnderBiguri Another thing that confuses me: I now use MATLAB 2018b, VS2015 and CUDA 9.0 on Windows 7 Ultimate. When compiling, it reports that VS2015 was configured for C compilation, but the next message is an error using mex: it cannot find a supported compiler. Do you know what's wrong here?
@ilanzhang Congratulations! Could you send me the code by email, along with a small demo of a reconstruction using it? That would help me greatly: if you share it, I plan to modify it so it can run on multiple GPUs.
I'd say the most likely cause of your error is that you did not install the C++ compiler when installing VS2015; it's not set up by default. Here is a detailed installation and debugging guide: https://github.com/CERN/TIGRE/blob/frontispiece/Frontispiece/MATLAB_installation.md
@ilanzhang I know this is super late, but just letting you know I have not forgotten! Unfortunately, implementing this in CUDA is too much for me to take on right now, so I am not sure when I will be able to do it.
@AnderBiguri That's so nice of you. I'm sure this part will be useful not only for me, but also for others!
@AnderBiguri A small suggestion: when you add this part some day, it would be good if the regularization term could be combined freely. For example, I could apply the L1 norm of the gradient, the L2 norm of the gradient, the L0.8 norm of the gradient, or even the (L1-L2) norm of the gradient. There have been several new regularization theories in recent years, so TIGRE would be more extensible if it were designed this way.
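One way such freely combinable regularizers could be organized is sketched below: each penalty is a function of the per-voxel gradient magnitudes, and a combinator builds weighted sums such as L1-L2. This is a host-side illustration only; `Penalty`, `lpNorm` and `combine` are hypothetical names, not TIGRE API, and a real implementation would also need the gradient of each term for the minimization step.

```cpp
#include <cmath>
#include <functional>
#include <utility>
#include <vector>

// A penalty maps a vector of gradient magnitudes to a scalar.
using Penalty = std::function<double(const std::vector<double>&)>;

// (sum_i |g_i|^p)^(1/p). For p < 1 this is a quasi-norm rather than a norm,
// but it can still be evaluated the same way.
Penalty lpNorm(double p) {
    return [p](const std::vector<double>& g) {
        double s = 0.0;
        for (double v : g) s += std::pow(std::fabs(v), p);
        return std::pow(s, 1.0 / p);
    };
}

// Weighted sum of penalties, e.g. {{1, L1}, {-1, L2}} for the L1-L2 penalty.
Penalty combine(std::vector<std::pair<double, Penalty>> terms) {
    return [terms](const std::vector<double>& g) {
        double total = 0.0;
        for (const auto& t : terms) total += t.first * t.second(g);
        return total;
    };
}
```

For example, `combine({{1.0, lpNorm(1.0)}, {-1.0, lpNorm(2.0)}})` builds the L1-L2 penalty, and `lpNorm(0.8)` gives the L0.8 case; for gradient magnitudes {3, -4} the L1-L2 value is 7 - 5 = 2.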
Hello: I'm trying to implement a p-norm version of ASD-POCS in CUDA. The MATLAB code works well, while the CUDA code produces various errors all the time. Sometimes there are NaN voxels, which you said may be a problem with eps, but setting eps=0.0001 changed nothing. Sometimes there is a problem with the input parameter and the single type, and sometimes it reports that MinP is not an available function, although I had compiled it and put the mexw64 file in the proper path. Could you please tell me where the problem is? The only change from minTV.mexw64 to minP.mexw64 is an additional parameter p. Here are the related files.
OS-ASD-POCS:
f=minimizeP(f0,dtvg,ng,p);
minimizeP.m:
minP.cpp:
// First input should be x from (Ax=b), or the image.
mxArray const * const image = prhs[0];
mwSize const numDims = mxGetNumberOfDimensions(image);

// Image should be dim 3
if (numDims!=3){
    mexErrMsgIdAndTxt("err", "Image is not 3D");
}
// Now that input is ok, parse it to C data types.
float const * const imgaux = static_cast<float const *>(mxGetData(image));
const mwSize *size_img = mxGetDimensions(image); // get size of image

float *img = (float*)malloc(size_img[0] * size_img[1] * size_img[2] * sizeof(float));
for(int i=0; i<size_img[0]*size_img[1]*size_img[2]; i++)
    img[i] = (float)imgaux[i];

// Allocate output image
float *imgout = (float*)malloc(size_img[0] * size_img[1] * size_img[2] * sizeof(float));
// call C function with the CUDA denoising
const long imageSize[3] = {size_img[0], size_img[1], size_img[2]};
pocs_p(img, imgout, alpha, imageSize, maxIter, p);

// prepare outputs
plhs[0] = mxCreateNumericArray(3, size_img, mxSINGLE_CLASS, mxREAL);
float *mxImgout = (float*)mxGetPr(plhs[0]);

memcpy(mxImgout, imgout, size_img[0] * size_img[1] * size_img[2] * sizeof(float));
// free memory
free(img);
free(imgout);
}
POCS_P.hpp:
#ifndef POCS_P_HPP
#define POCS_P_HPP
#include "mex.h"
#include "tmwtypes.h"
void pocs_p(const float* img, float* dst, float alpha, const long* image_size, int maxIter, float p);
#endif
#define MAXTHREADS 1024
#include "POCS_P.hpp"
#define cudaCheckErrors(msg) \
do { \
    cudaError_t __err = cudaGetLastError(); \
    if (__err != cudaSuccess) { \
        mexPrintf("ERROR in: %s \n",msg);\
        mexErrMsgIdAndTxt("err",cudaGetErrorString(__err));\
    } \
} while (0)

// CUDA kernels
//https://stackoverflow.com/questions/21332040/simple-cuda-kernel-optimization/21340927#21340927
__global__ void divideArrayScalar(float* vec, float scalar, const size_t n)
{
    unsigned long long i = (blockIdx.x * blockDim.x) + threadIdx.x;
    for(; i<n; i+=gridDim.x*blockDim.x) {
        vec[i]/=scalar;
    }
}
__global__ void multiplyArrayScalar(float* vec, float scalar, const size_t n)
{
    unsigned long long i = (blockIdx.x * blockDim.x) + threadIdx.x;
    for(; i<n; i+=gridDim.x*blockDim.x) {
        vec[i]*=scalar;
    }
}
__global__ void substractArrays(float* vec, float* vec2, const size_t n)
{
    unsigned long long i = (blockIdx.x * blockDim.x) + threadIdx.x;
    for(; i<n; i+=gridDim.x*blockDim.x) {
        vec[i]-=vec2[i];
    }
}
#if (__CUDA_ARCH__ >= 300)
#else
#endif
#if (__CUDA_ARCH__ >= 300)
#else
#endif
// main function
void pocs_p(const float* img, float* dst, float alpha, const long* image_size, int maxIter, float p){