cabouman / svmbir

Fast code for parallel or fan beam tomographic reconstruction
BSD 3-Clause "New" or "Revised" License
21 stars 8 forks source link

Nans generated by recon (Bugzilla) #212

Closed cabouman closed 11 months ago

cabouman commented 3 years ago

The code generates nans in the recon. This seems to happen when the weights are small/zero or iterations are run to very precise convergence.

The problem can be reproduce by setting weight_type="transmission" the 3D Shepp Logan demo.

recon = svmbir.recon(sino, angles, T=T, p=p, sharpness=sharpness, snr_db=snr_db, weight_type="transmission")

tbalke commented 3 years ago

The density of the phantom is excessively high. This makes the sino too large and thus the weights are in some cases zero.

I would recommend scaling the phantom like this:

phantom = svmbir.phantom.gen_shepp_logan_3d(
    num_rows_cols,num_rows_cols,num_slices) / num_rows_cols * 5

which makes the sino max out at about 5 no matter how large the image size is chosen.

cabouman commented 3 years ago

OK, here is my (really Thilo's) hypothesis as to the source of the problem:

The fundamental update for ICD is x_s <- x_s - (\theta_1/\theta_2)

The problem might be occurring because \theta==0 or very small or negative. In this case, the reconstruction could diverge particularly if there is no positivity constraint.

This can occur because the weights are all 0 and/or a pixel or group of pixels does not project to any measurements,, and/or has no neighbors. (Although I'm not sure why the prior term wouldn't regularize this.)

cabouman commented 3 years ago

OK, after much thought I've concluded that we should focus on the proximal map mode. We should never have a problem with NaNs under these conditions.

Can we reproduce NaNs for the prox mode? This might be useful: https://www.cplusplus.com/reference/cmath/isnormal/

dyang37 commented 3 years ago

OK, after much thought I've concluded that we should focus on the proximal map mode. We should never have a problem with NaNs under these conditions.

Can we reproduce NaNs for the prox mode?

So far I have not seen NaNs in MACE reconstruction (which uses prox mode of recon function) using either svmbir or mbircone. What's the reason that we want to focus on prox mode for now? Maybe we can try MACE with 3D shepp Logan with transmission weight and see if the NaNs still occur during prox map estimation.

dyang37 commented 3 years ago

So I managed to trace the problem to the value of a specific variable. It seems that when zero weight is provided, A_padd_Tranpose_pointer[0] is -NaN in functionsuper_voxel_recon() (inside recon3d.c) . This causes the NaN values of THETA1 and THETA2 when updated by the code from line 778 to 790 in recon3d.c.

Here's the error output:

Reading system matrix...
Projecting image...
Reconstructing...

********************************Big problems with update!***********************************
 NaN observed when calculating tempTHETA1 in  recon3d !!!!!!!!
 t=0, A_padd_Transpose=-nan, WTransposeArray=0.000000,
 THETA1=0.000000; THETA2=-nan, tempTHETA1=-nan; tempTHETA2=0.000000

Here's the updated recon3d.c code starting from line 778:

                for(t=0;t<myCount*pieceLength;t++)
                {   /* summing over voxels which are not skipped or masked*/
                    tempTHETA1 += A_padd_Tranpose_pointer[t]*WTransposeArrayPointer[t]*ETransposeArrayPointer[t];
                    tempTHETA2 += A_padd_Tranpose_pointer[t]*WTransposeArrayPointer[t]*A_padd_Tranpose_pointer[t];
                    if ( !isfinite(tempTHETA1) || !isfinite(tempTHETA2)) {
                        printf("\n\n********************************Big problems with update!***********************************");
                        printf("\n NaN observed when calculating tempTHETA1 in  recon3d !!!!!!!!"
                               "\n t=%d, A_padd_Transpose=%f, WTransposeArray=%f,"
                               "\n THETA1=%f; THETA2=%f, tempTHETA1=%f; tempTHETA2=%f\n",
                                t, A_padd_Tranpose_pointer[t], WTransposeArrayPointer[t],
                                THETA1[currentSlice],THETA2[currentSlice],tempTHETA1,tempTHETA2);
                        exit(1);
                    }

                }
                THETA1[currentSlice]+=tempTHETA1;
                THETA2[currentSlice]+=tempTHETA2;

(I cannot find a way to attach the script with the issue. So please copy paste the code and recompile on your end to reproduce the error messages above).

sjkisner commented 3 years ago

@dyang37 There's something very odd in that. I'm wondering if your printf arguments match the specifiers. Can you provide the outer python script you're running?

From where you have the printf statement, I can't see how you would get A_padd_Transpose[0]=-nan, and tempTHETA2=0.000000

A_padd_Tranpose_pointer comes from the A matrix. It should be independent of the weight setting, so I'd like to confirm your observation.

dyang37 commented 3 years ago

Hi Jordan, Here are the test script as well as the modified recon3d.c file. The test script is the same as 3D shepp logan demo script except I set weights to be a 0 array. Diyu


From: Sherman J Kisner @.> Sent: Monday, November 8, 2021 3:23 PM To: cabouman/svmbir @.> Cc: Diyu Yang @.>; Mention @.> Subject: Re: [cabouman/svmbir] Nans generated by recon (Issue #212)

@dyang37https://github.com/dyang37 There's something very odd in that. I'm wondering if your printf arguments match the specifiers. Can you provide the outer python script you're running?

From where you have the printf statement, I can't see how you would get A_padd_Transpose[0]=-nan, and tempTHETA2=0.000000

A_padd_Tranpose_pointer comes from the A matrix. It should be independent of the weight setting, so I'd like to confirm your observation.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/cabouman/svmbir/issues/212#issuecomment-963542269, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AB2Q7OGXN3Y7MJY7P7WM7XTULAWUTANCNFSM5HOWI5OQ. Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

include

include

include

include

include

ifndef MSVC / not included in MS Visual C++ /

#include <sys/time.h>

endif

include "MBIRModularDefs.h"

include "MBIRModularUtils.h"

include "allocate.h"

include "icd3d.h"

include "heap.h"

include "A_comp.h"

include "initialize.h"

include "recon3d.h"

define TEST

//#define COMP_COST //#define COMP_RMSE

/ Internal functions / void super_voxel_recon(int jj,struct SVParams svpar,unsigned long NumUpdates,float totalValue,float totalChange,int iter, char phaseMap,long order,int indexList,float weight,float sinoerr, struct AValues_char *A_Padded_Map,float Aval_max_ptr,struct heap_node headNodeArray, struct SinoParams3DParallel sinoparams,struct ReconParams reconparams,struct ParamExt param_ext,float image, struct ImageParams3D imgparams, float proximalmap, float voxelsBuffer1,float voxelsBuffer2,char group_array,int group_id); void SVproject(float proj,float image,struct AValues_char *A_Padded_Map,float Aval_max_ptr, struct ImageParams3D imgparams,struct SinoParams3DParallel sinoparams,struct SVParams svpar,char backproject_flag); void coordinateShuffle(int order1, int order2,int len); void three_way_shuffle(long order1, char order2, struct heap_node headNodeArray,int len); float MAPCostFunction3D(float x,float e,float w,struct ImageParams3D imgparams,struct SinoParams3DParallel sinoparams, struct ReconParams reconparams,struct ParamExt param_ext);

void MBIRReconstruct( float image, float sino, float weight, float proj_init, float proximalmap, struct ImageParams3D imgparams, struct SinoParams3DParallel sinoparams, struct ReconParams reconparams, char Amatrix_fname, char verboseLevel) { float sinoerr, proximalmap_loc=NULL; int i,j,jj,p,t,iter,it_print=1; size_t k;

ifndef MSVC / not included in MS Visual C++ /

struct timeval tm1,tm2;
#endif

/* image/sino/recon parameters */
int Nx = imgparams.Nx;
int Ny = imgparams.Ny;
int Nz = imgparams.Nz;
int Nxy = Nx*Ny;
int Nvc = sinoparams.NViews * sinoparams.NChannels;
int MaxIterations = reconparams.MaxIterations;
float StopThreshold = reconparams.StopThreshold;

/* Initialize/allocate SV parameters */
struct AValues_char **A_Padded_Map;
float *Aval_max_ptr;
struct SVParams svpar;
initSVParams(&svpar, imgparams, sinoparams);
int Nsv = svpar.Nsv;
int SVLength = svpar.SVLength;
int SV_per_Z = svpar.SV_per_Z;
int SVsPerRow = svpar.SVsPerRow;

/* Activate proximal map mode if given as input */
if(proximalmap != NULL)
{
    reconparams.ReconType = MBIR_MODULAR_RECONTYPE_PandP;
    /* 'image' is reconstructed in place, so if proximal map is the same array, make a local copy */
    if(proximalmap == image)
    {
        proximalmap_loc = (float *) mget_spc((size_t)Nx*Ny*Nz,sizeof(float));
        for(k=0; k<(size_t)Nx*Ny*Nz; k++)
            proximalmap_loc[k] = proximalmap[k];
    }
    else
        proximalmap_loc = proximalmap;
}

/* print summary to stdout */
if(verboseLevel>1)
{
    fprintf(stdout,"MBIRReconstruct() -- build time: %s, %s\n", __DATE__, __TIME__);
    printSinoParams3DParallel(&sinoparams);
    printImageParams3D(&imgparams);
    if(reconparams.ReconType == MBIR_MODULAR_RECONTYPE_QGGMRF_3D)
        printReconParamsQGGMRF3D(&reconparams);
    if(reconparams.ReconType == MBIR_MODULAR_RECONTYPE_PandP)
        printReconParamsPandP(&reconparams);
}

/* Allocate and generate recon mask based on ROIRadius */
char * ImageReconMask = GenImageReconMask(&imgparams);

/* Read/compute/write System Matrix */
A_Padded_Map = (struct AValues_char **)multialloc(sizeof(struct AValues_char),2,Nsv,(2*SVLength+1)*(2*SVLength+1));
Aval_max_ptr = (float *) mget_spc(Nx*Ny,sizeof(float));
if(Amatrix_fname != NULL)
{
    if(verboseLevel)
        fprintf(stdout,"Reading system matrix...\n");
    readAmatrix(Amatrix_fname, A_Padded_Map, Aval_max_ptr, &imgparams, &sinoparams, svpar);
}
else
{
    if(verboseLevel)
        fprintf(stdout,"Computing system matrix...\n");
    A_comp(A_Padded_Map,Aval_max_ptr,svpar,&sinoparams,ImageReconMask,&imgparams);
}

/* Project image for sinogram error */
if(proj_init != NULL)
    sinoerr = proj_init;
else
{
    if(verboseLevel)
        fprintf(stdout,"Projecting image...\n");
    sinoerr = (float *) mget_spc((size_t)Nz*Nvc,sizeof(float));
    SVproject(sinoerr,image,A_Padded_Map,Aval_max_ptr,imgparams,sinoparams,svpar,0);
}
for(k=0; k<(size_t)Nz*Nvc; k++)
    sinoerr[k] = sino[k]-sinoerr[k];

/* Recon parameters */
NormalizePriorWeights3D(&reconparams);
struct ParamExt param_ext;
param_ext.pow_sigmaX_p = powf(reconparams.SigmaX,reconparams.p);
param_ext.pow_sigmaX_q = powf(reconparams.SigmaX,reconparams.q);
param_ext.pow_T_qmp    = powf(reconparams.T,reconparams.q - reconparams.p);
param_ext.SigmaXsq = reconparams.SigmaX * reconparams.SigmaX;

float *voxelsBuffer1;  /* the first N entries are the voxel values.  */
float *voxelsBuffer2;
unsigned long NumUpdates=0;
float totalValue=0,totalChange=0,equits=0;
float avg_update=0,avg_update_rel=0;
float c_ratio=0.07;
float convergence_rho=0.7;
int rep_num=(int)ceil(1/(4*c_ratio*convergence_rho));
struct heap priorityheap;
initialize_heap(&priorityheap);
long *order;
char *phaseMap, **group_id_list;

int NumMaskVoxels=0;
for(j=0;j<Nxy;j++)
if(ImageReconMask[j])
    NumMaskVoxels++;

#ifdef COMP_RMSE
    struct Image3D Image_ref;
    Image_ref.imgparams.Nx = imgparams.Nx;
    Image_ref.imgparams.Ny = imgparams.Ny;
    Image_ref.imgparams.Nz = imgparams.Nz;
    Image_ref.imgparams.FirstSliceNumber = imgparams.FirstSliceNumber;
    Image_ref.imgparams.NumSliceDigits = imgparams.NumSliceDigits;
    AllocateImageData3D(&Image_ref);
    ReadImage3D("ref/ref",&Image_ref);

    float ** image_ref = Image_ref.image;
    double rms_err=0,rms_val=0;
    int jz, Nz0=0, Nz1=Nz;
    for(jz=Nz0; jz<Nz1; jz++)
    for(j=0; j<Nxy; j++)
    if(ImageReconMask[j]) {
        rms_val += image_ref[jz][j]*image_ref[jz][j];
        rms_err += (image[(size_t)jz*Nxy+j]-image_ref[jz][j])*(image[(size_t)jz*Nxy+j]-image_ref[jz][j]);
    }
    rms_val = sqrt(rms_val/((float)NumMaskVoxels*(Nz1-Nz0)));
    rms_err = sqrt(rms_err/((float)NumMaskVoxels*(Nz1-Nz0)));
    FILE *fp_mse=fopen("rmse.txt","w");
    fprintf(fp_mse,"equits|rms_err|rms_val|rms_err/rms_val\n");
    fprintf(fp_mse,"%.2f %g %g %g\n",equits,rms_err,rms_val,rms_err/rms_val);
#endif

order = (long *) mget_spc(Nsv*SV_per_Z,sizeof(long));
phaseMap = (char *) mget_spc(Nsv*SV_per_Z,sizeof(char));
group_id_list = (char **) multialloc(sizeof(char),2,SV_per_Z,4);

/* Order of pixel updates need NOT be raster order, just initialize */
t=0;
for(p=0;p<Nz;p+=svpar.SVDepth)
for(i=0;i<Ny;i+=(SVLength*2-svpar.overlap))
for(j=0;j<Nx;j+=(SVLength*2-svpar.overlap))
{
    order[t]=(long)p*Nxy+i*Nx+j;  /* order is the first voxel coordinate, not the center */
    t++;
}

#pragma omp parallel for private(jj) schedule(dynamic)
for(i=0;i<SV_per_Z;i++)
for(jj=0;jj<Nsv;jj++)
{
    if((jj/SVsPerRow)%2==0)
    {
        if((jj%SVsPerRow)%2==0)
            phaseMap[i*Nsv+jj]=0;
        else
            phaseMap[i*Nsv+jj]=1;
    }
    else
    {
        if((jj%SVsPerRow)%2==0)
            phaseMap[i*Nsv+jj]=2;
        else
            phaseMap[i*Nsv+jj]=3;
    }
}

for(i=0;i<SV_per_Z;i++) {
    if(i%4==0){
        group_id_list[i][0]=0;
        group_id_list[i][1]=3;
        group_id_list[i][2]=1;
        group_id_list[i][3]=2;
    }
    else if(i%4==1) {
        group_id_list[i][0]=3;
        group_id_list[i][1]=0;
        group_id_list[i][2]=2;
        group_id_list[i][3]=1;
    }
    else if(i%4==2) {
        group_id_list[i][0]=1;
        group_id_list[i][1]=2;
        group_id_list[i][2]=3;
        group_id_list[i][3]=0;
    }
    else{
        group_id_list[i][0]=2;
        group_id_list[i][1]=1;
        group_id_list[i][2]=0;
        group_id_list[i][3]=3;
    }
}
#ifdef TEST
srand(0);
#else
srand(time(NULL));
#endif

struct heap_node *headNodeArray;
headNodeArray = (struct heap_node *) mget_spc(Nsv*SV_per_Z,sizeof(struct heap_node));

for(i=0;i<SV_per_Z;i++)
for(jj=0;jj<Nsv;jj++)
{
    headNodeArray[i*Nsv+jj].pt=i*Nsv+jj;
    headNodeArray[i*Nsv+jj].x=0.0;
}
int indexList_size=(int) Nsv*SV_per_Z*4*c_ratio*(1-convergence_rho);
int * indexList = (int *) mget_spc(indexList_size,sizeof(int));

#ifdef ICC
    voxelsBuffer1 = (float *)_mm_malloc(Nxy*sizeof(float),64);
    voxelsBuffer2 = (float *)_mm_malloc(Nxy*sizeof(float),64);
#else
    voxelsBuffer1 = (float *) malloc(Nxy*sizeof(float));
    voxelsBuffer2 = (float *) malloc(Nxy*sizeof(float));
    //voxelsBuffer1 = (float *) aligned_alloc(64,Nxy*sizeof(float));
    //voxelsBuffer2 = (float *) aligned_alloc(64,Nxy*sizeof(float));
    //voxelsBuffer1 = (float *) _aligned_malloc(Nxy*sizeof(float),64);
    //voxelsBuffer2 = (float *) _aligned_malloc(Nxy*sizeof(float),64);
#endif

for(i=0;i<Nxy;i++) {
    voxelsBuffer1[i]=0;
    voxelsBuffer2[i]=0;
}

//coordinateShuffle(&order[0],&phaseMap[0],Nsv*SV_per_Z);
long tmp_long;
char tmp_char;
for(i=0; i<Nsv*SV_per_Z-1; i++)
{
    j = i + (rand() % (Nsv*SV_per_Z-i));
    tmp_long = order[j];
    order[j] = order[i];
    order[i] = tmp_long;
    tmp_char = phaseMap[j];
    phaseMap[j] = phaseMap[i];
    phaseMap[i] = tmp_char;
}

iter=0;
char stop_FLAG=0;
int startIndex=0;
int endIndex=0;

if(verboseLevel) {
    fprintf(stdout,"Reconstructing...\n");
    #ifndef MSVC    /* not included in MS Visual C++ */
    gettimeofday(&tm1,NULL);
    #endif
}

#pragma omp parallel
{
    while(stop_FLAG==0 && equits<MaxIterations && iter<100*MaxIterations)
    {
        #pragma omp single
        {
            if(iter==0)
            {
                startIndex=0;
                endIndex=Nsv*SV_per_Z;
            }
            else
            {
                if((iter-1)%(2*rep_num)==0 && iter!=1)
                    three_way_shuffle(&order[0],&phaseMap[0],&headNodeArray[0],Nsv*SV_per_Z);

                if(iter%2==1)
                {
                    priorityheap.size=0;
                    for(jj=0;jj<Nsv*SV_per_Z;jj++){
                        heap_insert(&priorityheap, &(headNodeArray[jj]));
                    }
                    startIndex=0;
                    endIndex=indexList_size;

                    for(i=0;i<endIndex;i++) {
                        struct heap_node tempNode;
                        get_heap_max(&priorityheap, &tempNode);
                        indexList[i]=tempNode.pt;
                    }
                }
                else {
                    startIndex=((iter-2)/2)%rep_num*Nsv*SV_per_Z/rep_num;
                    endIndex=(((iter-2)/2)%rep_num+1)*Nsv*SV_per_Z/rep_num;
                }
            }
        }

        int group=0;

        for (group = 0; group < 4; group++)
        {
            #pragma omp for schedule(dynamic)  reduction(+:NumUpdates) reduction(+:totalValue) reduction(+:totalChange)
            for (jj = startIndex; jj < endIndex; jj+=1)
                super_voxel_recon(jj,svpar,&NumUpdates,&totalValue,&totalChange,iter,
                        &phaseMap[0],order,&indexList[0],weight,sinoerr,A_Padded_Map,&Aval_max_ptr[0],
                        &headNodeArray[0],sinoparams,reconparams,param_ext,image,imgparams,proximalmap_loc,
                        voxelsBuffer1,voxelsBuffer2,&group_id_list[0][0],group);
        }

        #pragma omp single
        {
            avg_update=avg_update_rel=0.0;
            if(NumUpdates>0) {
                avg_update = totalChange/NumUpdates;
                float avg_value = totalValue/NumUpdates;
                avg_update_rel = avg_update/avg_value * 100;
                //printf("avg_update %f, avg_value %f, avg_update_rel %f\n",avg_update,avg_value,avg_update_rel);
            }
            #ifdef COMP_COST
            float cost = MAPCostFunction3D(image,sinoerr,weight,imgparams,sinoparams,reconparams,param_ext);
            fprintf(stdout, "it %d cost = %-15f, avg_update %f \n", iter, cost, avg_update);
            #endif

            if (avg_update_rel < StopThreshold && (endIndex!=0))
                stop_FLAG = 1;

            iter++;
            equits += (float)NumUpdates/((float)NumMaskVoxels*Nz);

            if(verboseLevel && equits > it_print) {
                fprintf(stdout,"\titeration %d, average change %.4f %%\n",it_print,avg_update_rel);
                it_print++;
            }

            #ifdef COMP_RMSE
                rms_err=0;
                for(jz=Nz0; jz<Nz1; jz++)
                for(j=0; j<Nxy; j++)
                if(ImageReconMask[j])
                    rms_err += (image[(size_t)jz*Nxy+j]-image_ref[jz][j])*(image[(size_t)jz*Nxy+j]-image_ref[jz][j]);
                rms_err = sqrt(rms_err/((float)NumMaskVoxels*(Nz1-Nz0)));
                fprintf(fp_mse,"%.2f %g %g %g\n",equits,rms_err,rms_val,rms_err/rms_val);
            #endif

            NumUpdates=0;
            totalValue=0;
            totalChange=0;
        }
    }
}

if(verboseLevel)
{
    if(StopThreshold <= 0)
        fprintf(stdout,"\tNo stopping condition--running fixed iterations\n");
    else if(stop_FLAG == 1)
        fprintf(stdout,"\tReached stopping condition\n");
    else
        fprintf(stdout,"\tWarning: Didn't reach stopping condition\n");
    if(verboseLevel>1)
    {
        fprintf(stdout,"\tEquivalent iterations = %.1f, (non-homogeneous iterations = %d)\n",equits,iter);
        fprintf(stdout,"\tAverage update in last iteration (relative) = %f %%\n",avg_update_rel);
        fprintf(stdout,"\tAverage update in last iteration (magnitude) = %.4g\n",avg_update);
    }
    #ifndef MSVC    /* not included in MS Visual C++ */
    gettimeofday(&tm2,NULL);
    unsigned long long tt = 1000 * (tm2.tv_sec - tm1.tv_sec) + (tm2.tv_usec - tm1.tv_usec) / 1000;
    printf("\tReconstruction time = %llu ms (iterations only)\n", tt);
    #endif
}

/* If initial projection was supplied, update to return final projection */
if(proj_init != NULL)
{
    for(k=0; k<(size_t)Nz*Nvc; k++)
        proj_init[k] = sino[k]-sinoerr[k];
}
else
    free((void *)sinoerr);

/* If local copy of proximal map was made, free it */
if(proximalmap == image)
    free((void *)proximalmap_loc);

#ifdef ICC
    _mm_free((void *)voxelsBuffer1);
    _mm_free((void *)voxelsBuffer2);
#else
    free((void *)voxelsBuffer1);
    free((void *)voxelsBuffer2);
    //_aligned_free((void *)voxelsBuffer1);
    //_aligned_free((void *)voxelsBuffer2);
#endif

free((void *)headNodeArray);
free_heap((void *)&priorityheap);
free((void *)order);
free((void *)phaseMap);
multifree(group_id_list,2);
free((void *)indexList);

#ifdef COMP_RMSE
    FreeImageData3D(&Image_ref);
    fclose(fp_mse);
#endif

/* Free SV memory */
for(i=0;i<Nsv;i++) {
    free((void *)svpar.bandMinMap[i].bandMin);
    free((void *)svpar.bandMaxMap[i].bandMax);
}
free((void *)svpar.bandMinMap);
free((void *)svpar.bandMaxMap);

/* Free system matrix */
for(i=0;i<Nsv;i++)
for(j=0;j<(2*SVLength+1)*(2*SVLength+1);j++)
if(A_Padded_Map[i][j].length>0)
{
    free((void *)A_Padded_Map[i][j].val);
    free((void *)A_Padded_Map[i][j].pieceWiseMin);
    free((void *)A_Padded_Map[i][j].pieceWiseWidth);
}
multifree(A_Padded_Map,2);
free((void *)Aval_max_ptr);
free((void *)ImageReconMask);

} / END MBIRReconstruct() /

void super_voxel_recon( int jj, struct SVParams svpar, unsigned long NumUpdates, float totalValue, float totalChange, int iter, char phaseMap, long order, int indexList, float weight, float sinoerr, struct AValues_char * A_Padded_Map, float Aval_max_ptr, struct heap_node headNodeArray, struct SinoParams3DParallel sinoparams, struct ReconParams reconparams, struct ParamExt param_ext, float image, struct ImageParams3D imgparams, float proximalmap, float voxelsBuffer1, float voxelsBuffer2, char group_array, int group_id) { int jy,jx,p,i,q,t,j,currentSlice,startSlice; int SV_depth_modified; int NumUpdates_loc=0; float totalValue_loc=0,totalChange_loc=0;

int Nx = imgparams.Nx;
int Ny = imgparams.Ny;
int Nz = imgparams.Nz;
int Nxy = Nx*Ny;
int Nvc = sinoparams.NViews * sinoparams.NChannels;
char PositivityFlag = reconparams.Positivity;

int SVLength = svpar.SVLength;
int overlappingDistance = svpar.overlap;
int SV_depth = svpar.SVDepth;
int SVsPerRow = svpar.SVsPerRow;
struct minStruct * bandMinMap = svpar.bandMinMap;
struct maxStruct * bandMaxMap = svpar.bandMaxMap;
int pieceLength = svpar.pieceLength;
int NViewSets = sinoparams.NViews/pieceLength;

int jj_new;
if(iter%2==0)
    jj_new=jj;
else
    jj_new=indexList[jj];

startSlice = order[jj_new] / Nx / Ny;
jy = (order[jj_new] - startSlice* Nx * Ny) / Nx;
jx = (order[jj_new] - startSlice* Nx * Ny) % Nx;

if(phaseMap[jj_new]!=group_array[startSlice/SV_depth*4+group_id])
    return;

if((startSlice+SV_depth)>Nz)
    SV_depth_modified=Nz-startSlice;
else
    SV_depth_modified=SV_depth;

int SVPosition=jy/(2*SVLength-overlappingDistance)*SVsPerRow+jx/(2*SVLength-overlappingDistance);

int countNumber=0;  /*XW: the number of voxels chosen for a certain radius of circle*/
int radius =SVLength;   /*XW: choose the circle radius*/
int coordinateSize=1;   /*XW: CoordinateSize is the size of a minimum square enclosing the circle. For the baseline code, coordinateSize=1 because only 1 voxel*/
/*XW: imagine that this is a minimum square enclosing the circle. The square
 * "touches" the circle on 4 coordinate points. CoordianteSize is the possible
 * maximum number of voxels for a certain radius of circle */
if(radius!=0)
    coordinateSize=(2*radius+1)*(2*radius+1);
int * k_newCoordinate = (int *) mget_spc(coordinateSize,sizeof(int));
int * j_newCoordinate = (int *) mget_spc(coordinateSize,sizeof(int));
int j_newAA=0;
int k_newAA=0;
int voxelIncrement=0;

/*XW: choosing the voxels locations in a circle*/
for(j_newAA=jy;j_newAA<=(jy+2*radius);j_newAA++)
for(k_newAA=jx;k_newAA<=(jx+2*radius);k_newAA++)
{
    if(j_newAA>=0 && k_newAA >=0 && j_newAA <Ny && k_newAA < Nx)
    {
        if(A_Padded_Map[SVPosition][voxelIncrement].length >0) {
            j_newCoordinate[countNumber]=j_newAA;
            k_newCoordinate[countNumber]=k_newAA;
            countNumber++;
        }
    }
    voxelIncrement++;
}

/*XW: if no voxel chosen, we skip this loop iteration*/
if(countNumber==0)
{
    free((void *)k_newCoordinate);
    free((void *)j_newCoordinate);
    return;
}

coordinateShuffle(&j_newCoordinate[0],&k_newCoordinate[0],countNumber);

/*XW: for a supervoxel, bandMin records the starting position of the sinogram band at each view*/
/*XW: for a supervoxel, bandMax records the end position of the sinogram band at each view */
channel_t * bandMin = (channel_t *) mget_spc(sinoparams.NViews,sizeof(channel_t));
channel_t * bandMax = (channel_t *) mget_spc(sinoparams.NViews,sizeof(channel_t));
channel_t * bandWidthTemp = (channel_t *) mget_spc(sinoparams.NViews,sizeof(channel_t));
channel_t * bandWidth = (channel_t *) mget_spc(NViewSets,sizeof(channel_t));

#ifdef ICC
_intel_fast_memcpy(&bandMin[0],&bandMinMap[SVPosition].bandMin[0],sizeof(channel_t)*(sinoparams.NViews));
_intel_fast_memcpy(&bandMax[0],&bandMaxMap[SVPosition].bandMax[0],sizeof(channel_t)*(sinoparams.NViews));
#else
memcpy(&bandMin[0],&bandMinMap[SVPosition].bandMin[0],sizeof(channel_t)*(sinoparams.NViews));
memcpy(&bandMax[0],&bandMaxMap[SVPosition].bandMax[0],sizeof(channel_t)*(sinoparams.NViews));
#endif

//#pragma vector aligned
for(p=0;p< sinoparams.NViews;p++)
    bandWidthTemp[p]=bandMax[p]-bandMin[p];

for (p = 0; p < NViewSets; p++)
{
    int bandWidthMax=bandWidthTemp[p*pieceLength];
    for(t=0;t<pieceLength;t++){
        if(bandWidthTemp[p*pieceLength+t]>bandWidthMax)
            bandWidthMax=bandWidthTemp[p*pieceLength+t];
    }
    bandWidth[p]=bandWidthMax;
}

float ** newWArray = (float **)malloc(sizeof(float *) * NViewSets);
float ** newEArray = (float **)malloc(sizeof(float *) * NViewSets);
float ** CopyNewEArray = (float **)malloc(sizeof(float *) * NViewSets);

for (p = 0; p < NViewSets; p++) {
    newWArray[p] = (float *)malloc(sizeof(float)*bandWidth[p]*pieceLength*SV_depth_modified);
    newEArray[p] = (float *)malloc(sizeof(float)*bandWidth[p]*pieceLength*SV_depth_modified);
    CopyNewEArray[p] = (float *)malloc(sizeof(float)*bandWidth[p]*pieceLength*SV_depth_modified);
}

float *newWArrayPointer=&newWArray[0][0];
float *newEArrayPointer=&newEArray[0][0];

/*XW: copy the interlaced we into the memory buffer*/
for (p = 0; p < NViewSets; p++)
{
    newWArrayPointer=&newWArray[p][0];
    newEArrayPointer=&newEArray[p][0];
    for(i=0;i<SV_depth_modified;i++)
    for(q=0;q<pieceLength;q++)
    {
        #ifdef ICC
        _intel_fast_memcpy(newWArrayPointer,&weight[(startSlice+i)*Nvc+p*pieceLength*sinoparams.NChannels+q*sinoparams.NChannels+bandMin[p*pieceLength+q]],sizeof(float)*(bandWidth[p]));
        _intel_fast_memcpy(newEArrayPointer,&sinoerr[(startSlice+i)*Nvc+p*pieceLength*sinoparams.NChannels+q*sinoparams.NChannels+bandMin[p*pieceLength+q]],sizeof(float)*(bandWidth[p]));
        #else
        memcpy(newWArrayPointer,&weight[(startSlice+i)*Nvc+p*pieceLength*sinoparams.NChannels+q*sinoparams.NChannels+bandMin[p*pieceLength+q]],sizeof(float)*(bandWidth[p]));
        memcpy(newEArrayPointer,&sinoerr[(startSlice+i)*Nvc+p*pieceLength*sinoparams.NChannels+q*sinoparams.NChannels+bandMin[p*pieceLength+q]],sizeof(float)*(bandWidth[p]));
        #endif
        newWArrayPointer+=bandWidth[p];
        newEArrayPointer+=bandWidth[p];
    }
}

for (p = 0; p < NViewSets; p++)
{
    #ifdef ICC
    _intel_fast_memcpy(&CopyNewEArray[p][0],&newEArray[p][0],sizeof(float)*bandWidth[p]*pieceLength*SV_depth_modified);
    #else
    memcpy(&CopyNewEArray[p][0],&newEArray[p][0],sizeof(float)*bandWidth[p]*pieceLength*SV_depth_modified);
    #endif
}

float ** newWArrayTransposed = (float **)malloc(sizeof(float *) * NViewSets);
float ** newEArrayTransposed = (float **)malloc(sizeof(float *) * NViewSets);

for (p = 0; p < NViewSets; p++)
{
    newWArrayTransposed[p] = (float *)malloc(sizeof(float)*bandWidth[p]*pieceLength*SV_depth_modified);
    newEArrayTransposed[p] = (float *)malloc(sizeof(float)*bandWidth[p]*pieceLength*SV_depth_modified);
}

float *WTransposeArrayPointer=&newWArrayTransposed[0][0];
float *ETransposeArrayPointer=&newEArrayTransposed[0][0];

for (p = 0; p < NViewSets; p++)
for(currentSlice=0;currentSlice<(SV_depth_modified);currentSlice++) 
{
    WTransposeArrayPointer=&newWArrayTransposed[p][currentSlice*bandWidth[p]*pieceLength];
    ETransposeArrayPointer=&newEArrayTransposed[p][currentSlice*bandWidth[p]*pieceLength];
    newEArrayPointer=&newEArray[p][currentSlice*bandWidth[p]*pieceLength];
    newWArrayPointer=&newWArray[p][currentSlice*bandWidth[p]*pieceLength];
    for(q=0;q<bandWidth[p];q++)
    {
        #pragma vector aligned
        for(t=0;t<pieceLength;t++)
        {
            ETransposeArrayPointer[q*pieceLength+t]=newEArrayPointer[bandWidth[p]*t+q];
            WTransposeArrayPointer[q*pieceLength+t]=newWArrayPointer[bandWidth[p]*t+q];
        }
    }
}

WTransposeArrayPointer=&newWArrayTransposed[0][0];
ETransposeArrayPointer=&newEArrayTransposed[0][0];
newEArrayPointer=&newEArray[0][0];

for (p = 0; p < NViewSets; p++)
    free((void *)newWArray[p]);

free((void **)newWArray);

/* Turn off zero-skipping for 1st iteration */
char zero_skip_enable=0;  // 1: enable, 0: disable
if(iter>0 && PositivityFlag)
    zero_skip_enable=1;

/*XW: the start of the loop to compute theta1, theta2*/
float * tempV = (float *) mget_spc(SV_depth_modified,sizeof(float));
float * tempProxMap = (float *) mget_spc(SV_depth_modified,sizeof(float));
float ** neighbors = (float **) multialloc(sizeof(float),2,SV_depth_modified,10);
char * zero_skip_FLAG = (char *) mget_spc(SV_depth_modified,sizeof(char));
float * diff = (float *) mget_spc(SV_depth_modified,sizeof(float));
float * THETA1 = (float *) get_spc(SV_depth_modified,sizeof(float));
float * THETA2 = (float *) get_spc(SV_depth_modified,sizeof(float));

for(i=0;i<countNumber;i++)
{
    const short j_new = j_newCoordinate[i];   /*XW: get the voxel's x,y location*/
    const short k_new = k_newCoordinate[i];
    float Aval_max = Aval_max_ptr[j_new*Nx+k_new];

    for(p=0;p<SV_depth_modified;p++)
        THETA1[p]=THETA2[p]=0.0;

    int theVoxelPosition=(j_new-jy)*(2*SVLength+1)+(k_new-jx);
    unsigned char * A_padd_Tranpose_pointer = &A_Padded_Map[SVPosition][theVoxelPosition].val[0];

    for(currentSlice=0;currentSlice<SV_depth_modified;currentSlice++)
    {
        tempV[currentSlice] = (float)(image[(size_t)(startSlice+currentSlice)*Nxy + j_new*Nx+k_new]); /* current voxel value */

        zero_skip_FLAG[currentSlice] = 0;

        if(reconparams.ReconType == MBIR_MODULAR_RECONTYPE_QGGMRF_3D)
        {
            ExtractNeighbors3D(&neighbors[currentSlice][0],k_new,j_new,&image[(size_t)(startSlice+currentSlice)*Nxy],imgparams);

            if((startSlice+currentSlice)==0)
                neighbors[currentSlice][8]=voxelsBuffer1[j_new*Nx+k_new];
            else
                neighbors[currentSlice][8]=image[(size_t)(startSlice+currentSlice-1)*Nxy + j_new*Nx+k_new];

            if((startSlice+currentSlice)<(Nz-1))
                neighbors[currentSlice][9]=image[(size_t)(startSlice+currentSlice+1)*Nxy + j_new*Nx+k_new];
            else
                neighbors[currentSlice][9]=voxelsBuffer2[j_new*Nx+k_new];

            if(zero_skip_enable)
            if(tempV[currentSlice] == 0.0)
            {
                zero_skip_FLAG[currentSlice] = 1;
                for (j = 0; j < 10; j++)
                {
                    if (neighbors[currentSlice][j] != 0.0)
                    {
                        zero_skip_FLAG[currentSlice] = 0;
                        break;
                    }
                }
            }
        }
        if(reconparams.ReconType == MBIR_MODULAR_RECONTYPE_PandP)
            tempProxMap[currentSlice] = proximalmap[(startSlice+currentSlice)*Nxy + j_new*Nx+k_new];
    }

    A_padd_Tranpose_pointer = &A_Padded_Map[SVPosition][theVoxelPosition].val[0];
    for(p=0;p<NViewSets;p++)
    {
        const int myCount=A_Padded_Map[SVPosition][theVoxelPosition].pieceWiseWidth[p];
        const int pieceMin=A_Padded_Map[SVPosition][theVoxelPosition].pieceWiseMin[p];
        #pragma vector aligned
        for(currentSlice=0;currentSlice<SV_depth_modified;currentSlice++)
        if(zero_skip_FLAG[currentSlice] == 0)
        {
            WTransposeArrayPointer=&newWArrayTransposed[p][currentSlice*bandWidth[p]*pieceLength];
            ETransposeArrayPointer=&newEArrayTransposed[p][currentSlice*bandWidth[p]*pieceLength];
            WTransposeArrayPointer+=pieceMin*pieceLength;
            ETransposeArrayPointer+=pieceMin*pieceLength;
            float tempTHETA1=0.0;
            float tempTHETA2=0.0;
            //Not finding evidence this makes a difference --SJK
            //Deprecated by Intel anyway
            //#pragma vector aligned
            //#pragma simd reduction(+:tempTHETA2,tempTHETA1)
            for(t=0;t<myCount*pieceLength;t++)
            {   /* summing over voxels which are not skipped or masked*/
                tempTHETA1 += A_padd_Tranpose_pointer[t]*WTransposeArrayPointer[t]*ETransposeArrayPointer[t];
                tempTHETA2 += A_padd_Tranpose_pointer[t]*WTransposeArrayPointer[t]*A_padd_Tranpose_pointer[t];
                if ( !isfinite(tempTHETA1) || !isfinite(tempTHETA2)) {
                    printf("\n\n********************************Big problems with update!***********************************");
                    printf("\nNaN observed when calculating tempTHETA1 in  recon3d !!!!!!!! \n t=%d, A_padd_Transpose=%f, WTransposeArray=%f, THETA1=%f; THETA2=%f, tempTHETA1=%f; tempTHETA2=%f\n", t, A_padd_Tranpose_pointer[t], WTransposeArrayPointer[t], THETA1[currentSlice],THETA2[currentSlice],tempTHETA1,tempTHETA2);
                    exit(1);
                }

            }
            THETA1[currentSlice]+=tempTHETA1;
            THETA2[currentSlice]+=tempTHETA2;
        }
        A_padd_Tranpose_pointer += myCount*pieceLength;
    }

    for(currentSlice=0;currentSlice<SV_depth_modified;currentSlice++)
    {
        THETA1[currentSlice]=-THETA1[currentSlice]*Aval_max*(1.0/255);
        THETA2[currentSlice]=THETA2[currentSlice]*Aval_max*(1.0/255)*Aval_max*(1.0/255);
    }

    ETransposeArrayPointer=&newEArrayTransposed[0][0];

    A_padd_Tranpose_pointer = &A_Padded_Map[SVPosition][theVoxelPosition].val[0];

    for(currentSlice=0;currentSlice<SV_depth_modified;currentSlice++)
    if(zero_skip_FLAG[currentSlice] == 0)
    {
        float pixel,step;
        if(reconparams.ReconType == MBIR_MODULAR_RECONTYPE_QGGMRF_3D)
        {
            step = QGGMRF3D_Update(reconparams,param_ext,tempV[currentSlice],&neighbors[currentSlice][0],THETA1[currentSlice],THETA2[currentSlice]);
        }
        else if(reconparams.ReconType == MBIR_MODULAR_RECONTYPE_PandP)
        {
            step = PandP_Update(reconparams,param_ext,tempV[currentSlice],tempProxMap[currentSlice],THETA1[currentSlice],THETA2[currentSlice]);
        }
        else
        {
            fprintf(stderr,"Error** Unrecognized ReconType in ICD update\n");
            exit(-1);
        }

        pixel = tempV[currentSlice] + step;  /* can apply over-relaxation to the step size here */

        if(PositivityFlag)
            image[(size_t)(startSlice+currentSlice)*Nxy + j_new*Nx+k_new] = ((pixel < 0.0) ? 0.0 : pixel);
        else
            image[(size_t)(startSlice+currentSlice)*Nxy + j_new*Nx+k_new] = pixel;

        diff[currentSlice] = image[(size_t)(startSlice+currentSlice)*Nxy + j_new*Nx+k_new] - tempV[currentSlice];

        totalChange_loc += fabs(diff[currentSlice]);
        totalValue_loc += fabs(tempV[currentSlice]);
        NumUpdates_loc++;

        diff[currentSlice]=diff[currentSlice]*Aval_max*(1.0/255);
    }

    for(p=0;p<NViewSets;p++)
    {
        const int myCount=A_Padded_Map[SVPosition][theVoxelPosition].pieceWiseWidth[p];
        const int pieceMin=A_Padded_Map[SVPosition][theVoxelPosition].pieceWiseMin[p];
        #pragma vector aligned
        for(currentSlice=0;currentSlice<SV_depth_modified;currentSlice++)
        if(diff[currentSlice]!=0 && zero_skip_FLAG[currentSlice] == 0)
        {
            ETransposeArrayPointer=&newEArrayTransposed[p][currentSlice*bandWidth[p]*pieceLength];
            ETransposeArrayPointer+=pieceMin*pieceLength;

            #pragma vector aligned
            for(t=0;t<(myCount*pieceLength);t++)
                ETransposeArrayPointer[t]= ETransposeArrayPointer[t]-A_padd_Tranpose_pointer[t]*diff[currentSlice];
        }
        A_padd_Tranpose_pointer+=myCount*pieceLength;
    }
}

for (p = 0; p < NViewSets; p++)
    free((void *)newWArrayTransposed[p]);

free((void **)newWArrayTransposed);

headNodeArray[jj_new].x=totalChange_loc;

for (p = 0; p < NViewSets; p++)
for(currentSlice=0;currentSlice<SV_depth_modified;currentSlice++)
{
    ETransposeArrayPointer=&newEArrayTransposed[p][currentSlice*bandWidth[p]*pieceLength];
    newEArrayPointer=&newEArray[p][currentSlice*bandWidth[p]*pieceLength];
    for(q=0;q<bandWidth[p];q++)
    {
        #pragma vector aligned
        for(t=0;t<pieceLength;t++)
            newEArrayPointer[bandWidth[p]*t+q]=ETransposeArrayPointer[q*pieceLength+t];
    }
}

for (p = 0; p < NViewSets; p++)
    free((void *)newEArrayTransposed[p]);

free((void **)newEArrayTransposed);

newEArrayPointer=&newEArray[0][0];
float* CopyNewEArrayPointer=&CopyNewEArray[0][0];
float* eArrayPointer=&sinoerr[0];

for (p = 0; p < NViewSets; p++)      /*XW: update the error term in the memory buffer*/
{
    newEArrayPointer=&newEArray[p][0];
    CopyNewEArrayPointer=&CopyNewEArray[p][0];
    for (currentSlice=0; currentSlice< SV_depth_modified;currentSlice++)
    {
        //#pragma vector aligned
        for(q=0;q<pieceLength;q++)
        {
            eArrayPointer=&sinoerr[(startSlice+currentSlice)*Nvc+p*pieceLength*sinoparams.NChannels+q*sinoparams.NChannels+bandMin[p*pieceLength+q]];
            for(t=0;t<bandWidth[p];t++)
            {
                #pragma omp atomic
                *eArrayPointer += (*newEArrayPointer)-(*CopyNewEArrayPointer);
                newEArrayPointer++;
                CopyNewEArrayPointer++;
                eArrayPointer++;
            }
        }
    }
}

for (p = 0; p < NViewSets; p++)
{
    free((void *)newEArray[p]);
    free((void *)CopyNewEArray[p]);
}
free((void **)newEArray);
free((void **)CopyNewEArray);

free((void *)k_newCoordinate);
free((void *)j_newCoordinate);
free((void *)bandMin);
free((void *)bandMax);
free((void *)bandWidth);
free((void *)bandWidthTemp);
free((void *)tempV);
free((void *)tempProxMap);
multifree(neighbors,2);
free((void *)zero_skip_FLAG);
free((void *)diff);
free((void *)THETA1);
free((void *)THETA2);

*NumUpdates += NumUpdates_loc;
*totalValue += totalValue_loc;
*totalChange += totalChange_loc;

} / END super_voxel_recon() /

void coordinateShuffle(int order1, int order2,int len) { int i, j, tmp1,tmp2;

for (i = 0; i < len-1; i++)
{
    j = i + (rand() % (len-i));
    tmp1 = order1[j];
    tmp2 = order2[j];
    order1[j] = order1[i];
    order2[j] = order2[i];
    order1[i] = tmp1;
    order2[i] = tmp2;
}

}

void three_way_shuffle(long order1, char order2, struct heap_node *headNodeArray, int len) { int i,j; long tmp_long; char tmp_char; float temp_x;

for (i = 0; i < len-1; i++)
{
    j = i + (rand() % (len-i));
    tmp_long = order1[j];
    order1[j] = order1[i];
    order1[i] = tmp_long;
    tmp_char = order2[j];
    order2[j] = order2[i];
    order2[i] = tmp_char;
    temp_x=headNodeArray[j].x;
    headNodeArray[j].x=headNodeArray[i].x;
    headNodeArray[i].x=temp_x;
}

}

float MAPCostFunction3D( float x, float e, float *w, struct ImageParams3D imgparams, struct SinoParams3DParallel sinoparams, struct ReconParams reconparams, struct ParamExt param_ext) { int i, M, j, jx, jy, jz, Nx, Ny, Nz, Nxy, plusx, minusx, plusy, plusz; float nloglike, nlogprior_nearest, nlogprior_diag, nlogprior_interslice, x0;

M = sinoparams.NViews * sinoparams.NChannels ;
Nx = imgparams.Nx;
Ny = imgparams.Ny;
Nz = imgparams.Nz;
Nxy = Nx*Ny;

nloglike = 0.0;
for (i = 0; i <sinoparams.NSlices; i++)
for (j = 0; j < M; j++)
    nloglike += e[i*M+j]*w[i*M+j]*e[i*M+j];

nloglike /= 2.0;
nlogprior_nearest = 0.0;
nlogprior_diag = 0.0;
nlogprior_interslice = 0.0;

for (jz = 0; jz < Nz; jz++)
for (jy = 0; jy < Ny; jy++)
for (jx = 0; jx < Nx; jx++)
{
    plusx = jx + 1;
    plusx = ((plusx < Nx) ? plusx : 0);
    minusx = jx - 1;
    minusx = ((minusx < 0) ? Nx-1 : minusx);
    plusy = jy + 1;
    plusy = ((plusy < Ny) ? plusy : 0);
    plusz = jz + 1;
    plusz = ((plusz < Nz) ? plusz : 0);

    j = jy*Nx + jx;
    x0 = x[jz*Nxy+j];

    nlogprior_nearest += QGGMRF_Potential((x0-x[jz*Nxy+jy*Nx+plusx]),reconparams,param_ext);
    nlogprior_nearest += QGGMRF_Potential((x0-x[jz*Nxy+plusy*Nx+jx]),reconparams,param_ext);
    nlogprior_diag += QGGMRF_Potential((x0-x[jz*Nxy+plusy*Nx+minusx]),reconparams,param_ext);
    nlogprior_diag += QGGMRF_Potential((x0-x[jz*Nxy+plusy*Nx+plusx]),reconparams,param_ext);
    nlogprior_interslice += QGGMRF_Potential((x0-x[plusz*Nxy+jy*Nx+jx]),reconparams,param_ext);
}

return (nloglike + reconparams.b_nearest * nlogprior_nearest + reconparams.b_diag * nlogprior_diag + reconparams.b_interslice * nlogprior_interslice) ;

}

/ Forward projection using input SV system matrix /

void SVproject( float proj, float image, struct AValues_char *A_Padded_Map, float Aval_max_ptr, struct ImageParams3D imgparams, struct SinoParams3DParallel sinoparams, struct SVParams svpar, char backproject_flag) { size_t i; int jz; int Nx = imgparams.Nx; int Ny = imgparams.Ny; int Nz = imgparams.Nz; int NChannels = sinoparams.NChannels; int Nvc = sinoparams.NViews sinoparams.NChannels; int SVLength = svpar.SVLength; int pieceLength = svpar.pieceLength; int SVsPerRow = svpar.SVsPerRow; int NViewSets = sinoparams.NViews/pieceLength; struct minStruct bandMinMap = svpar.bandMinMap;

/* initialize output */
if(backproject_flag)
    for (i = 0; i < (size_t)Nx*Ny*Nz; i++)
        image[i] = 0.0;
else
    for (i = 0; i < (size_t)Nvc*Nz; i++)
        proj[i] = 0.0;

#pragma omp parallel for schedule(dynamic)
for(jz=0;jz<Nz;jz++)
{
    int jx,jy,k,r,p;

    for (jy = 0; jy < Ny; jy++)
    for (jx = 0; jx < Nx; jx++)
    {
        int SV_ind_y = jy/(2*SVLength-svpar.overlap);
        int SV_ind_x = jx/(2*SVLength-svpar.overlap);
        int SVPosition = SV_ind_y*SVsPerRow + SV_ind_x;

        int SV_jy = SV_ind_y*(2*SVLength-svpar.overlap);
        int SV_jx = SV_ind_x*(2*SVLength-svpar.overlap);
        int VoxelPosition = (jy-SV_jy)*(2*SVLength+1)+(jx-SV_jx);

        // The second condition should always be true
        if (A_Padded_Map[SVPosition][VoxelPosition].length > 0 && VoxelPosition < ((2*SVLength+1)*(2*SVLength+1)))
        {
            unsigned char* A_padd_Tr_ptr = &A_Padded_Map[SVPosition][VoxelPosition].val[0];
            float rescale = Aval_max_ptr[jy*Nx+jx]*(1.0/255);
            size_t image_idx = (size_t)jz*Nx*Ny + jy*Nx + jx;
            float xval = image[image_idx];

            for(p=0;p<NViewSets;p++)
            {
                int myCount = A_Padded_Map[SVPosition][VoxelPosition].pieceWiseWidth[p];
                int pieceWiseMin = A_Padded_Map[SVPosition][VoxelPosition].pieceWiseMin[p];
                int position = p*pieceLength*NChannels + pieceWiseMin;

                for(r=0;r<myCount;r++)
                for(k=0;k<pieceLength;k++)
                {
                    channel_t bandMin = bandMinMap[SVPosition].bandMin[p*pieceLength+k];
                    size_t proj_idx = jz*Nvc + position + k*NChannels + bandMin + r;

                    if((pieceWiseMin + bandMin + r) >= NChannels || (position + k*NChannels + bandMin + r) >= Nvc ) {
                        fprintf(stderr,"SVproject() out of bounds: p %d r %d k %d\n",p,r,k);
                        fprintf(stderr,"SVproject() out of bounds: total_1 %d total_2 %d\n",pieceWiseMin+bandMin+r,position+k*NChannels+bandMin+r);
                        exit(-1);
                    }
                    else
                    {
                        if(backproject_flag)
                            image[image_idx] += A_padd_Tr_ptr[r*pieceLength+k]*rescale * proj[proj_idx];
                        else
                            proj[proj_idx] += A_padd_Tr_ptr[r*pieceLength+k]*rescale * xval;
                    }
                }
                A_padd_Tr_ptr += myCount*pieceLength;
            }
        }
    }
}

}

/ Forward projection wrapper that first reads or computes SV matrix /

void forwardProject( float proj, float image, struct ImageParams3D imgparams, struct SinoParams3DParallel sinoparams, char *Amatrix_fname, char backproject_flag, char verboseLevel) { int i,j; struct AValues_char *A_Padded_Map; float Aval_max_ptr; struct SVParams svpar;

/* print summary to stdout */
if(verboseLevel>1)
{
    fprintf(stdout,"forwardProject() -- build time: %s, %s\n", __DATE__, __TIME__);
    printSinoParams3DParallel(&sinoparams);
    printImageParams3D(&imgparams);
}

/* Initialize/allocate SV parameters */
initSVParams(&svpar, imgparams, sinoparams);
int Nsv = svpar.Nsv;
int SVLength = svpar.SVLength;

/* Read/compute/write System Matrix */
A_Padded_Map = (struct AValues_char **)multialloc(sizeof(struct AValues_char),2,Nsv,(2*SVLength+1)*(2*SVLength+1));
Aval_max_ptr = (float *) mget_spc(imgparams.Nx*imgparams.Ny,sizeof(float));
if(Amatrix_fname != NULL)
{
    if(verboseLevel)
        fprintf(stdout,"Reading system matrix...\n");
    readAmatrix(Amatrix_fname, A_Padded_Map, Aval_max_ptr, &imgparams, &sinoparams, svpar);
}
else
{
    if(verboseLevel)
        fprintf(stdout,"Computing system matrix...\n");
    char * ImageReconMask = GenImageReconMask(&imgparams);
    A_comp(A_Padded_Map,Aval_max_ptr,svpar,&sinoparams,ImageReconMask,&imgparams);
    free((void *)ImageReconMask);
}

/* Project */
if(verboseLevel)
{
    if(backproject_flag)
        fprintf(stdout,"Back-projecting sinogram...\n");
    else
        fprintf(stdout,"Projecting image...\n");
}
SVproject(proj,image,A_Padded_Map,Aval_max_ptr,imgparams,sinoparams,svpar,backproject_flag);

/* Free SV memory */
for(i=0;i<Nsv;i++) {
    free((void *)svpar.bandMinMap[i].bandMin);
    free((void *)svpar.bandMaxMap[i].bandMax);
}
free((void *)svpar.bandMinMap);
free((void *)svpar.bandMaxMap);

/* Free system matrix */
for(i=0;i<Nsv;i++)
for(j=0;j<(2*SVLength+1)*(2*SVLength+1);j++)
if(A_Padded_Map[i][j].length>0)
{
    free((void *)A_Padded_Map[i][j].val);
    free((void *)A_Padded_Map[i][j].pieceWiseMin);
    free((void *)A_Padded_Map[i][j].pieceWiseWidth);
}
multifree(A_Padded_Map,2);
free((void *)Aval_max_ptr);

} / END forwardProject() /

sjkisner commented 3 years ago

Oh by the way, notice this: unsigned char * A_padd_Tranpose_pointer

It's not a float!!, so when looking at the value use %d not %f.

dyang37 commented 3 years ago

Got it. Here's the updated recon3d script. The printed value is still NaN.


From: Sherman J Kisner @.> Sent: Monday, November 8, 2021 3:29 PM To: cabouman/svmbir @.> Cc: Diyu Yang @.>; Mention @.> Subject: Re: [cabouman/svmbir] Nans generated by recon (Issue #212)

Oh by the way, notice this: unsigned char * A_padd_Tranpose_pointer

It's not a float!!, so when looking at the value use %d not %f.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/cabouman/svmbir/issues/212#issuecomment-963546321, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AB2Q7OH6MK2TPILDLQCUG3LULAXI7ANCNFSM5HOWI5OQ. Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

include

include

include

include

include

ifndef MSVC / not included in MS Visual C++ /

#include <sys/time.h>

endif

include "MBIRModularDefs.h"

include "MBIRModularUtils.h"

include "allocate.h"

include "icd3d.h"

include "heap.h"

include "A_comp.h"

include "initialize.h"

include "recon3d.h"

define TEST

//#define COMP_COST //#define COMP_RMSE

/ Internal functions / void super_voxel_recon(int jj,struct SVParams svpar,unsigned long NumUpdates,float totalValue,float totalChange,int iter, char phaseMap,long order,int indexList,float weight,float sinoerr, struct AValues_char *A_Padded_Map,float Aval_max_ptr,struct heap_node headNodeArray, struct SinoParams3DParallel sinoparams,struct ReconParams reconparams,struct ParamExt param_ext,float image, struct ImageParams3D imgparams, float proximalmap, float voxelsBuffer1,float voxelsBuffer2,char group_array,int group_id); void SVproject(float proj,float image,struct AValues_char *A_Padded_Map,float Aval_max_ptr, struct ImageParams3D imgparams,struct SinoParams3DParallel sinoparams,struct SVParams svpar,char backproject_flag); void coordinateShuffle(int order1, int order2,int len); void three_way_shuffle(long order1, char order2, struct heap_node headNodeArray,int len); float MAPCostFunction3D(float x,float e,float w,struct ImageParams3D imgparams,struct SinoParams3DParallel sinoparams, struct ReconParams reconparams,struct ParamExt param_ext);

void MBIRReconstruct( float image, float sino, float weight, float proj_init, float proximalmap, struct ImageParams3D imgparams, struct SinoParams3DParallel sinoparams, struct ReconParams reconparams, char Amatrix_fname, char verboseLevel) { float sinoerr, proximalmap_loc=NULL; int i,j,jj,p,t,iter,it_print=1; size_t k;

ifndef MSVC / not included in MS Visual C++ /

struct timeval tm1,tm2;
#endif

/* image/sino/recon parameters */
int Nx = imgparams.Nx;
int Ny = imgparams.Ny;
int Nz = imgparams.Nz;
int Nxy = Nx*Ny;
int Nvc = sinoparams.NViews * sinoparams.NChannels;
int MaxIterations = reconparams.MaxIterations;
float StopThreshold = reconparams.StopThreshold;

/* Initialize/allocate SV parameters */
struct AValues_char **A_Padded_Map;
float *Aval_max_ptr;
struct SVParams svpar;
initSVParams(&svpar, imgparams, sinoparams);
int Nsv = svpar.Nsv;
int SVLength = svpar.SVLength;
int SV_per_Z = svpar.SV_per_Z;
int SVsPerRow = svpar.SVsPerRow;

/* Activate proximal map mode if given as input */
if(proximalmap != NULL)
{
    reconparams.ReconType = MBIR_MODULAR_RECONTYPE_PandP;
    /* 'image' is reconstructed in place, so if proximal map is the same array, make a local copy */
    if(proximalmap == image)
    {
        proximalmap_loc = (float *) mget_spc((size_t)Nx*Ny*Nz,sizeof(float));
        for(k=0; k<(size_t)Nx*Ny*Nz; k++)
            proximalmap_loc[k] = proximalmap[k];
    }
    else
        proximalmap_loc = proximalmap;
}

/* print summary to stdout */
if(verboseLevel>1)
{
    fprintf(stdout,"MBIRReconstruct() -- build time: %s, %s\n", __DATE__, __TIME__);
    printSinoParams3DParallel(&sinoparams);
    printImageParams3D(&imgparams);
    if(reconparams.ReconType == MBIR_MODULAR_RECONTYPE_QGGMRF_3D)
        printReconParamsQGGMRF3D(&reconparams);
    if(reconparams.ReconType == MBIR_MODULAR_RECONTYPE_PandP)
        printReconParamsPandP(&reconparams);
}

/* Allocate and generate recon mask based on ROIRadius */
char * ImageReconMask = GenImageReconMask(&imgparams);

/* Read/compute/write System Matrix */
A_Padded_Map = (struct AValues_char **)multialloc(sizeof(struct AValues_char),2,Nsv,(2*SVLength+1)*(2*SVLength+1));
Aval_max_ptr = (float *) mget_spc(Nx*Ny,sizeof(float));
if(Amatrix_fname != NULL)
{
    if(verboseLevel)
        fprintf(stdout,"Reading system matrix...\n");
    readAmatrix(Amatrix_fname, A_Padded_Map, Aval_max_ptr, &imgparams, &sinoparams, svpar);
}
else
{
    if(verboseLevel)
        fprintf(stdout,"Computing system matrix...\n");
    A_comp(A_Padded_Map,Aval_max_ptr,svpar,&sinoparams,ImageReconMask,&imgparams);
}

/* Project image for sinogram error */
if(proj_init != NULL)
    sinoerr = proj_init;
else
{
    if(verboseLevel)
        fprintf(stdout,"Projecting image...\n");
    sinoerr = (float *) mget_spc((size_t)Nz*Nvc,sizeof(float));
    SVproject(sinoerr,image,A_Padded_Map,Aval_max_ptr,imgparams,sinoparams,svpar,0);
}
for(k=0; k<(size_t)Nz*Nvc; k++)
    sinoerr[k] = sino[k]-sinoerr[k];

/* Recon parameters */
NormalizePriorWeights3D(&reconparams);
struct ParamExt param_ext;
param_ext.pow_sigmaX_p = powf(reconparams.SigmaX,reconparams.p);
param_ext.pow_sigmaX_q = powf(reconparams.SigmaX,reconparams.q);
param_ext.pow_T_qmp    = powf(reconparams.T,reconparams.q - reconparams.p);
param_ext.SigmaXsq = reconparams.SigmaX * reconparams.SigmaX;

float *voxelsBuffer1;  /* the first N entries are the voxel values.  */
float *voxelsBuffer2;
unsigned long NumUpdates=0;
float totalValue=0,totalChange=0,equits=0;
float avg_update=0,avg_update_rel=0;
float c_ratio=0.07;
float convergence_rho=0.7;
int rep_num=(int)ceil(1/(4*c_ratio*convergence_rho));
struct heap priorityheap;
initialize_heap(&priorityheap);
long *order;
char *phaseMap, **group_id_list;

int NumMaskVoxels=0;
for(j=0;j<Nxy;j++)
if(ImageReconMask[j])
    NumMaskVoxels++;

#ifdef COMP_RMSE
    struct Image3D Image_ref;
    Image_ref.imgparams.Nx = imgparams.Nx;
    Image_ref.imgparams.Ny = imgparams.Ny;
    Image_ref.imgparams.Nz = imgparams.Nz;
    Image_ref.imgparams.FirstSliceNumber = imgparams.FirstSliceNumber;
    Image_ref.imgparams.NumSliceDigits = imgparams.NumSliceDigits;
    AllocateImageData3D(&Image_ref);
    ReadImage3D("ref/ref",&Image_ref);

    float ** image_ref = Image_ref.image;
    double rms_err=0,rms_val=0;
    int jz, Nz0=0, Nz1=Nz;
    for(jz=Nz0; jz<Nz1; jz++)
    for(j=0; j<Nxy; j++)
    if(ImageReconMask[j]) {
        rms_val += image_ref[jz][j]*image_ref[jz][j];
        rms_err += (image[(size_t)jz*Nxy+j]-image_ref[jz][j])*(image[(size_t)jz*Nxy+j]-image_ref[jz][j]);
    }
    rms_val = sqrt(rms_val/((float)NumMaskVoxels*(Nz1-Nz0)));
    rms_err = sqrt(rms_err/((float)NumMaskVoxels*(Nz1-Nz0)));
    FILE *fp_mse=fopen("rmse.txt","w");
    fprintf(fp_mse,"equits|rms_err|rms_val|rms_err/rms_val\n");
    fprintf(fp_mse,"%.2f %g %g %g\n",equits,rms_err,rms_val,rms_err/rms_val);
#endif

order = (long *) mget_spc(Nsv*SV_per_Z,sizeof(long));
phaseMap = (char *) mget_spc(Nsv*SV_per_Z,sizeof(char));
group_id_list = (char **) multialloc(sizeof(char),2,SV_per_Z,4);

/* Order of pixel updates need NOT be raster order, just initialize */
t=0;
for(p=0;p<Nz;p+=svpar.SVDepth)
for(i=0;i<Ny;i+=(SVLength*2-svpar.overlap))
for(j=0;j<Nx;j+=(SVLength*2-svpar.overlap))
{
    order[t]=(long)p*Nxy+i*Nx+j;  /* order is the first voxel coordinate, not the center */
    t++;
}

#pragma omp parallel for private(jj) schedule(dynamic)
for(i=0;i<SV_per_Z;i++)
for(jj=0;jj<Nsv;jj++)
{
    if((jj/SVsPerRow)%2==0)
    {
        if((jj%SVsPerRow)%2==0)
            phaseMap[i*Nsv+jj]=0;
        else
            phaseMap[i*Nsv+jj]=1;
    }
    else
    {
        if((jj%SVsPerRow)%2==0)
            phaseMap[i*Nsv+jj]=2;
        else
            phaseMap[i*Nsv+jj]=3;
    }
}

for(i=0;i<SV_per_Z;i++) {
    if(i%4==0){
        group_id_list[i][0]=0;
        group_id_list[i][1]=3;
        group_id_list[i][2]=1;
        group_id_list[i][3]=2;
    }
    else if(i%4==1) {
        group_id_list[i][0]=3;
        group_id_list[i][1]=0;
        group_id_list[i][2]=2;
        group_id_list[i][3]=1;
    }
    else if(i%4==2) {
        group_id_list[i][0]=1;
        group_id_list[i][1]=2;
        group_id_list[i][2]=3;
        group_id_list[i][3]=0;
    }
    else{
        group_id_list[i][0]=2;
        group_id_list[i][1]=1;
        group_id_list[i][2]=0;
        group_id_list[i][3]=3;
    }
}
#ifdef TEST
srand(0);
#else
srand(time(NULL));
#endif

struct heap_node *headNodeArray;
headNodeArray = (struct heap_node *) mget_spc(Nsv*SV_per_Z,sizeof(struct heap_node));

for(i=0;i<SV_per_Z;i++)
for(jj=0;jj<Nsv;jj++)
{
    headNodeArray[i*Nsv+jj].pt=i*Nsv+jj;
    headNodeArray[i*Nsv+jj].x=0.0;
}
int indexList_size=(int) Nsv*SV_per_Z*4*c_ratio*(1-convergence_rho);
int * indexList = (int *) mget_spc(indexList_size,sizeof(int));

#ifdef ICC
    voxelsBuffer1 = (float *)_mm_malloc(Nxy*sizeof(float),64);
    voxelsBuffer2 = (float *)_mm_malloc(Nxy*sizeof(float),64);
#else
    voxelsBuffer1 = (float *) malloc(Nxy*sizeof(float));
    voxelsBuffer2 = (float *) malloc(Nxy*sizeof(float));
    //voxelsBuffer1 = (float *) aligned_alloc(64,Nxy*sizeof(float));
    //voxelsBuffer2 = (float *) aligned_alloc(64,Nxy*sizeof(float));
    //voxelsBuffer1 = (float *) _aligned_malloc(Nxy*sizeof(float),64);
    //voxelsBuffer2 = (float *) _aligned_malloc(Nxy*sizeof(float),64);
#endif

for(i=0;i<Nxy;i++) {
    voxelsBuffer1[i]=0;
    voxelsBuffer2[i]=0;
}

//coordinateShuffle(&order[0],&phaseMap[0],Nsv*SV_per_Z);
long tmp_long;
char tmp_char;
for(i=0; i<Nsv*SV_per_Z-1; i++)
{
    j = i + (rand() % (Nsv*SV_per_Z-i));
    tmp_long = order[j];
    order[j] = order[i];
    order[i] = tmp_long;
    tmp_char = phaseMap[j];
    phaseMap[j] = phaseMap[i];
    phaseMap[i] = tmp_char;
}

iter=0;
char stop_FLAG=0;
int startIndex=0;
int endIndex=0;

if(verboseLevel) {
    fprintf(stdout,"Reconstructing...\n");
    #ifndef MSVC    /* not included in MS Visual C++ */
    gettimeofday(&tm1,NULL);
    #endif
}

#pragma omp parallel
{
    while(stop_FLAG==0 && equits<MaxIterations && iter<100*MaxIterations)
    {
        #pragma omp single
        {
            if(iter==0)
            {
                startIndex=0;
                endIndex=Nsv*SV_per_Z;
            }
            else
            {
                if((iter-1)%(2*rep_num)==0 && iter!=1)
                    three_way_shuffle(&order[0],&phaseMap[0],&headNodeArray[0],Nsv*SV_per_Z);

                if(iter%2==1)
                {
                    priorityheap.size=0;
                    for(jj=0;jj<Nsv*SV_per_Z;jj++){
                        heap_insert(&priorityheap, &(headNodeArray[jj]));
                    }
                    startIndex=0;
                    endIndex=indexList_size;

                    for(i=0;i<endIndex;i++) {
                        struct heap_node tempNode;
                        get_heap_max(&priorityheap, &tempNode);
                        indexList[i]=tempNode.pt;
                    }
                }
                else {
                    startIndex=((iter-2)/2)%rep_num*Nsv*SV_per_Z/rep_num;
                    endIndex=(((iter-2)/2)%rep_num+1)*Nsv*SV_per_Z/rep_num;
                }
            }
        }

        int group=0;

        for (group = 0; group < 4; group++)
        {
            #pragma omp for schedule(dynamic)  reduction(+:NumUpdates) reduction(+:totalValue) reduction(+:totalChange)
            for (jj = startIndex; jj < endIndex; jj+=1)
                super_voxel_recon(jj,svpar,&NumUpdates,&totalValue,&totalChange,iter,
                        &phaseMap[0],order,&indexList[0],weight,sinoerr,A_Padded_Map,&Aval_max_ptr[0],
                        &headNodeArray[0],sinoparams,reconparams,param_ext,image,imgparams,proximalmap_loc,
                        voxelsBuffer1,voxelsBuffer2,&group_id_list[0][0],group);
        }

        #pragma omp single
        {
            avg_update=avg_update_rel=0.0;
            if(NumUpdates>0) {
                avg_update = totalChange/NumUpdates;
                float avg_value = totalValue/NumUpdates;
                avg_update_rel = avg_update/avg_value * 100;
                //printf("avg_update %f, avg_value %f, avg_update_rel %f\n",avg_update,avg_value,avg_update_rel);
            }
            #ifdef COMP_COST
            float cost = MAPCostFunction3D(image,sinoerr,weight,imgparams,sinoparams,reconparams,param_ext);
            fprintf(stdout, "it %d cost = %-15f, avg_update %f \n", iter, cost, avg_update);
            #endif

            if (avg_update_rel < StopThreshold && (endIndex!=0))
                stop_FLAG = 1;

            iter++;
            equits += (float)NumUpdates/((float)NumMaskVoxels*Nz);

            if(verboseLevel && equits > it_print) {
                fprintf(stdout,"\titeration %d, average change %.4f %%\n",it_print,avg_update_rel);
                it_print++;
            }

            #ifdef COMP_RMSE
                rms_err=0;
                for(jz=Nz0; jz<Nz1; jz++)
                for(j=0; j<Nxy; j++)
                if(ImageReconMask[j])
                    rms_err += (image[(size_t)jz*Nxy+j]-image_ref[jz][j])*(image[(size_t)jz*Nxy+j]-image_ref[jz][j]);
                rms_err = sqrt(rms_err/((float)NumMaskVoxels*(Nz1-Nz0)));
                fprintf(fp_mse,"%.2f %g %g %g\n",equits,rms_err,rms_val,rms_err/rms_val);
            #endif

            NumUpdates=0;
            totalValue=0;
            totalChange=0;
        }
    }
}

if(verboseLevel)
{
    if(StopThreshold <= 0)
        fprintf(stdout,"\tNo stopping condition--running fixed iterations\n");
    else if(stop_FLAG == 1)
        fprintf(stdout,"\tReached stopping condition\n");
    else
        fprintf(stdout,"\tWarning: Didn't reach stopping condition\n");
    if(verboseLevel>1)
    {
        fprintf(stdout,"\tEquivalent iterations = %.1f, (non-homogeneous iterations = %d)\n",equits,iter);
        fprintf(stdout,"\tAverage update in last iteration (relative) = %f %%\n",avg_update_rel);
        fprintf(stdout,"\tAverage update in last iteration (magnitude) = %.4g\n",avg_update);
    }
    #ifndef MSVC    /* not included in MS Visual C++ */
    gettimeofday(&tm2,NULL);
    unsigned long long tt = 1000 * (tm2.tv_sec - tm1.tv_sec) + (tm2.tv_usec - tm1.tv_usec) / 1000;
    printf("\tReconstruction time = %llu ms (iterations only)\n", tt);
    #endif
}

/* If initial projection was supplied, update to return final projection */
if(proj_init != NULL)
{
    for(k=0; k<(size_t)Nz*Nvc; k++)
        proj_init[k] = sino[k]-sinoerr[k];
}
else
    free((void *)sinoerr);

/* If local copy of proximal map was made, free it */
if(proximalmap == image)
    free((void *)proximalmap_loc);

#ifdef ICC
    _mm_free((void *)voxelsBuffer1);
    _mm_free((void *)voxelsBuffer2);
#else
    free((void *)voxelsBuffer1);
    free((void *)voxelsBuffer2);
    //_aligned_free((void *)voxelsBuffer1);
    //_aligned_free((void *)voxelsBuffer2);
#endif

free((void *)headNodeArray);
free_heap((void *)&priorityheap);
free((void *)order);
free((void *)phaseMap);
multifree(group_id_list,2);
free((void *)indexList);

#ifdef COMP_RMSE
    FreeImageData3D(&Image_ref);
    fclose(fp_mse);
#endif

/* Free SV memory */
for(i=0;i<Nsv;i++) {
    free((void *)svpar.bandMinMap[i].bandMin);
    free((void *)svpar.bandMaxMap[i].bandMax);
}
free((void *)svpar.bandMinMap);
free((void *)svpar.bandMaxMap);

/* Free system matrix */
for(i=0;i<Nsv;i++)
for(j=0;j<(2*SVLength+1)*(2*SVLength+1);j++)
if(A_Padded_Map[i][j].length>0)
{
    free((void *)A_Padded_Map[i][j].val);
    free((void *)A_Padded_Map[i][j].pieceWiseMin);
    free((void *)A_Padded_Map[i][j].pieceWiseWidth);
}
multifree(A_Padded_Map,2);
free((void *)Aval_max_ptr);
free((void *)ImageReconMask);

} / END MBIRReconstruct() /

void super_voxel_recon( int jj, struct SVParams svpar, unsigned long NumUpdates, float totalValue, float totalChange, int iter, char phaseMap, long order, int indexList, float weight, float sinoerr, struct AValues_char * A_Padded_Map, float Aval_max_ptr, struct heap_node headNodeArray, struct SinoParams3DParallel sinoparams, struct ReconParams reconparams, struct ParamExt param_ext, float image, struct ImageParams3D imgparams, float proximalmap, float voxelsBuffer1, float voxelsBuffer2, char group_array, int group_id) { int jy,jx,p,i,q,t,j,currentSlice,startSlice; int SV_depth_modified; int NumUpdates_loc=0; float totalValue_loc=0,totalChange_loc=0;

int Nx = imgparams.Nx;
int Ny = imgparams.Ny;
int Nz = imgparams.Nz;
int Nxy = Nx*Ny;
int Nvc = sinoparams.NViews * sinoparams.NChannels;
char PositivityFlag = reconparams.Positivity;

int SVLength = svpar.SVLength;
int overlappingDistance = svpar.overlap;
int SV_depth = svpar.SVDepth;
int SVsPerRow = svpar.SVsPerRow;
struct minStruct * bandMinMap = svpar.bandMinMap;
struct maxStruct * bandMaxMap = svpar.bandMaxMap;
int pieceLength = svpar.pieceLength;
int NViewSets = sinoparams.NViews/pieceLength;

int jj_new;
if(iter%2==0)
    jj_new=jj;
else
    jj_new=indexList[jj];

startSlice = order[jj_new] / Nx / Ny;
jy = (order[jj_new] - startSlice* Nx * Ny) / Nx;
jx = (order[jj_new] - startSlice* Nx * Ny) % Nx;

if(phaseMap[jj_new]!=group_array[startSlice/SV_depth*4+group_id])
    return;

if((startSlice+SV_depth)>Nz)
    SV_depth_modified=Nz-startSlice;
else
    SV_depth_modified=SV_depth;

int SVPosition=jy/(2*SVLength-overlappingDistance)*SVsPerRow+jx/(2*SVLength-overlappingDistance);

int countNumber=0;  /*XW: the number of voxels chosen for a certain radius of circle*/
int radius =SVLength;   /*XW: choose the circle radius*/
int coordinateSize=1;   /*XW: CoordinateSize is the size of a minimum square enclosing the circle. For the baseline code, coordinateSize=1 because only 1 voxel*/
/*XW: imagine that this is a minimum square enclosing the circle. The square
 * "touches" the circle on 4 coordinate points. CoordianteSize is the possible
 * maximum number of voxels for a certain radius of circle */
if(radius!=0)
    coordinateSize=(2*radius+1)*(2*radius+1);
int * k_newCoordinate = (int *) mget_spc(coordinateSize,sizeof(int));
int * j_newCoordinate = (int *) mget_spc(coordinateSize,sizeof(int));
int j_newAA=0;
int k_newAA=0;
int voxelIncrement=0;

/*XW: choosing the voxels locations in a circle*/
for(j_newAA=jy;j_newAA<=(jy+2*radius);j_newAA++)
for(k_newAA=jx;k_newAA<=(jx+2*radius);k_newAA++)
{
    if(j_newAA>=0 && k_newAA >=0 && j_newAA <Ny && k_newAA < Nx)
    {
        if(A_Padded_Map[SVPosition][voxelIncrement].length >0) {
            j_newCoordinate[countNumber]=j_newAA;
            k_newCoordinate[countNumber]=k_newAA;
            countNumber++;
        }
    }
    voxelIncrement++;
}

/*XW: if no voxel chosen, we skip this loop iteration*/
if(countNumber==0)
{
    free((void *)k_newCoordinate);
    free((void *)j_newCoordinate);
    return;
}

coordinateShuffle(&j_newCoordinate[0],&k_newCoordinate[0],countNumber);

/*XW: for a supervoxel, bandMin records the starting position of the sinogram band at each view*/
/*XW: for a supervoxel, bandMax records the end position of the sinogram band at each view */
channel_t * bandMin = (channel_t *) mget_spc(sinoparams.NViews,sizeof(channel_t));
channel_t * bandMax = (channel_t *) mget_spc(sinoparams.NViews,sizeof(channel_t));
channel_t * bandWidthTemp = (channel_t *) mget_spc(sinoparams.NViews,sizeof(channel_t));
channel_t * bandWidth = (channel_t *) mget_spc(NViewSets,sizeof(channel_t));

#ifdef ICC
_intel_fast_memcpy(&bandMin[0],&bandMinMap[SVPosition].bandMin[0],sizeof(channel_t)*(sinoparams.NViews));
_intel_fast_memcpy(&bandMax[0],&bandMaxMap[SVPosition].bandMax[0],sizeof(channel_t)*(sinoparams.NViews));
#else
memcpy(&bandMin[0],&bandMinMap[SVPosition].bandMin[0],sizeof(channel_t)*(sinoparams.NViews));
memcpy(&bandMax[0],&bandMaxMap[SVPosition].bandMax[0],sizeof(channel_t)*(sinoparams.NViews));
#endif

//#pragma vector aligned
for(p=0;p< sinoparams.NViews;p++)
    bandWidthTemp[p]=bandMax[p]-bandMin[p];

for (p = 0; p < NViewSets; p++)
{
    int bandWidthMax=bandWidthTemp[p*pieceLength];
    for(t=0;t<pieceLength;t++){
        if(bandWidthTemp[p*pieceLength+t]>bandWidthMax)
            bandWidthMax=bandWidthTemp[p*pieceLength+t];
    }
    bandWidth[p]=bandWidthMax;
}

float ** newWArray = (float **)malloc(sizeof(float *) * NViewSets);
float ** newEArray = (float **)malloc(sizeof(float *) * NViewSets);
float ** CopyNewEArray = (float **)malloc(sizeof(float *) * NViewSets);

for (p = 0; p < NViewSets; p++) {
    newWArray[p] = (float *)malloc(sizeof(float)*bandWidth[p]*pieceLength*SV_depth_modified);
    newEArray[p] = (float *)malloc(sizeof(float)*bandWidth[p]*pieceLength*SV_depth_modified);
    CopyNewEArray[p] = (float *)malloc(sizeof(float)*bandWidth[p]*pieceLength*SV_depth_modified);
}

float *newWArrayPointer=&newWArray[0][0];
float *newEArrayPointer=&newEArray[0][0];

/*XW: copy the interlaced we into the memory buffer*/
for (p = 0; p < NViewSets; p++)
{
    newWArrayPointer=&newWArray[p][0];
    newEArrayPointer=&newEArray[p][0];
    for(i=0;i<SV_depth_modified;i++)
    for(q=0;q<pieceLength;q++)
    {
        #ifdef ICC
        _intel_fast_memcpy(newWArrayPointer,&weight[(startSlice+i)*Nvc+p*pieceLength*sinoparams.NChannels+q*sinoparams.NChannels+bandMin[p*pieceLength+q]],sizeof(float)*(bandWidth[p]));
        _intel_fast_memcpy(newEArrayPointer,&sinoerr[(startSlice+i)*Nvc+p*pieceLength*sinoparams.NChannels+q*sinoparams.NChannels+bandMin[p*pieceLength+q]],sizeof(float)*(bandWidth[p]));
        #else
        memcpy(newWArrayPointer,&weight[(startSlice+i)*Nvc+p*pieceLength*sinoparams.NChannels+q*sinoparams.NChannels+bandMin[p*pieceLength+q]],sizeof(float)*(bandWidth[p]));
        memcpy(newEArrayPointer,&sinoerr[(startSlice+i)*Nvc+p*pieceLength*sinoparams.NChannels+q*sinoparams.NChannels+bandMin[p*pieceLength+q]],sizeof(float)*(bandWidth[p]));
        #endif
        newWArrayPointer+=bandWidth[p];
        newEArrayPointer+=bandWidth[p];
    }
}

for (p = 0; p < NViewSets; p++)
{
    #ifdef ICC
    _intel_fast_memcpy(&CopyNewEArray[p][0],&newEArray[p][0],sizeof(float)*bandWidth[p]*pieceLength*SV_depth_modified);
    #else
    memcpy(&CopyNewEArray[p][0],&newEArray[p][0],sizeof(float)*bandWidth[p]*pieceLength*SV_depth_modified);
    #endif
}

float ** newWArrayTransposed = (float **)malloc(sizeof(float *) * NViewSets);
float ** newEArrayTransposed = (float **)malloc(sizeof(float *) * NViewSets);

for (p = 0; p < NViewSets; p++)
{
    newWArrayTransposed[p] = (float *)malloc(sizeof(float)*bandWidth[p]*pieceLength*SV_depth_modified);
    newEArrayTransposed[p] = (float *)malloc(sizeof(float)*bandWidth[p]*pieceLength*SV_depth_modified);
}

float *WTransposeArrayPointer=&newWArrayTransposed[0][0];
float *ETransposeArrayPointer=&newEArrayTransposed[0][0];

for (p = 0; p < NViewSets; p++)
for(currentSlice=0;currentSlice<(SV_depth_modified);currentSlice++) 
{
    WTransposeArrayPointer=&newWArrayTransposed[p][currentSlice*bandWidth[p]*pieceLength];
    ETransposeArrayPointer=&newEArrayTransposed[p][currentSlice*bandWidth[p]*pieceLength];
    newEArrayPointer=&newEArray[p][currentSlice*bandWidth[p]*pieceLength];
    newWArrayPointer=&newWArray[p][currentSlice*bandWidth[p]*pieceLength];
    for(q=0;q<bandWidth[p];q++)
    {
        #pragma vector aligned
        for(t=0;t<pieceLength;t++)
        {
            ETransposeArrayPointer[q*pieceLength+t]=newEArrayPointer[bandWidth[p]*t+q];
            WTransposeArrayPointer[q*pieceLength+t]=newWArrayPointer[bandWidth[p]*t+q];
        }
    }
}

WTransposeArrayPointer=&newWArrayTransposed[0][0];
ETransposeArrayPointer=&newEArrayTransposed[0][0];
newEArrayPointer=&newEArray[0][0];

for (p = 0; p < NViewSets; p++)
    free((void *)newWArray[p]);

free((void **)newWArray);

/* Turn off zero-skipping for 1st iteration */
char zero_skip_enable=0;  // 1: enable, 0: disable
if(iter>0 && PositivityFlag)
    zero_skip_enable=1;

/*XW: the start of the loop to compute theta1, theta2*/
float * tempV = (float *) mget_spc(SV_depth_modified,sizeof(float));
float * tempProxMap = (float *) mget_spc(SV_depth_modified,sizeof(float));
float ** neighbors = (float **) multialloc(sizeof(float),2,SV_depth_modified,10);
char * zero_skip_FLAG = (char *) mget_spc(SV_depth_modified,sizeof(char));
float * diff = (float *) mget_spc(SV_depth_modified,sizeof(float));
float * THETA1 = (float *) get_spc(SV_depth_modified,sizeof(float));
float * THETA2 = (float *) get_spc(SV_depth_modified,sizeof(float));

for(i=0;i<countNumber;i++)
{
    const short j_new = j_newCoordinate[i];   /*XW: get the voxel's x,y location*/
    const short k_new = k_newCoordinate[i];
    float Aval_max = Aval_max_ptr[j_new*Nx+k_new];

    for(p=0;p<SV_depth_modified;p++)
        THETA1[p]=THETA2[p]=0.0;

    int theVoxelPosition=(j_new-jy)*(2*SVLength+1)+(k_new-jx);
    unsigned char * A_padd_Tranpose_pointer = &A_Padded_Map[SVPosition][theVoxelPosition].val[0];

    for(currentSlice=0;currentSlice<SV_depth_modified;currentSlice++)
    {
        tempV[currentSlice] = (float)(image[(size_t)(startSlice+currentSlice)*Nxy + j_new*Nx+k_new]); /* current voxel value */

        zero_skip_FLAG[currentSlice] = 0;

        if(reconparams.ReconType == MBIR_MODULAR_RECONTYPE_QGGMRF_3D)
        {
            ExtractNeighbors3D(&neighbors[currentSlice][0],k_new,j_new,&image[(size_t)(startSlice+currentSlice)*Nxy],imgparams);

            if((startSlice+currentSlice)==0)
                neighbors[currentSlice][8]=voxelsBuffer1[j_new*Nx+k_new];
            else
                neighbors[currentSlice][8]=image[(size_t)(startSlice+currentSlice-1)*Nxy + j_new*Nx+k_new];

            if((startSlice+currentSlice)<(Nz-1))
                neighbors[currentSlice][9]=image[(size_t)(startSlice+currentSlice+1)*Nxy + j_new*Nx+k_new];
            else
                neighbors[currentSlice][9]=voxelsBuffer2[j_new*Nx+k_new];

            if(zero_skip_enable)
            if(tempV[currentSlice] == 0.0)
            {
                zero_skip_FLAG[currentSlice] = 1;
                for (j = 0; j < 10; j++)
                {
                    if (neighbors[currentSlice][j] != 0.0)
                    {
                        zero_skip_FLAG[currentSlice] = 0;
                        break;
                    }
                }
            }
        }
        if(reconparams.ReconType == MBIR_MODULAR_RECONTYPE_PandP)
            tempProxMap[currentSlice] = proximalmap[(startSlice+currentSlice)*Nxy + j_new*Nx+k_new];
    }

    A_padd_Tranpose_pointer = &A_Padded_Map[SVPosition][theVoxelPosition].val[0];
    for(p=0;p<NViewSets;p++)
    {
        const int myCount=A_Padded_Map[SVPosition][theVoxelPosition].pieceWiseWidth[p];
        const int pieceMin=A_Padded_Map[SVPosition][theVoxelPosition].pieceWiseMin[p];
        #pragma vector aligned
        for(currentSlice=0;currentSlice<SV_depth_modified;currentSlice++)
        if(zero_skip_FLAG[currentSlice] == 0)
        {
            WTransposeArrayPointer=&newWArrayTransposed[p][currentSlice*bandWidth[p]*pieceLength];
            ETransposeArrayPointer=&newEArrayTransposed[p][currentSlice*bandWidth[p]*pieceLength];
            WTransposeArrayPointer+=pieceMin*pieceLength;
            ETransposeArrayPointer+=pieceMin*pieceLength;
            float tempTHETA1=0.0;
            float tempTHETA2=0.0;
            //Not finding evidence this makes a difference --SJK
            //Deprecated by Intel anyway
            //#pragma vector aligned
            //#pragma simd reduction(+:tempTHETA2,tempTHETA1)
            for(t=0;t<myCount*pieceLength;t++)
            {   /* summing over voxels which are not skipped or masked*/
                tempTHETA1 += A_padd_Tranpose_pointer[t]*WTransposeArrayPointer[t]*ETransposeArrayPointer[t];
                tempTHETA2 += A_padd_Tranpose_pointer[t]*WTransposeArrayPointer[t]*A_padd_Tranpose_pointer[t];
                if ( !isfinite(tempTHETA1) || !isfinite(tempTHETA2)) {
                    printf("\n\n********************************Big problems with update!***********************************");
                    printf("\n NaN observed when calculating tempTHETA1 in  recon3d !!!!!!!!"
                           "\n t=%d, A_padd_Transpose=%d, WTransposeArray=%f," 
                           "\n THETA1=%f; THETA2=%f, tempTHETA1=%f; tempTHETA2=%f\n", 
                            t, A_padd_Tranpose_pointer[t], WTransposeArrayPointer[t], 
                            THETA1[currentSlice],THETA2[currentSlice],tempTHETA1,tempTHETA2);
                    exit(1);
                }

            }
            THETA1[currentSlice]+=tempTHETA1;
            THETA2[currentSlice]+=tempTHETA2;
        }
        A_padd_Tranpose_pointer += myCount*pieceLength;
    }

    for(currentSlice=0;currentSlice<SV_depth_modified;currentSlice++)
    {
        THETA1[currentSlice]=-THETA1[currentSlice]*Aval_max*(1.0/255);
        THETA2[currentSlice]=THETA2[currentSlice]*Aval_max*(1.0/255)*Aval_max*(1.0/255);
    }

    ETransposeArrayPointer=&newEArrayTransposed[0][0];

    A_padd_Tranpose_pointer = &A_Padded_Map[SVPosition][theVoxelPosition].val[0];

    for(currentSlice=0;currentSlice<SV_depth_modified;currentSlice++)
    if(zero_skip_FLAG[currentSlice] == 0)
    {
        float pixel,step;
        if(reconparams.ReconType == MBIR_MODULAR_RECONTYPE_QGGMRF_3D)
        {
            step = QGGMRF3D_Update(reconparams,param_ext,tempV[currentSlice],&neighbors[currentSlice][0],THETA1[currentSlice],THETA2[currentSlice]);
        }
        else if(reconparams.ReconType == MBIR_MODULAR_RECONTYPE_PandP)
        {
            step = PandP_Update(reconparams,param_ext,tempV[currentSlice],tempProxMap[currentSlice],THETA1[currentSlice],THETA2[currentSlice]);
        }
        else
        {
            fprintf(stderr,"Error** Unrecognized ReconType in ICD update\n");
            exit(-1);
        }

        pixel = tempV[currentSlice] + step;  /* can apply over-relaxation to the step size here */

        if(PositivityFlag)
            image[(size_t)(startSlice+currentSlice)*Nxy + j_new*Nx+k_new] = ((pixel < 0.0) ? 0.0 : pixel);
        else
            image[(size_t)(startSlice+currentSlice)*Nxy + j_new*Nx+k_new] = pixel;

        diff[currentSlice] = image[(size_t)(startSlice+currentSlice)*Nxy + j_new*Nx+k_new] - tempV[currentSlice];

        totalChange_loc += fabs(diff[currentSlice]);
        totalValue_loc += fabs(tempV[currentSlice]);
        NumUpdates_loc++;

        diff[currentSlice]=diff[currentSlice]*Aval_max*(1.0/255);
    }

    for(p=0;p<NViewSets;p++)
    {
        const int myCount=A_Padded_Map[SVPosition][theVoxelPosition].pieceWiseWidth[p];
        const int pieceMin=A_Padded_Map[SVPosition][theVoxelPosition].pieceWiseMin[p];
        #pragma vector aligned
        for(currentSlice=0;currentSlice<SV_depth_modified;currentSlice++)
        if(diff[currentSlice]!=0 && zero_skip_FLAG[currentSlice] == 0)
        {
            ETransposeArrayPointer=&newEArrayTransposed[p][currentSlice*bandWidth[p]*pieceLength];
            ETransposeArrayPointer+=pieceMin*pieceLength;

            #pragma vector aligned
            for(t=0;t<(myCount*pieceLength);t++)
                ETransposeArrayPointer[t]= ETransposeArrayPointer[t]-A_padd_Tranpose_pointer[t]*diff[currentSlice];
        }
        A_padd_Tranpose_pointer+=myCount*pieceLength;
    }
}

for (p = 0; p < NViewSets; p++)
    free((void *)newWArrayTransposed[p]);

free((void **)newWArrayTransposed);

headNodeArray[jj_new].x=totalChange_loc;

for (p = 0; p < NViewSets; p++)
for(currentSlice=0;currentSlice<SV_depth_modified;currentSlice++)
{
    ETransposeArrayPointer=&newEArrayTransposed[p][currentSlice*bandWidth[p]*pieceLength];
    newEArrayPointer=&newEArray[p][currentSlice*bandWidth[p]*pieceLength];
    for(q=0;q<bandWidth[p];q++)
    {
        #pragma vector aligned
        for(t=0;t<pieceLength;t++)
            newEArrayPointer[bandWidth[p]*t+q]=ETransposeArrayPointer[q*pieceLength+t];
    }
}

for (p = 0; p < NViewSets; p++)
    free((void *)newEArrayTransposed[p]);

free((void **)newEArrayTransposed);

newEArrayPointer=&newEArray[0][0];
float* CopyNewEArrayPointer=&CopyNewEArray[0][0];
float* eArrayPointer=&sinoerr[0];

for (p = 0; p < NViewSets; p++)      /*XW: update the error term in the memory buffer*/
{
    newEArrayPointer=&newEArray[p][0];
    CopyNewEArrayPointer=&CopyNewEArray[p][0];
    for (currentSlice=0; currentSlice< SV_depth_modified;currentSlice++)
    {
        //#pragma vector aligned
        for(q=0;q<pieceLength;q++)
        {
            eArrayPointer=&sinoerr[(startSlice+currentSlice)*Nvc+p*pieceLength*sinoparams.NChannels+q*sinoparams.NChannels+bandMin[p*pieceLength+q]];
            for(t=0;t<bandWidth[p];t++)
            {
                #pragma omp atomic
                *eArrayPointer += (*newEArrayPointer)-(*CopyNewEArrayPointer);
                newEArrayPointer++;
                CopyNewEArrayPointer++;
                eArrayPointer++;
            }
        }
    }
}

for (p = 0; p < NViewSets; p++)
{
    free((void *)newEArray[p]);
    free((void *)CopyNewEArray[p]);
}
free((void **)newEArray);
free((void **)CopyNewEArray);

free((void *)k_newCoordinate);
free((void *)j_newCoordinate);
free((void *)bandMin);
free((void *)bandMax);
free((void *)bandWidth);
free((void *)bandWidthTemp);
free((void *)tempV);
free((void *)tempProxMap);
multifree(neighbors,2);
free((void *)zero_skip_FLAG);
free((void *)diff);
free((void *)THETA1);
free((void *)THETA2);

*NumUpdates += NumUpdates_loc;
*totalValue += totalValue_loc;
*totalChange += totalChange_loc;

} / END super_voxel_recon() /

void coordinateShuffle(int order1, int order2,int len) { int i, j, tmp1,tmp2;

for (i = 0; i < len-1; i++)
{
    j = i + (rand() % (len-i));
    tmp1 = order1[j];
    tmp2 = order2[j];
    order1[j] = order1[i];
    order2[j] = order2[i];
    order1[i] = tmp1;
    order2[i] = tmp2;
}

}

void three_way_shuffle(long order1, char order2, struct heap_node *headNodeArray, int len) { int i,j; long tmp_long; char tmp_char; float temp_x;

for (i = 0; i < len-1; i++)
{
    j = i + (rand() % (len-i));
    tmp_long = order1[j];
    order1[j] = order1[i];
    order1[i] = tmp_long;
    tmp_char = order2[j];
    order2[j] = order2[i];
    order2[i] = tmp_char;
    temp_x=headNodeArray[j].x;
    headNodeArray[j].x=headNodeArray[i].x;
    headNodeArray[i].x=temp_x;
}

}

float MAPCostFunction3D( float x, float e, float *w, struct ImageParams3D imgparams, struct SinoParams3DParallel sinoparams, struct ReconParams reconparams, struct ParamExt param_ext) { int i, M, j, jx, jy, jz, Nx, Ny, Nz, Nxy, plusx, minusx, plusy, plusz; float nloglike, nlogprior_nearest, nlogprior_diag, nlogprior_interslice, x0;

M = sinoparams.NViews * sinoparams.NChannels ;
Nx = imgparams.Nx;
Ny = imgparams.Ny;
Nz = imgparams.Nz;
Nxy = Nx*Ny;

nloglike = 0.0;
for (i = 0; i <sinoparams.NSlices; i++)
for (j = 0; j < M; j++)
    nloglike += e[i*M+j]*w[i*M+j]*e[i*M+j];

nloglike /= 2.0;
nlogprior_nearest = 0.0;
nlogprior_diag = 0.0;
nlogprior_interslice = 0.0;

for (jz = 0; jz < Nz; jz++)
for (jy = 0; jy < Ny; jy++)
for (jx = 0; jx < Nx; jx++)
{
    plusx = jx + 1;
    plusx = ((plusx < Nx) ? plusx : 0);
    minusx = jx - 1;
    minusx = ((minusx < 0) ? Nx-1 : minusx);
    plusy = jy + 1;
    plusy = ((plusy < Ny) ? plusy : 0);
    plusz = jz + 1;
    plusz = ((plusz < Nz) ? plusz : 0);

    j = jy*Nx + jx;
    x0 = x[jz*Nxy+j];

    nlogprior_nearest += QGGMRF_Potential((x0-x[jz*Nxy+jy*Nx+plusx]),reconparams,param_ext);
    nlogprior_nearest += QGGMRF_Potential((x0-x[jz*Nxy+plusy*Nx+jx]),reconparams,param_ext);
    nlogprior_diag += QGGMRF_Potential((x0-x[jz*Nxy+plusy*Nx+minusx]),reconparams,param_ext);
    nlogprior_diag += QGGMRF_Potential((x0-x[jz*Nxy+plusy*Nx+plusx]),reconparams,param_ext);
    nlogprior_interslice += QGGMRF_Potential((x0-x[plusz*Nxy+jy*Nx+jx]),reconparams,param_ext);
}

return (nloglike + reconparams.b_nearest * nlogprior_nearest + reconparams.b_diag * nlogprior_diag + reconparams.b_interslice * nlogprior_interslice) ;

}

/ Forward projection using input SV system matrix /

void SVproject( float proj, float image, struct AValues_char *A_Padded_Map, float Aval_max_ptr, struct ImageParams3D imgparams, struct SinoParams3DParallel sinoparams, struct SVParams svpar, char backproject_flag) { size_t i; int jz; int Nx = imgparams.Nx; int Ny = imgparams.Ny; int Nz = imgparams.Nz; int NChannels = sinoparams.NChannels; int Nvc = sinoparams.NViews sinoparams.NChannels; int SVLength = svpar.SVLength; int pieceLength = svpar.pieceLength; int SVsPerRow = svpar.SVsPerRow; int NViewSets = sinoparams.NViews/pieceLength; struct minStruct bandMinMap = svpar.bandMinMap;

/* initialize output */
if(backproject_flag)
    for (i = 0; i < (size_t)Nx*Ny*Nz; i++)
        image[i] = 0.0;
else
    for (i = 0; i < (size_t)Nvc*Nz; i++)
        proj[i] = 0.0;

#pragma omp parallel for schedule(dynamic)
for(jz=0;jz<Nz;jz++)
{
    int jx,jy,k,r,p;

    for (jy = 0; jy < Ny; jy++)
    for (jx = 0; jx < Nx; jx++)
    {
        int SV_ind_y = jy/(2*SVLength-svpar.overlap);
        int SV_ind_x = jx/(2*SVLength-svpar.overlap);
        int SVPosition = SV_ind_y*SVsPerRow + SV_ind_x;

        int SV_jy = SV_ind_y*(2*SVLength-svpar.overlap);
        int SV_jx = SV_ind_x*(2*SVLength-svpar.overlap);
        int VoxelPosition = (jy-SV_jy)*(2*SVLength+1)+(jx-SV_jx);

        // The second condition should always be true
        if (A_Padded_Map[SVPosition][VoxelPosition].length > 0 && VoxelPosition < ((2*SVLength+1)*(2*SVLength+1)))
        {
            unsigned char* A_padd_Tr_ptr = &A_Padded_Map[SVPosition][VoxelPosition].val[0];
            float rescale = Aval_max_ptr[jy*Nx+jx]*(1.0/255);
            size_t image_idx = (size_t)jz*Nx*Ny + jy*Nx + jx;
            float xval = image[image_idx];

            for(p=0;p<NViewSets;p++)
            {
                int myCount = A_Padded_Map[SVPosition][VoxelPosition].pieceWiseWidth[p];
                int pieceWiseMin = A_Padded_Map[SVPosition][VoxelPosition].pieceWiseMin[p];
                int position = p*pieceLength*NChannels + pieceWiseMin;

                for(r=0;r<myCount;r++)
                for(k=0;k<pieceLength;k++)
                {
                    channel_t bandMin = bandMinMap[SVPosition].bandMin[p*pieceLength+k];
                    size_t proj_idx = jz*Nvc + position + k*NChannels + bandMin + r;

                    if((pieceWiseMin + bandMin + r) >= NChannels || (position + k*NChannels + bandMin + r) >= Nvc ) {
                        fprintf(stderr,"SVproject() out of bounds: p %d r %d k %d\n",p,r,k);
                        fprintf(stderr,"SVproject() out of bounds: total_1 %d total_2 %d\n",pieceWiseMin+bandMin+r,position+k*NChannels+bandMin+r);
                        exit(-1);
                    }
                    else
                    {
                        if(backproject_flag)
                            image[image_idx] += A_padd_Tr_ptr[r*pieceLength+k]*rescale * proj[proj_idx];
                        else
                            proj[proj_idx] += A_padd_Tr_ptr[r*pieceLength+k]*rescale * xval;
                    }
                }
                A_padd_Tr_ptr += myCount*pieceLength;
            }
        }
    }
}

}

/ Forward projection wrapper that first reads or computes SV matrix /

void forwardProject( float proj, float image, struct ImageParams3D imgparams, struct SinoParams3DParallel sinoparams, char *Amatrix_fname, char backproject_flag, char verboseLevel) { int i,j; struct AValues_char *A_Padded_Map; float Aval_max_ptr; struct SVParams svpar;

/* print summary to stdout */
if(verboseLevel>1)
{
    fprintf(stdout,"forwardProject() -- build time: %s, %s\n", __DATE__, __TIME__);
    printSinoParams3DParallel(&sinoparams);
    printImageParams3D(&imgparams);
}

/* Initialize/allocate SV parameters */
initSVParams(&svpar, imgparams, sinoparams);
int Nsv = svpar.Nsv;
int SVLength = svpar.SVLength;

/* Read/compute/write System Matrix */
A_Padded_Map = (struct AValues_char **)multialloc(sizeof(struct AValues_char),2,Nsv,(2*SVLength+1)*(2*SVLength+1));
Aval_max_ptr = (float *) mget_spc(imgparams.Nx*imgparams.Ny,sizeof(float));
if(Amatrix_fname != NULL)
{
    if(verboseLevel)
        fprintf(stdout,"Reading system matrix...\n");
    readAmatrix(Amatrix_fname, A_Padded_Map, Aval_max_ptr, &imgparams, &sinoparams, svpar);
}
else
{
    if(verboseLevel)
        fprintf(stdout,"Computing system matrix...\n");
    char * ImageReconMask = GenImageReconMask(&imgparams);
    A_comp(A_Padded_Map,Aval_max_ptr,svpar,&sinoparams,ImageReconMask,&imgparams);
    free((void *)ImageReconMask);
}

/* Project */
if(verboseLevel)
{
    if(backproject_flag)
        fprintf(stdout,"Back-projecting sinogram...\n");
    else
        fprintf(stdout,"Projecting image...\n");
}
SVproject(proj,image,A_Padded_Map,Aval_max_ptr,imgparams,sinoparams,svpar,backproject_flag);

/* Free SV memory */
for(i=0;i<Nsv;i++) {
    free((void *)svpar.bandMinMap[i].bandMin);
    free((void *)svpar.bandMaxMap[i].bandMax);
}
free((void *)svpar.bandMinMap);
free((void *)svpar.bandMaxMap);

/* Free system matrix */
for(i=0;i<Nsv;i++)
for(j=0;j<(2*SVLength+1)*(2*SVLength+1);j++)
if(A_Padded_Map[i][j].length>0)
{
    free((void *)A_Padded_Map[i][j].val);
    free((void *)A_Padded_Map[i][j].pieceWiseMin);
    free((void *)A_Padded_Map[i][j].pieceWiseWidth);
}
multifree(A_Padded_Map,2);
free((void *)Aval_max_ptr);

} / END forwardProject() /

dyang37 commented 3 years ago

Also one more important thing: Looks like NaN values of THETA occur in both the case of using zero weights and the case of using transmission weights, but for different reasons.

In the demo script that I attached I used weights=0, and looks like the NaN values of THETA1 and THETA2 are resulted from NaN values of A matrix entry, as you saw. In the script Charlie sent out last Friday, where transmission weights are used instead of 0 weights, NaN values still occur. In this case looks like matrix values are fine, and NaN occurs because one of the surrogate coeffs is inf in icd3d.c:

Reading system matrix...
Projecting image...
Reconstructing...

********************************Big problems with update!***********************************
 NaN observed with update step inside icd3d! change=-nan; THETA1old=-0.000000; THETA2old=0.000000; THETA1=-inf; THETA2=inf

Surrogate coeff at 0 = 1054.937256

Surrogate coeff at 1 = 1054.937256

Surrogate coeff at 2 = 1054.937256

Surrogate coeff at 3 = 1054.937256

Surrogate coeff at 4 = inf

Surrogate coeff at 5 = 1054.937256

Surrogate coeff at 6 = 1054.937378

Surrogate coeff at 7 = 1054.937256

Surrogate coeff at 8 = 1054.937256

Surrogate coeff at 9 = 1054.937256

I attached the modified scripts of recon3d.c and icd3d.c, as well as both python scripts. You should be able to reproduce what I saw just by copying the c code files into the right place, re-compile, and run two python scripts respectively.


From: Diyu Yang @.> Sent: Monday, November 8, 2021 3:33 PM To: cabouman/svmbir @.>; cabouman/svmbir @.> Cc: Mention @.> Subject: Re: [cabouman/svmbir] Nans generated by recon (Issue #212)

Got it. Here's the updated recon3d script. The printed value is still NaN.


From: Sherman J Kisner @.> Sent: Monday, November 8, 2021 3:29 PM To: cabouman/svmbir @.> Cc: Diyu Yang @.>; Mention @.> Subject: Re: [cabouman/svmbir] Nans generated by recon (Issue #212)

Oh by the way, notice this: unsigned char * A_padd_Tranpose_pointer

It's not a float!!, so when looking at the value use %d not %f.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/cabouman/svmbir/issues/212#issuecomment-963546321, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AB2Q7OH6MK2TPILDLQCUG3LULAXI7ANCNFSM5HOWI5OQ. Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

include

include

include

include "MBIRModularDefs.h"

include "MBIRModularUtils.h"

include "icd3d.h"

/ Plug & Play update w/ proximal map prior / float PandP_Update(struct ReconParams reconparams,struct ParamExt param_ext,float tempV,float tempProxMap,float THETA1,float THETA2) { float SigmaXsq = param_ext.SigmaXsq; return(-(SigmaXsqTHETA1 + tempV - tempProxMap) / (SigmaXsqTHETA2 + 1.0)); }

/ ICD update with the QGGMRF prior model / / Prior and neighborhood specific / float QGGMRF3D_Update(struct ReconParams reconparams,struct ParamExt param_ext,float tempV, float neighbors,float THETA1,float THETA2) { int j; / Neighbor relative position to Pixel being updated / float sum1_Nearest=0, sum1_Diag=0, sum1_Interslice=0; / for theta1 calculation / float sum2_Nearest=0, sum2_Diag=0, sum2_Interslice=0; / for theta2 calculation */ float b_nearest, b_diag, b_interslice;

b_nearest=reconparams.b_nearest;
b_diag=reconparams.b_diag;
b_interslice = reconparams.b_interslice;

float delta[10];
float SurrogateCoeff[10];
#pragma vector aligned                                          
for (j = 0; j < 10; j++)
    delta[j] = tempV - neighbors[j];

for (j = 0; j < 10; j++)
    SurrogateCoeff[j] = QGGMRF_SurrogateCoeff(delta[j],reconparams,param_ext);

#pragma vector aligned                                          
for (j = 0; j < 10; j++)
{        
    if (j < 4)
    {
        sum1_Nearest += (SurrogateCoeff[j] * delta[j]);
        sum2_Nearest += SurrogateCoeff[j];
    }
    else if (j >= 4 && j<8)
    {
        sum1_Diag += (SurrogateCoeff[j] * delta[j]);
        sum2_Diag += SurrogateCoeff[j];
    }
    else
    {
        sum1_Interslice += (SurrogateCoeff[j] * delta[j]);
        sum2_Interslice += SurrogateCoeff[j];
    }        

}
float THETA1old = THETA1;
float THETA2old = THETA2; 
THETA1 +=  (b_nearest * sum1_Nearest + b_diag * sum1_Diag + b_interslice * sum1_Interslice) ;
THETA2 +=  (b_nearest * sum2_Nearest + b_diag * sum2_Diag + b_interslice * sum2_Interslice) ;
float change = -THETA1/THETA2;
if ( !isfinite(THETA1) || !isfinite(THETA2) || !isfinite(change)  ) {
    printf("\n\n********************************Big problems with update!***********************************");
    printf("\n NaN observed with update step inside icd3d! change=%f; THETA1old=%f; THETA2old=%f; THETA1=%f; THETA2=%f\n",change,THETA1old,THETA2old,THETA1,THETA2);
    for (j=0; j<10; j++){
        printf("\nSurrogate coeff at %d = %f\n",j,SurrogateCoeff[j]);
    }
    exit(1);
}

return(change);

}

/ the potential function of the QGGMRF prior model. p << q <= 2 / float QGGMRF_Potential(float delta, struct ReconParams reconparams, struct ParamExt param_ext) { float temp, GGMRF_Pot; float p = reconparams.p; float q = reconparams.q; float T = reconparams.T; float SigmaX = reconparams.SigmaX;

GGMRF_Pot = powf(fabs(delta),p)/(p*param_ext.pow_sigmaX_p);
temp = powf(fabs(delta/(T*SigmaX)), q-p);

return ( GGMRF_Pot * temp/(1.0+temp) );

}

/ Quadratic Surrogate Function for the log(prior model) / / For a given convex potential function rho(delta) ... / / The surrogate function defined about a point "delta_p", Q(delta ; delta_p), is given by ... / / Q(delta ; delta_p) = a(delta_p) (delta^2/2), where the coefficient a(delta_p) is ... / / a(delta_p) = [ rho'(delta_p)/delta_p ] ... / / for the case delta_current is Non-Zero and rho' is the 1st derivative of the potential function / / Return this coefficient a(delta_p) / / Prior-model specific, independent of neighborhood */

float QGGMRF_SurrogateCoeff(float delta, struct ReconParams reconparams, struct ParamExt param_ext) { float p, q, T, SigmaX, qmp; float num, denom, temp; float fabs_delta;

p = reconparams.p;
q = reconparams.q;
T = reconparams.T;
SigmaX = reconparams.SigmaX;
float pow_sigmaX_p = param_ext.pow_sigmaX_p;
float pow_sigmaX_q = param_ext.pow_sigmaX_q;
float pow_T_qmp = param_ext.pow_T_qmp;
qmp = q - p;
fabs_delta=(float)fabs(delta);

/* Refer to Chapter 7, MBIR Textbook by Prof Bouman, Page 151 */
/* Table on Quadratic surrogate functions for different prior models */

if (delta == 0.0)
return 2.0/( p*pow_sigmaX_q*pow_T_qmp ) ; /* rho"(0) */

temp = powf(fabs_delta/(T*SigmaX), qmp);
num = (q/p + temp) * powf(fabs_delta,p-2) * temp;
denom = pow_sigmaX_p * (1.0+temp) * (1.0+temp);

return num/denom;

}

void ExtractNeighbors3D( float neighbors, int jx, int jy, float image, struct ImageParams3D imgparams) { int plusx, minusx, plusy, minusy; int Nx,Ny; //int jz,Nz;

Nx = imgparams.Nx;
Ny = imgparams.Ny;
//Nz = imgparams.Nz;

plusx = jx + 1;
plusx = ((plusx < Nx) ? plusx : 0);
minusx = jx - 1;
minusx = ((minusx < 0) ? (Nx-1) : minusx);
plusy = jy + 1;
plusy = ((plusy < Ny) ? plusy : 0);
minusy = jy - 1;
minusy = ((minusy < 0) ? (Ny-1) : minusy);

neighbors[0] = image[jy*Nx+plusx];
neighbors[1] = image[jy*Nx+minusx];
neighbors[2] = image[plusy*Nx+jx];
neighbors[3] = image[minusy*Nx+jx];

neighbors[4] = image[plusy*Nx+plusx];
neighbors[5] = image[plusy*Nx+minusx];
neighbors[6] = image[minusy*Nx+plusx];
neighbors[7] = image[minusy*Nx+minusx];

}

include

include

include

include

include

ifndef MSVC / not included in MS Visual C++ /

#include <sys/time.h>

endif

include "MBIRModularDefs.h"

include "MBIRModularUtils.h"

include "allocate.h"

include "icd3d.h"

include "heap.h"

include "A_comp.h"

include "initialize.h"

include "recon3d.h"

define TEST

//#define COMP_COST //#define COMP_RMSE

/ Internal functions / void super_voxel_recon(int jj,struct SVParams svpar,unsigned long NumUpdates,float totalValue,float totalChange,int iter, char phaseMap,long order,int indexList,float weight,float sinoerr, struct AValues_char *A_Padded_Map,float Aval_max_ptr,struct heap_node headNodeArray, struct SinoParams3DParallel sinoparams,struct ReconParams reconparams,struct ParamExt param_ext,float image, struct ImageParams3D imgparams, float proximalmap, float voxelsBuffer1,float voxelsBuffer2,char group_array,int group_id); void SVproject(float proj,float image,struct AValues_char *A_Padded_Map,float Aval_max_ptr, struct ImageParams3D imgparams,struct SinoParams3DParallel sinoparams,struct SVParams svpar,char backproject_flag); void coordinateShuffle(int order1, int order2,int len); void three_way_shuffle(long order1, char order2, struct heap_node headNodeArray,int len); float MAPCostFunction3D(float x,float e,float w,struct ImageParams3D imgparams,struct SinoParams3DParallel sinoparams, struct ReconParams reconparams,struct ParamExt param_ext);

void MBIRReconstruct( float image, float sino, float weight, float proj_init, float proximalmap, struct ImageParams3D imgparams, struct SinoParams3DParallel sinoparams, struct ReconParams reconparams, char Amatrix_fname, char verboseLevel) { float sinoerr, proximalmap_loc=NULL; int i,j,jj,p,t,iter,it_print=1; size_t k;

ifndef MSVC / not included in MS Visual C++ /

struct timeval tm1,tm2;
#endif

/* image/sino/recon parameters */
int Nx = imgparams.Nx;
int Ny = imgparams.Ny;
int Nz = imgparams.Nz;
int Nxy = Nx*Ny;
int Nvc = sinoparams.NViews * sinoparams.NChannels;
int MaxIterations = reconparams.MaxIterations;
float StopThreshold = reconparams.StopThreshold;

/* Initialize/allocate SV parameters */
struct AValues_char **A_Padded_Map;
float *Aval_max_ptr;
struct SVParams svpar;
initSVParams(&svpar, imgparams, sinoparams);
int Nsv = svpar.Nsv;
int SVLength = svpar.SVLength;
int SV_per_Z = svpar.SV_per_Z;
int SVsPerRow = svpar.SVsPerRow;

/* Activate proximal map mode if given as input */
if(proximalmap != NULL)
{
    reconparams.ReconType = MBIR_MODULAR_RECONTYPE_PandP;
    /* 'image' is reconstructed in place, so if proximal map is the same array, make a local copy */
    if(proximalmap == image)
    {
        proximalmap_loc = (float *) mget_spc((size_t)Nx*Ny*Nz,sizeof(float));
        for(k=0; k<(size_t)Nx*Ny*Nz; k++)
            proximalmap_loc[k] = proximalmap[k];
    }
    else
        proximalmap_loc = proximalmap;
}

/* print summary to stdout */
if(verboseLevel>1)
{
    fprintf(stdout,"MBIRReconstruct() -- build time: %s, %s\n", __DATE__, __TIME__);
    printSinoParams3DParallel(&sinoparams);
    printImageParams3D(&imgparams);
    if(reconparams.ReconType == MBIR_MODULAR_RECONTYPE_QGGMRF_3D)
        printReconParamsQGGMRF3D(&reconparams);
    if(reconparams.ReconType == MBIR_MODULAR_RECONTYPE_PandP)
        printReconParamsPandP(&reconparams);
}

/* Allocate and generate recon mask based on ROIRadius */
char * ImageReconMask = GenImageReconMask(&imgparams);

/* Read/compute/write System Matrix */
A_Padded_Map = (struct AValues_char **)multialloc(sizeof(struct AValues_char),2,Nsv,(2*SVLength+1)*(2*SVLength+1));
Aval_max_ptr = (float *) mget_spc(Nx*Ny,sizeof(float));
if(Amatrix_fname != NULL)
{
    if(verboseLevel)
        fprintf(stdout,"Reading system matrix...\n");
    readAmatrix(Amatrix_fname, A_Padded_Map, Aval_max_ptr, &imgparams, &sinoparams, svpar);
}
else
{
    if(verboseLevel)
        fprintf(stdout,"Computing system matrix...\n");
    A_comp(A_Padded_Map,Aval_max_ptr,svpar,&sinoparams,ImageReconMask,&imgparams);
}

/* Project image for sinogram error */
if(proj_init != NULL)
    sinoerr = proj_init;
else
{
    if(verboseLevel)
        fprintf(stdout,"Projecting image...\n");
    sinoerr = (float *) mget_spc((size_t)Nz*Nvc,sizeof(float));
    SVproject(sinoerr,image,A_Padded_Map,Aval_max_ptr,imgparams,sinoparams,svpar,0);
}
for(k=0; k<(size_t)Nz*Nvc; k++)
    sinoerr[k] = sino[k]-sinoerr[k];

/* Recon parameters */
NormalizePriorWeights3D(&reconparams);
struct ParamExt param_ext;
param_ext.pow_sigmaX_p = powf(reconparams.SigmaX,reconparams.p);
param_ext.pow_sigmaX_q = powf(reconparams.SigmaX,reconparams.q);
param_ext.pow_T_qmp    = powf(reconparams.T,reconparams.q - reconparams.p);
param_ext.SigmaXsq = reconparams.SigmaX * reconparams.SigmaX;

float *voxelsBuffer1;  /* the first N entries are the voxel values.  */
float *voxelsBuffer2;
unsigned long NumUpdates=0;
float totalValue=0,totalChange=0,equits=0;
float avg_update=0,avg_update_rel=0;
float c_ratio=0.07;
float convergence_rho=0.7;
int rep_num=(int)ceil(1/(4*c_ratio*convergence_rho));
struct heap priorityheap;
initialize_heap(&priorityheap);
long *order;
char *phaseMap, **group_id_list;

int NumMaskVoxels=0;
for(j=0;j<Nxy;j++)
if(ImageReconMask[j])
    NumMaskVoxels++;

#ifdef COMP_RMSE
    struct Image3D Image_ref;
    Image_ref.imgparams.Nx = imgparams.Nx;
    Image_ref.imgparams.Ny = imgparams.Ny;
    Image_ref.imgparams.Nz = imgparams.Nz;
    Image_ref.imgparams.FirstSliceNumber = imgparams.FirstSliceNumber;
    Image_ref.imgparams.NumSliceDigits = imgparams.NumSliceDigits;
    AllocateImageData3D(&Image_ref);
    ReadImage3D("ref/ref",&Image_ref);

    float ** image_ref = Image_ref.image;
    double rms_err=0,rms_val=0;
    int jz, Nz0=0, Nz1=Nz;
    for(jz=Nz0; jz<Nz1; jz++)
    for(j=0; j<Nxy; j++)
    if(ImageReconMask[j]) {
        rms_val += image_ref[jz][j]*image_ref[jz][j];
        rms_err += (image[(size_t)jz*Nxy+j]-image_ref[jz][j])*(image[(size_t)jz*Nxy+j]-image_ref[jz][j]);
    }
    rms_val = sqrt(rms_val/((float)NumMaskVoxels*(Nz1-Nz0)));
    rms_err = sqrt(rms_err/((float)NumMaskVoxels*(Nz1-Nz0)));
    FILE *fp_mse=fopen("rmse.txt","w");
    fprintf(fp_mse,"equits|rms_err|rms_val|rms_err/rms_val\n");
    fprintf(fp_mse,"%.2f %g %g %g\n",equits,rms_err,rms_val,rms_err/rms_val);
#endif

order = (long *) mget_spc(Nsv*SV_per_Z,sizeof(long));
phaseMap = (char *) mget_spc(Nsv*SV_per_Z,sizeof(char));
group_id_list = (char **) multialloc(sizeof(char),2,SV_per_Z,4);

/* Order of pixel updates need NOT be raster order, just initialize */
t=0;
for(p=0;p<Nz;p+=svpar.SVDepth)
for(i=0;i<Ny;i+=(SVLength*2-svpar.overlap))
for(j=0;j<Nx;j+=(SVLength*2-svpar.overlap))
{
    order[t]=(long)p*Nxy+i*Nx+j;  /* order is the first voxel coordinate, not the center */
    t++;
}

#pragma omp parallel for private(jj) schedule(dynamic)
for(i=0;i<SV_per_Z;i++)
for(jj=0;jj<Nsv;jj++)
{
    if((jj/SVsPerRow)%2==0)
    {
        if((jj%SVsPerRow)%2==0)
            phaseMap[i*Nsv+jj]=0;
        else
            phaseMap[i*Nsv+jj]=1;
    }
    else
    {
        if((jj%SVsPerRow)%2==0)
            phaseMap[i*Nsv+jj]=2;
        else
            phaseMap[i*Nsv+jj]=3;
    }
}

for(i=0;i<SV_per_Z;i++) {
    if(i%4==0){
        group_id_list[i][0]=0;
        group_id_list[i][1]=3;
        group_id_list[i][2]=1;
        group_id_list[i][3]=2;
    }
    else if(i%4==1) {
        group_id_list[i][0]=3;
        group_id_list[i][1]=0;
        group_id_list[i][2]=2;
        group_id_list[i][3]=1;
    }
    else if(i%4==2) {
        group_id_list[i][0]=1;
        group_id_list[i][1]=2;
        group_id_list[i][2]=3;
        group_id_list[i][3]=0;
    }
    else{
        group_id_list[i][0]=2;
        group_id_list[i][1]=1;
        group_id_list[i][2]=0;
        group_id_list[i][3]=3;
    }
}
#ifdef TEST
srand(0);
#else
srand(time(NULL));
#endif

struct heap_node *headNodeArray;
headNodeArray = (struct heap_node *) mget_spc(Nsv*SV_per_Z,sizeof(struct heap_node));

for(i=0;i<SV_per_Z;i++)
for(jj=0;jj<Nsv;jj++)
{
    headNodeArray[i*Nsv+jj].pt=i*Nsv+jj;
    headNodeArray[i*Nsv+jj].x=0.0;
}
int indexList_size=(int) Nsv*SV_per_Z*4*c_ratio*(1-convergence_rho);
int * indexList = (int *) mget_spc(indexList_size,sizeof(int));

#ifdef ICC
    voxelsBuffer1 = (float *)_mm_malloc(Nxy*sizeof(float),64);
    voxelsBuffer2 = (float *)_mm_malloc(Nxy*sizeof(float),64);
#else
    voxelsBuffer1 = (float *) malloc(Nxy*sizeof(float));
    voxelsBuffer2 = (float *) malloc(Nxy*sizeof(float));
    //voxelsBuffer1 = (float *) aligned_alloc(64,Nxy*sizeof(float));
    //voxelsBuffer2 = (float *) aligned_alloc(64,Nxy*sizeof(float));
    //voxelsBuffer1 = (float *) _aligned_malloc(Nxy*sizeof(float),64);
    //voxelsBuffer2 = (float *) _aligned_malloc(Nxy*sizeof(float),64);
#endif

for(i=0;i<Nxy;i++) {
    voxelsBuffer1[i]=0;
    voxelsBuffer2[i]=0;
}

//coordinateShuffle(&order[0],&phaseMap[0],Nsv*SV_per_Z);
long tmp_long;
char tmp_char;
for(i=0; i<Nsv*SV_per_Z-1; i++)
{
    j = i + (rand() % (Nsv*SV_per_Z-i));
    tmp_long = order[j];
    order[j] = order[i];
    order[i] = tmp_long;
    tmp_char = phaseMap[j];
    phaseMap[j] = phaseMap[i];
    phaseMap[i] = tmp_char;
}

iter=0;
char stop_FLAG=0;
int startIndex=0;
int endIndex=0;

if(verboseLevel) {
    fprintf(stdout,"Reconstructing...\n");
    #ifndef MSVC    /* not included in MS Visual C++ */
    gettimeofday(&tm1,NULL);
    #endif
}

#pragma omp parallel
{
    while(stop_FLAG==0 && equits<MaxIterations && iter<100*MaxIterations)
    {
        #pragma omp single
        {
            if(iter==0)
            {
                startIndex=0;
                endIndex=Nsv*SV_per_Z;
            }
            else
            {
                if((iter-1)%(2*rep_num)==0 && iter!=1)
                    three_way_shuffle(&order[0],&phaseMap[0],&headNodeArray[0],Nsv*SV_per_Z);

                if(iter%2==1)
                {
                    priorityheap.size=0;
                    for(jj=0;jj<Nsv*SV_per_Z;jj++){
                        heap_insert(&priorityheap, &(headNodeArray[jj]));
                    }
                    startIndex=0;
                    endIndex=indexList_size;

                    for(i=0;i<endIndex;i++) {
                        struct heap_node tempNode;
                        get_heap_max(&priorityheap, &tempNode);
                        indexList[i]=tempNode.pt;
                    }
                }
                else {
                    startIndex=((iter-2)/2)%rep_num*Nsv*SV_per_Z/rep_num;
                    endIndex=(((iter-2)/2)%rep_num+1)*Nsv*SV_per_Z/rep_num;
                }
            }
        }

        int group=0;

        for (group = 0; group < 4; group++)
        {
            #pragma omp for schedule(dynamic)  reduction(+:NumUpdates) reduction(+:totalValue) reduction(+:totalChange)
            for (jj = startIndex; jj < endIndex; jj+=1)
                super_voxel_recon(jj,svpar,&NumUpdates,&totalValue,&totalChange,iter,
                        &phaseMap[0],order,&indexList[0],weight,sinoerr,A_Padded_Map,&Aval_max_ptr[0],
                        &headNodeArray[0],sinoparams,reconparams,param_ext,image,imgparams,proximalmap_loc,
                        voxelsBuffer1,voxelsBuffer2,&group_id_list[0][0],group);
        }

        #pragma omp single
        {
            avg_update=avg_update_rel=0.0;
            if(NumUpdates>0) {
                avg_update = totalChange/NumUpdates;
                float avg_value = totalValue/NumUpdates;
                avg_update_rel = avg_update/avg_value * 100;
                //printf("avg_update %f, avg_value %f, avg_update_rel %f\n",avg_update,avg_value,avg_update_rel);
            }
            #ifdef COMP_COST
            float cost = MAPCostFunction3D(image,sinoerr,weight,imgparams,sinoparams,reconparams,param_ext);
            fprintf(stdout, "it %d cost = %-15f, avg_update %f \n", iter, cost, avg_update);
            #endif

            if (avg_update_rel < StopThreshold && (endIndex!=0))
                stop_FLAG = 1;

            iter++;
            equits += (float)NumUpdates/((float)NumMaskVoxels*Nz);

            if(verboseLevel && equits > it_print) {
                fprintf(stdout,"\titeration %d, average change %.4f %%\n",it_print,avg_update_rel);
                it_print++;
            }

            #ifdef COMP_RMSE
                rms_err=0;
                for(jz=Nz0; jz<Nz1; jz++)
                for(j=0; j<Nxy; j++)
                if(ImageReconMask[j])
                    rms_err += (image[(size_t)jz*Nxy+j]-image_ref[jz][j])*(image[(size_t)jz*Nxy+j]-image_ref[jz][j]);
                rms_err = sqrt(rms_err/((float)NumMaskVoxels*(Nz1-Nz0)));
                fprintf(fp_mse,"%.2f %g %g %g\n",equits,rms_err,rms_val,rms_err/rms_val);
            #endif

            NumUpdates=0;
            totalValue=0;
            totalChange=0;
        }
    }
}

if(verboseLevel)
{
    if(StopThreshold <= 0)
        fprintf(stdout,"\tNo stopping condition--running fixed iterations\n");
    else if(stop_FLAG == 1)
        fprintf(stdout,"\tReached stopping condition\n");
    else
        fprintf(stdout,"\tWarning: Didn't reach stopping condition\n");
    if(verboseLevel>1)
    {
        fprintf(stdout,"\tEquivalent iterations = %.1f, (non-homogeneous iterations = %d)\n",equits,iter);
        fprintf(stdout,"\tAverage update in last iteration (relative) = %f %%\n",avg_update_rel);
        fprintf(stdout,"\tAverage update in last iteration (magnitude) = %.4g\n",avg_update);
    }
    #ifndef MSVC    /* not included in MS Visual C++ */
    gettimeofday(&tm2,NULL);
    unsigned long long tt = 1000 * (tm2.tv_sec - tm1.tv_sec) + (tm2.tv_usec - tm1.tv_usec) / 1000;
    printf("\tReconstruction time = %llu ms (iterations only)\n", tt);
    #endif
}

/* If initial projection was supplied, update to return final projection */
if(proj_init != NULL)
{
    for(k=0; k<(size_t)Nz*Nvc; k++)
        proj_init[k] = sino[k]-sinoerr[k];
}
else
    free((void *)sinoerr);

/* If local copy of proximal map was made, free it */
if(proximalmap == image)
    free((void *)proximalmap_loc);

#ifdef ICC
    _mm_free((void *)voxelsBuffer1);
    _mm_free((void *)voxelsBuffer2);
#else
    free((void *)voxelsBuffer1);
    free((void *)voxelsBuffer2);
    //_aligned_free((void *)voxelsBuffer1);
    //_aligned_free((void *)voxelsBuffer2);
#endif

free((void *)headNodeArray);
free_heap((void *)&priorityheap);
free((void *)order);
free((void *)phaseMap);
multifree(group_id_list,2);
free((void *)indexList);

#ifdef COMP_RMSE
    FreeImageData3D(&Image_ref);
    fclose(fp_mse);
#endif

/* Free SV memory */
for(i=0;i<Nsv;i++) {
    free((void *)svpar.bandMinMap[i].bandMin);
    free((void *)svpar.bandMaxMap[i].bandMax);
}
free((void *)svpar.bandMinMap);
free((void *)svpar.bandMaxMap);

/* Free system matrix */
for(i=0;i<Nsv;i++)
for(j=0;j<(2*SVLength+1)*(2*SVLength+1);j++)
if(A_Padded_Map[i][j].length>0)
{
    free((void *)A_Padded_Map[i][j].val);
    free((void *)A_Padded_Map[i][j].pieceWiseMin);
    free((void *)A_Padded_Map[i][j].pieceWiseWidth);
}
multifree(A_Padded_Map,2);
free((void *)Aval_max_ptr);
free((void *)ImageReconMask);

} / END MBIRReconstruct() /

void super_voxel_recon( int jj, struct SVParams svpar, unsigned long NumUpdates, float totalValue, float totalChange, int iter, char phaseMap, long order, int indexList, float weight, float sinoerr, struct AValues_char * A_Padded_Map, float Aval_max_ptr, struct heap_node headNodeArray, struct SinoParams3DParallel sinoparams, struct ReconParams reconparams, struct ParamExt param_ext, float image, struct ImageParams3D imgparams, float proximalmap, float voxelsBuffer1, float voxelsBuffer2, char group_array, int group_id) { int jy,jx,p,i,q,t,j,currentSlice,startSlice; int SV_depth_modified; int NumUpdates_loc=0; float totalValue_loc=0,totalChange_loc=0;

int Nx = imgparams.Nx;
int Ny = imgparams.Ny;
int Nz = imgparams.Nz;
int Nxy = Nx*Ny;
int Nvc = sinoparams.NViews * sinoparams.NChannels;
char PositivityFlag = reconparams.Positivity;

int SVLength = svpar.SVLength;
int overlappingDistance = svpar.overlap;
int SV_depth = svpar.SVDepth;
int SVsPerRow = svpar.SVsPerRow;
struct minStruct * bandMinMap = svpar.bandMinMap;
struct maxStruct * bandMaxMap = svpar.bandMaxMap;
int pieceLength = svpar.pieceLength;
int NViewSets = sinoparams.NViews/pieceLength;

int jj_new;
if(iter%2==0)
    jj_new=jj;
else
    jj_new=indexList[jj];

startSlice = order[jj_new] / Nx / Ny;
jy = (order[jj_new] - startSlice* Nx * Ny) / Nx;
jx = (order[jj_new] - startSlice* Nx * Ny) % Nx;

if(phaseMap[jj_new]!=group_array[startSlice/SV_depth*4+group_id])
    return;

if((startSlice+SV_depth)>Nz)
    SV_depth_modified=Nz-startSlice;
else
    SV_depth_modified=SV_depth;

int SVPosition=jy/(2*SVLength-overlappingDistance)*SVsPerRow+jx/(2*SVLength-overlappingDistance);

int countNumber=0;  /*XW: the number of voxels chosen for a certain radius of circle*/
int radius =SVLength;   /*XW: choose the circle radius*/
int coordinateSize=1;   /*XW: CoordinateSize is the size of a minimum square enclosing the circle. For the baseline code, coordinateSize=1 because only 1 voxel*/
/*XW: imagine that this is a minimum square enclosing the circle. The square
 * "touches" the circle on 4 coordinate points. CoordianteSize is the possible
 * maximum number of voxels for a certain radius of circle */
if(radius!=0)
    coordinateSize=(2*radius+1)*(2*radius+1);
int * k_newCoordinate = (int *) mget_spc(coordinateSize,sizeof(int));
int * j_newCoordinate = (int *) mget_spc(coordinateSize,sizeof(int));
int j_newAA=0;
int k_newAA=0;
int voxelIncrement=0;

/*XW: choosing the voxels locations in a circle*/
for(j_newAA=jy;j_newAA<=(jy+2*radius);j_newAA++)
for(k_newAA=jx;k_newAA<=(jx+2*radius);k_newAA++)
{
    if(j_newAA>=0 && k_newAA >=0 && j_newAA <Ny && k_newAA < Nx)
    {
        if(A_Padded_Map[SVPosition][voxelIncrement].length >0) {
            j_newCoordinate[countNumber]=j_newAA;
            k_newCoordinate[countNumber]=k_newAA;
            countNumber++;
        }
    }
    voxelIncrement++;
}

/*XW: if no voxel chosen, we skip this loop iteration*/
if(countNumber==0)
{
    free((void *)k_newCoordinate);
    free((void *)j_newCoordinate);
    return;
}

coordinateShuffle(&j_newCoordinate[0],&k_newCoordinate[0],countNumber);

/*XW: for a supervoxel, bandMin records the starting position of the sinogram band at each view*/
/*XW: for a supervoxel, bandMax records the end position of the sinogram band at each view */
channel_t * bandMin = (channel_t *) mget_spc(sinoparams.NViews,sizeof(channel_t));
channel_t * bandMax = (channel_t *) mget_spc(sinoparams.NViews,sizeof(channel_t));
channel_t * bandWidthTemp = (channel_t *) mget_spc(sinoparams.NViews,sizeof(channel_t));
channel_t * bandWidth = (channel_t *) mget_spc(NViewSets,sizeof(channel_t));

#ifdef ICC
_intel_fast_memcpy(&bandMin[0],&bandMinMap[SVPosition].bandMin[0],sizeof(channel_t)*(sinoparams.NViews));
_intel_fast_memcpy(&bandMax[0],&bandMaxMap[SVPosition].bandMax[0],sizeof(channel_t)*(sinoparams.NViews));
#else
memcpy(&bandMin[0],&bandMinMap[SVPosition].bandMin[0],sizeof(channel_t)*(sinoparams.NViews));
memcpy(&bandMax[0],&bandMaxMap[SVPosition].bandMax[0],sizeof(channel_t)*(sinoparams.NViews));
#endif

//#pragma vector aligned
for(p=0;p< sinoparams.NViews;p++)
    bandWidthTemp[p]=bandMax[p]-bandMin[p];

for (p = 0; p < NViewSets; p++)
{
    int bandWidthMax=bandWidthTemp[p*pieceLength];
    for(t=0;t<pieceLength;t++){
        if(bandWidthTemp[p*pieceLength+t]>bandWidthMax)
            bandWidthMax=bandWidthTemp[p*pieceLength+t];
    }
    bandWidth[p]=bandWidthMax;
}

float ** newWArray = (float **)malloc(sizeof(float *) * NViewSets);
float ** newEArray = (float **)malloc(sizeof(float *) * NViewSets);
float ** CopyNewEArray = (float **)malloc(sizeof(float *) * NViewSets);

for (p = 0; p < NViewSets; p++) {
    newWArray[p] = (float *)malloc(sizeof(float)*bandWidth[p]*pieceLength*SV_depth_modified);
    newEArray[p] = (float *)malloc(sizeof(float)*bandWidth[p]*pieceLength*SV_depth_modified);
    CopyNewEArray[p] = (float *)malloc(sizeof(float)*bandWidth[p]*pieceLength*SV_depth_modified);
}

float *newWArrayPointer=&newWArray[0][0];
float *newEArrayPointer=&newEArray[0][0];

/*XW: copy the interlaced we into the memory buffer*/
for (p = 0; p < NViewSets; p++)
{
    newWArrayPointer=&newWArray[p][0];
    newEArrayPointer=&newEArray[p][0];
    for(i=0;i<SV_depth_modified;i++)
    for(q=0;q<pieceLength;q++)
    {
        #ifdef ICC
        _intel_fast_memcpy(newWArrayPointer,&weight[(startSlice+i)*Nvc+p*pieceLength*sinoparams.NChannels+q*sinoparams.NChannels+bandMin[p*pieceLength+q]],sizeof(float)*(bandWidth[p]));
        _intel_fast_memcpy(newEArrayPointer,&sinoerr[(startSlice+i)*Nvc+p*pieceLength*sinoparams.NChannels+q*sinoparams.NChannels+bandMin[p*pieceLength+q]],sizeof(float)*(bandWidth[p]));
        #else
        memcpy(newWArrayPointer,&weight[(startSlice+i)*Nvc+p*pieceLength*sinoparams.NChannels+q*sinoparams.NChannels+bandMin[p*pieceLength+q]],sizeof(float)*(bandWidth[p]));
        memcpy(newEArrayPointer,&sinoerr[(startSlice+i)*Nvc+p*pieceLength*sinoparams.NChannels+q*sinoparams.NChannels+bandMin[p*pieceLength+q]],sizeof(float)*(bandWidth[p]));
        #endif
        newWArrayPointer+=bandWidth[p];
        newEArrayPointer+=bandWidth[p];
    }
}

for (p = 0; p < NViewSets; p++)
{
    #ifdef ICC
    _intel_fast_memcpy(&CopyNewEArray[p][0],&newEArray[p][0],sizeof(float)*bandWidth[p]*pieceLength*SV_depth_modified);
    #else
    memcpy(&CopyNewEArray[p][0],&newEArray[p][0],sizeof(float)*bandWidth[p]*pieceLength*SV_depth_modified);
    #endif
}

float ** newWArrayTransposed = (float **)malloc(sizeof(float *) * NViewSets);
float ** newEArrayTransposed = (float **)malloc(sizeof(float *) * NViewSets);

for (p = 0; p < NViewSets; p++)
{
    newWArrayTransposed[p] = (float *)malloc(sizeof(float)*bandWidth[p]*pieceLength*SV_depth_modified);
    newEArrayTransposed[p] = (float *)malloc(sizeof(float)*bandWidth[p]*pieceLength*SV_depth_modified);
}

float *WTransposeArrayPointer=&newWArrayTransposed[0][0];
float *ETransposeArrayPointer=&newEArrayTransposed[0][0];

for (p = 0; p < NViewSets; p++)
for(currentSlice=0;currentSlice<(SV_depth_modified);currentSlice++) 
{
    WTransposeArrayPointer=&newWArrayTransposed[p][currentSlice*bandWidth[p]*pieceLength];
    ETransposeArrayPointer=&newEArrayTransposed[p][currentSlice*bandWidth[p]*pieceLength];
    newEArrayPointer=&newEArray[p][currentSlice*bandWidth[p]*pieceLength];
    newWArrayPointer=&newWArray[p][currentSlice*bandWidth[p]*pieceLength];
    for(q=0;q<bandWidth[p];q++)
    {
        #pragma vector aligned
        for(t=0;t<pieceLength;t++)
        {
            ETransposeArrayPointer[q*pieceLength+t]=newEArrayPointer[bandWidth[p]*t+q];
            WTransposeArrayPointer[q*pieceLength+t]=newWArrayPointer[bandWidth[p]*t+q];
        }
    }
}

WTransposeArrayPointer=&newWArrayTransposed[0][0];
ETransposeArrayPointer=&newEArrayTransposed[0][0];
newEArrayPointer=&newEArray[0][0];

for (p = 0; p < NViewSets; p++)
    free((void *)newWArray[p]);

free((void **)newWArray);

/* Turn off zero-skipping for 1st iteration */
char zero_skip_enable=0;  // 1: enable, 0: disable
if(iter>0 && PositivityFlag)
    zero_skip_enable=1;

/*XW: the start of the loop to compute theta1, theta2*/
float * tempV = (float *) mget_spc(SV_depth_modified,sizeof(float));
float * tempProxMap = (float *) mget_spc(SV_depth_modified,sizeof(float));
float ** neighbors = (float **) multialloc(sizeof(float),2,SV_depth_modified,10);
char * zero_skip_FLAG = (char *) mget_spc(SV_depth_modified,sizeof(char));
float * diff = (float *) mget_spc(SV_depth_modified,sizeof(float));
float * THETA1 = (float *) get_spc(SV_depth_modified,sizeof(float));
float * THETA2 = (float *) get_spc(SV_depth_modified,sizeof(float));

for(i=0;i<countNumber;i++)
{
    const short j_new = j_newCoordinate[i];   /*XW: get the voxel's x,y location*/
    const short k_new = k_newCoordinate[i];
    float Aval_max = Aval_max_ptr[j_new*Nx+k_new];

    for(p=0;p<SV_depth_modified;p++)
        THETA1[p]=THETA2[p]=0.0;

    int theVoxelPosition=(j_new-jy)*(2*SVLength+1)+(k_new-jx);
    unsigned char * A_padd_Tranpose_pointer = &A_Padded_Map[SVPosition][theVoxelPosition].val[0];

    for(currentSlice=0;currentSlice<SV_depth_modified;currentSlice++)
    {
        tempV[currentSlice] = (float)(image[(size_t)(startSlice+currentSlice)*Nxy + j_new*Nx+k_new]); /* current voxel value */

        zero_skip_FLAG[currentSlice] = 0;

        if(reconparams.ReconType == MBIR_MODULAR_RECONTYPE_QGGMRF_3D)
        {
            ExtractNeighbors3D(&neighbors[currentSlice][0],k_new,j_new,&image[(size_t)(startSlice+currentSlice)*Nxy],imgparams);

            if((startSlice+currentSlice)==0)
                neighbors[currentSlice][8]=voxelsBuffer1[j_new*Nx+k_new];
            else
                neighbors[currentSlice][8]=image[(size_t)(startSlice+currentSlice-1)*Nxy + j_new*Nx+k_new];

            if((startSlice+currentSlice)<(Nz-1))
                neighbors[currentSlice][9]=image[(size_t)(startSlice+currentSlice+1)*Nxy + j_new*Nx+k_new];
            else
                neighbors[currentSlice][9]=voxelsBuffer2[j_new*Nx+k_new];

            if(zero_skip_enable)
            if(tempV[currentSlice] == 0.0)
            {
                zero_skip_FLAG[currentSlice] = 1;
                for (j = 0; j < 10; j++)
                {
                    if (neighbors[currentSlice][j] != 0.0)
                    {
                        zero_skip_FLAG[currentSlice] = 0;
                        break;
                    }
                }
            }
        }
        if(reconparams.ReconType == MBIR_MODULAR_RECONTYPE_PandP)
            tempProxMap[currentSlice] = proximalmap[(startSlice+currentSlice)*Nxy + j_new*Nx+k_new];
    }

    A_padd_Tranpose_pointer = &A_Padded_Map[SVPosition][theVoxelPosition].val[0];
    for(p=0;p<NViewSets;p++)
    {
        const int myCount=A_Padded_Map[SVPosition][theVoxelPosition].pieceWiseWidth[p];
        const int pieceMin=A_Padded_Map[SVPosition][theVoxelPosition].pieceWiseMin[p];
        #pragma vector aligned
        for(currentSlice=0;currentSlice<SV_depth_modified;currentSlice++)
        if(zero_skip_FLAG[currentSlice] == 0)
        {
            WTransposeArrayPointer=&newWArrayTransposed[p][currentSlice*bandWidth[p]*pieceLength];
            ETransposeArrayPointer=&newEArrayTransposed[p][currentSlice*bandWidth[p]*pieceLength];
            WTransposeArrayPointer+=pieceMin*pieceLength;
            ETransposeArrayPointer+=pieceMin*pieceLength;
            float tempTHETA1=0.0;
            float tempTHETA2=0.0;
            //Not finding evidence this makes a difference --SJK
            //Deprecated by Intel anyway
            //#pragma vector aligned
            //#pragma simd reduction(+:tempTHETA2,tempTHETA1)
            for(t=0;t<myCount*pieceLength;t++)
            {   /* summing over voxels which are not skipped or masked*/
                tempTHETA1 += A_padd_Tranpose_pointer[t]*WTransposeArrayPointer[t]*ETransposeArrayPointer[t];
                tempTHETA2 += A_padd_Tranpose_pointer[t]*WTransposeArrayPointer[t]*A_padd_Tranpose_pointer[t];
                if ( !isfinite(tempTHETA1) || !isfinite(tempTHETA2)) {
                    printf("\n\n********************************Big problems with update!***********************************");
                    printf("\n NaN observed when calculating tempTHETA1 in  recon3d !!!!!!!!"
                           "\n t=%d, A_padd_Transpose=%d, WTransposeArray=%f," 
                           "\n THETA1=%f; THETA2=%f, tempTHETA1=%f; tempTHETA2=%f\n", 
                            t, A_padd_Tranpose_pointer[t], WTransposeArrayPointer[t], 
                            THETA1[currentSlice],THETA2[currentSlice],tempTHETA1,tempTHETA2);
                    exit(1);
                }

            }
            THETA1[currentSlice]+=tempTHETA1;
            THETA2[currentSlice]+=tempTHETA2;
        }
        A_padd_Tranpose_pointer += myCount*pieceLength;
    }

    for(currentSlice=0;currentSlice<SV_depth_modified;currentSlice++)
    {
        THETA1[currentSlice]=-THETA1[currentSlice]*Aval_max*(1.0/255);
        THETA2[currentSlice]=THETA2[currentSlice]*Aval_max*(1.0/255)*Aval_max*(1.0/255);
    }

    ETransposeArrayPointer=&newEArrayTransposed[0][0];

    A_padd_Tranpose_pointer = &A_Padded_Map[SVPosition][theVoxelPosition].val[0];

    for(currentSlice=0;currentSlice<SV_depth_modified;currentSlice++)
    if(zero_skip_FLAG[currentSlice] == 0)
    {
        float pixel,step;
        if(reconparams.ReconType == MBIR_MODULAR_RECONTYPE_QGGMRF_3D)
        {
            step = QGGMRF3D_Update(reconparams,param_ext,tempV[currentSlice],&neighbors[currentSlice][0],THETA1[currentSlice],THETA2[currentSlice]);
        }
        else if(reconparams.ReconType == MBIR_MODULAR_RECONTYPE_PandP)
        {
            step = PandP_Update(reconparams,param_ext,tempV[currentSlice],tempProxMap[currentSlice],THETA1[currentSlice],THETA2[currentSlice]);
        }
        else
        {
            fprintf(stderr,"Error** Unrecognized ReconType in ICD update\n");
            exit(-1);
        }

        pixel = tempV[currentSlice] + step;  /* can apply over-relaxation to the step size here */

        if(PositivityFlag)
            image[(size_t)(startSlice+currentSlice)*Nxy + j_new*Nx+k_new] = ((pixel < 0.0) ? 0.0 : pixel);
        else
            image[(size_t)(startSlice+currentSlice)*Nxy + j_new*Nx+k_new] = pixel;

        diff[currentSlice] = image[(size_t)(startSlice+currentSlice)*Nxy + j_new*Nx+k_new] - tempV[currentSlice];

        totalChange_loc += fabs(diff[currentSlice]);
        totalValue_loc += fabs(tempV[currentSlice]);
        NumUpdates_loc++;

        diff[currentSlice]=diff[currentSlice]*Aval_max*(1.0/255);
    }

    for(p=0;p<NViewSets;p++)
    {
        const int myCount=A_Padded_Map[SVPosition][theVoxelPosition].pieceWiseWidth[p];
        const int pieceMin=A_Padded_Map[SVPosition][theVoxelPosition].pieceWiseMin[p];
        #pragma vector aligned
        for(currentSlice=0;currentSlice<SV_depth_modified;currentSlice++)
        if(diff[currentSlice]!=0 && zero_skip_FLAG[currentSlice] == 0)
        {
            ETransposeArrayPointer=&newEArrayTransposed[p][currentSlice*bandWidth[p]*pieceLength];
            ETransposeArrayPointer+=pieceMin*pieceLength;

            #pragma vector aligned
            for(t=0;t<(myCount*pieceLength);t++)
                ETransposeArrayPointer[t]= ETransposeArrayPointer[t]-A_padd_Tranpose_pointer[t]*diff[currentSlice];
        }
        A_padd_Tranpose_pointer+=myCount*pieceLength;
    }
}

for (p = 0; p < NViewSets; p++)
    free((void *)newWArrayTransposed[p]);

free((void **)newWArrayTransposed);

headNodeArray[jj_new].x=totalChange_loc;

for (p = 0; p < NViewSets; p++)
for(currentSlice=0;currentSlice<SV_depth_modified;currentSlice++)
{
    ETransposeArrayPointer=&newEArrayTransposed[p][currentSlice*bandWidth[p]*pieceLength];
    newEArrayPointer=&newEArray[p][currentSlice*bandWidth[p]*pieceLength];
    for(q=0;q<bandWidth[p];q++)
    {
        #pragma vector aligned
        for(t=0;t<pieceLength;t++)
            newEArrayPointer[bandWidth[p]*t+q]=ETransposeArrayPointer[q*pieceLength+t];
    }
}

for (p = 0; p < NViewSets; p++)
    free((void *)newEArrayTransposed[p]);

free((void **)newEArrayTransposed);

newEArrayPointer=&newEArray[0][0];
float* CopyNewEArrayPointer=&CopyNewEArray[0][0];
float* eArrayPointer=&sinoerr[0];

for (p = 0; p < NViewSets; p++)      /*XW: update the error term in the memory buffer*/
{
    newEArrayPointer=&newEArray[p][0];
    CopyNewEArrayPointer=&CopyNewEArray[p][0];
    for (currentSlice=0; currentSlice< SV_depth_modified;currentSlice++)
    {
        //#pragma vector aligned
        for(q=0;q<pieceLength;q++)
        {
            eArrayPointer=&sinoerr[(startSlice+currentSlice)*Nvc+p*pieceLength*sinoparams.NChannels+q*sinoparams.NChannels+bandMin[p*pieceLength+q]];
            for(t=0;t<bandWidth[p];t++)
            {
                #pragma omp atomic
                *eArrayPointer += (*newEArrayPointer)-(*CopyNewEArrayPointer);
                newEArrayPointer++;
                CopyNewEArrayPointer++;
                eArrayPointer++;
            }
        }
    }
}

for (p = 0; p < NViewSets; p++)
{
    free((void *)newEArray[p]);
    free((void *)CopyNewEArray[p]);
}
free((void **)newEArray);
free((void **)CopyNewEArray);

free((void *)k_newCoordinate);
free((void *)j_newCoordinate);
free((void *)bandMin);
free((void *)bandMax);
free((void *)bandWidth);
free((void *)bandWidthTemp);
free((void *)tempV);
free((void *)tempProxMap);
multifree(neighbors,2);
free((void *)zero_skip_FLAG);
free((void *)diff);
free((void *)THETA1);
free((void *)THETA2);

*NumUpdates += NumUpdates_loc;
*totalValue += totalValue_loc;
*totalChange += totalChange_loc;

} / END super_voxel_recon() /

void coordinateShuffle(int order1, int order2,int len) { int i, j, tmp1,tmp2;

for (i = 0; i < len-1; i++)
{
    j = i + (rand() % (len-i));
    tmp1 = order1[j];
    tmp2 = order2[j];
    order1[j] = order1[i];
    order2[j] = order2[i];
    order1[i] = tmp1;
    order2[i] = tmp2;
}

}

void three_way_shuffle(long order1, char order2, struct heap_node *headNodeArray, int len) { int i,j; long tmp_long; char tmp_char; float temp_x;

for (i = 0; i < len-1; i++)
{
    j = i + (rand() % (len-i));
    tmp_long = order1[j];
    order1[j] = order1[i];
    order1[i] = tmp_long;
    tmp_char = order2[j];
    order2[j] = order2[i];
    order2[i] = tmp_char;
    temp_x=headNodeArray[j].x;
    headNodeArray[j].x=headNodeArray[i].x;
    headNodeArray[i].x=temp_x;
}

}

float MAPCostFunction3D( float x, float e, float *w, struct ImageParams3D imgparams, struct SinoParams3DParallel sinoparams, struct ReconParams reconparams, struct ParamExt param_ext) { int i, M, j, jx, jy, jz, Nx, Ny, Nz, Nxy, plusx, minusx, plusy, plusz; float nloglike, nlogprior_nearest, nlogprior_diag, nlogprior_interslice, x0;

M = sinoparams.NViews * sinoparams.NChannels ;
Nx = imgparams.Nx;
Ny = imgparams.Ny;
Nz = imgparams.Nz;
Nxy = Nx*Ny;

nloglike = 0.0;
for (i = 0; i <sinoparams.NSlices; i++)
for (j = 0; j < M; j++)
    nloglike += e[i*M+j]*w[i*M+j]*e[i*M+j];

nloglike /= 2.0;
nlogprior_nearest = 0.0;
nlogprior_diag = 0.0;
nlogprior_interslice = 0.0;

for (jz = 0; jz < Nz; jz++)
for (jy = 0; jy < Ny; jy++)
for (jx = 0; jx < Nx; jx++)
{
    plusx = jx + 1;
    plusx = ((plusx < Nx) ? plusx : 0);
    minusx = jx - 1;
    minusx = ((minusx < 0) ? Nx-1 : minusx);
    plusy = jy + 1;
    plusy = ((plusy < Ny) ? plusy : 0);
    plusz = jz + 1;
    plusz = ((plusz < Nz) ? plusz : 0);

    j = jy*Nx + jx;
    x0 = x[jz*Nxy+j];

    nlogprior_nearest += QGGMRF_Potential((x0-x[jz*Nxy+jy*Nx+plusx]),reconparams,param_ext);
    nlogprior_nearest += QGGMRF_Potential((x0-x[jz*Nxy+plusy*Nx+jx]),reconparams,param_ext);
    nlogprior_diag += QGGMRF_Potential((x0-x[jz*Nxy+plusy*Nx+minusx]),reconparams,param_ext);
    nlogprior_diag += QGGMRF_Potential((x0-x[jz*Nxy+plusy*Nx+plusx]),reconparams,param_ext);
    nlogprior_interslice += QGGMRF_Potential((x0-x[plusz*Nxy+jy*Nx+jx]),reconparams,param_ext);
}

return (nloglike + reconparams.b_nearest * nlogprior_nearest + reconparams.b_diag * nlogprior_diag + reconparams.b_interslice * nlogprior_interslice) ;

}

/ Forward projection using input SV system matrix /

void SVproject( float proj, float image, struct AValues_char *A_Padded_Map, float Aval_max_ptr, struct ImageParams3D imgparams, struct SinoParams3DParallel sinoparams, struct SVParams svpar, char backproject_flag) { size_t i; int jz; int Nx = imgparams.Nx; int Ny = imgparams.Ny; int Nz = imgparams.Nz; int NChannels = sinoparams.NChannels; int Nvc = sinoparams.NViews sinoparams.NChannels; int SVLength = svpar.SVLength; int pieceLength = svpar.pieceLength; int SVsPerRow = svpar.SVsPerRow; int NViewSets = sinoparams.NViews/pieceLength; struct minStruct bandMinMap = svpar.bandMinMap;

/* initialize output */
if(backproject_flag)
    for (i = 0; i < (size_t)Nx*Ny*Nz; i++)
        image[i] = 0.0;
else
    for (i = 0; i < (size_t)Nvc*Nz; i++)
        proj[i] = 0.0;

#pragma omp parallel for schedule(dynamic)
for(jz=0;jz<Nz;jz++)
{
    int jx,jy,k,r,p;

    for (jy = 0; jy < Ny; jy++)
    for (jx = 0; jx < Nx; jx++)
    {
        int SV_ind_y = jy/(2*SVLength-svpar.overlap);
        int SV_ind_x = jx/(2*SVLength-svpar.overlap);
        int SVPosition = SV_ind_y*SVsPerRow + SV_ind_x;

        int SV_jy = SV_ind_y*(2*SVLength-svpar.overlap);
        int SV_jx = SV_ind_x*(2*SVLength-svpar.overlap);
        int VoxelPosition = (jy-SV_jy)*(2*SVLength+1)+(jx-SV_jx);

        // The second condition should always be true
        if (A_Padded_Map[SVPosition][VoxelPosition].length > 0 && VoxelPosition < ((2*SVLength+1)*(2*SVLength+1)))
        {
            unsigned char* A_padd_Tr_ptr = &A_Padded_Map[SVPosition][VoxelPosition].val[0];
            float rescale = Aval_max_ptr[jy*Nx+jx]*(1.0/255);
            size_t image_idx = (size_t)jz*Nx*Ny + jy*Nx + jx;
            float xval = image[image_idx];

            for(p=0;p<NViewSets;p++)
            {
                int myCount = A_Padded_Map[SVPosition][VoxelPosition].pieceWiseWidth[p];
                int pieceWiseMin = A_Padded_Map[SVPosition][VoxelPosition].pieceWiseMin[p];
                int position = p*pieceLength*NChannels + pieceWiseMin;

                for(r=0;r<myCount;r++)
                for(k=0;k<pieceLength;k++)
                {
                    channel_t bandMin = bandMinMap[SVPosition].bandMin[p*pieceLength+k];
                    size_t proj_idx = jz*Nvc + position + k*NChannels + bandMin + r;

                    if((pieceWiseMin + bandMin + r) >= NChannels || (position + k*NChannels + bandMin + r) >= Nvc ) {
                        fprintf(stderr,"SVproject() out of bounds: p %d r %d k %d\n",p,r,k);
                        fprintf(stderr,"SVproject() out of bounds: total_1 %d total_2 %d\n",pieceWiseMin+bandMin+r,position+k*NChannels+bandMin+r);
                        exit(-1);
                    }
                    else
                    {
                        if(backproject_flag)
                            image[image_idx] += A_padd_Tr_ptr[r*pieceLength+k]*rescale * proj[proj_idx];
                        else
                            proj[proj_idx] += A_padd_Tr_ptr[r*pieceLength+k]*rescale * xval;
                    }
                }
                A_padd_Tr_ptr += myCount*pieceLength;
            }
        }
    }
}

}

/ Forward projection wrapper that first reads or computes SV matrix /

void forwardProject( float proj, float image, struct ImageParams3D imgparams, struct SinoParams3DParallel sinoparams, char *Amatrix_fname, char backproject_flag, char verboseLevel) { int i,j; struct AValues_char *A_Padded_Map; float Aval_max_ptr; struct SVParams svpar;

/* print summary to stdout */
if(verboseLevel>1)
{
    fprintf(stdout,"forwardProject() -- build time: %s, %s\n", __DATE__, __TIME__);
    printSinoParams3DParallel(&sinoparams);
    printImageParams3D(&imgparams);
}

/* Initialize/allocate SV parameters */
initSVParams(&svpar, imgparams, sinoparams);
int Nsv = svpar.Nsv;
int SVLength = svpar.SVLength;

/* Read/compute/write System Matrix */
A_Padded_Map = (struct AValues_char **)multialloc(sizeof(struct AValues_char),2,Nsv,(2*SVLength+1)*(2*SVLength+1));
Aval_max_ptr = (float *) mget_spc(imgparams.Nx*imgparams.Ny,sizeof(float));
if(Amatrix_fname != NULL)
{
    if(verboseLevel)
        fprintf(stdout,"Reading system matrix...\n");
    readAmatrix(Amatrix_fname, A_Padded_Map, Aval_max_ptr, &imgparams, &sinoparams, svpar);
}
else
{
    if(verboseLevel)
        fprintf(stdout,"Computing system matrix...\n");
    char * ImageReconMask = GenImageReconMask(&imgparams);
    A_comp(A_Padded_Map,Aval_max_ptr,svpar,&sinoparams,ImageReconMask,&imgparams);
    free((void *)ImageReconMask);
}

/* Project */
if(verboseLevel)
{
    if(backproject_flag)
        fprintf(stdout,"Back-projecting sinogram...\n");
    else
        fprintf(stdout,"Projecting image...\n");
}
SVproject(proj,image,A_Padded_Map,Aval_max_ptr,imgparams,sinoparams,svpar,backproject_flag);

/* Free SV memory */
for(i=0;i<Nsv;i++) {
    free((void *)svpar.bandMinMap[i].bandMin);
    free((void *)svpar.bandMaxMap[i].bandMax);
}
free((void *)svpar.bandMinMap);
free((void *)svpar.bandMaxMap);

/* Free system matrix */
for(i=0;i<Nsv;i++)
for(j=0;j<(2*SVLength+1)*(2*SVLength+1);j++)
if(A_Padded_Map[i][j].length>0)
{
    free((void *)A_Padded_Map[i][j].val);
    free((void *)A_Padded_Map[i][j].pieceWiseMin);
    free((void *)A_Padded_Map[i][j].pieceWiseWidth);
}
multifree(A_Padded_Map,2);
free((void *)Aval_max_ptr);

} / END forwardProject() /

sjkisner commented 3 years ago

@dyang37 Again something's weird in your observation. A char data type can't be a NaN.

I ran with your additional testing block in recon3d.c and the demo_3D_shepp script with the weights as a zeros array. I get NaNs in the output, which is expected, but I don't get NaNs in the Thetas or the A_padd_Transpose_pointer[]. Is there anything else you changed in the C code other than this block?

sjkisner commented 3 years ago

I made a fix to the C-code that's at least part of @dyang37 's observation of the QGGMRF surrogate coefficient blowing up. The QGGMRF surrogate equation can commonly have zero/zero, so a special case is checked to give the correct expression in that condition. Before the fix it was handled poorly with an if((float val) == 0.0)

I put the fix in the existing nanfix branch. The submodule commit is updated there, so after checking out that branch, run git submodule update to pull the new C-code locally.

sjkisner commented 3 years ago

Added another fix in the nanfix branch for corner case where auto_sigma_y() could return zero. This was causing nan weights to be passed to the C recon function. @cabouman Let's review the change to auto_sigma_y(). There might be a better way of handling it.

cabouman commented 3 years ago

Jordan gets the bug-killer award! He has found 3 bugs. All my tests are running without any NaNs.

Everyone: Please try breaking the code in the nanfix branch.

sjkisner commented 3 years ago

The original prox mode nan's are still not fixed unfortunately. I can still get nan's out of Soumendu's test case. At some point it can start diverging until it overflows. For this to happen it seems

  1. positivity needs to be turned off
  2. roi_radius needs to be set large
  3. multiple threads
sjkisner commented 3 years ago

By the way, if anybody sees a case where it diverges for a single threaded case please let me know.

tbalke commented 3 years ago

Great!!! Another one (or three) bites the dust.

The original prox mode nan's are still not fixed unfortunately Yes, I just checked and I still get them too.

By the way, if anybody sees a case where it diverges for a single threaded case please let me know. This just really screams race condition and it makes sense that it did not get fixed with your patch.

Thanks for taking care of this. Looks like a real nasty 🐛.

Thilo

On Mon, Nov 8, 2021 at 6:32 PM Sherman J Kisner @.***> wrote:

By the way, if anybody sees a case where it diverges for a single threaded case please let me know.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/cabouman/svmbir/issues/212#issuecomment-963731322, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGEJN3I4BCNE7F5AZCLQMX3ULB22HANCNFSM5HOWI5OQ .

DamonLee5 commented 3 years ago

The original prox mode nan's are still not fixed unfortunately. I can still get nan's out of Soumendu's test case. At some point it can start diverging until it overflows. For this to happen it seems

  1. positivity needs to be turned off
  2. roi_radius needs to be set large
  3. multiple threads

With this constraint, I finally can reproduce this nans' issue on my end.

DamonLee5 commented 3 years ago

I found a possible reason for Soumendu's nans' issue. I printed totalValue and totalChange in the code. Then, I noticed that they just kept going up and finally became nans.

They are in this parallel block https://github.com/HPImaging/sv-mbirct/blob/1c8ec74a46a7afa66faa78c8fdef5fb35c85f654/src/recon3d.c#L307

I also noticed that these two values are initialized every time in a single thread, which means that they are not supposed to keep going up. ` NumUpdates=0;

totalValue=0;

totalChange=0; `

Therefore, I change all single threads in this parallel block to critical threads, after that nans disappear.

However, this is not a fix. Changing single threads to critical threads may introduce other errors. The main idea of this change is just to show that there may be a bug in the initialization of totalValue and totalChange in the single thread when the number of threads is more than 8.

tbalke commented 3 years ago

Wenrui, can you push a branch so we can try that out?

Thilo

On Mon, Nov 8, 2021 at 7:57 PM Wenrui Li @.***> wrote:

Actually, I found a possible reason for Soumendu's nans' issue. I printed totalValue and totalChange in the code. Then, I noticed that they just kept going up and finally became nans.

They are in this parallel block

https://github.com/HPImaging/sv-mbirct/blob/1c8ec74a46a7afa66faa78c8fdef5fb35c85f654/src/recon3d.c#L307

I also noticed that these two values are initialized every time in a single thread, which means that they should not keep going up. ` NumUpdates=0;

totalValue=0;

totalChange=0; `

Therefore, I change all single threads in this parallel block to critical threads, after that nans disappear.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/cabouman/svmbir/issues/212#issuecomment-963767953, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGEJN3MILG5VATEBQ665NITULCE3DANCNFSM5HOWI5OQ .

DamonLee5 commented 3 years ago

Wenrui, can you push a branch so we can try that out? Thilo On Mon, Nov 8, 2021 at 7:57 PM Wenrui Li @.***> wrote: Actually, I found a possible reason for Soumendu's nans' issue. I printed totalValue and totalChange in the code. Then, I noticed that they just kept going up and finally became nans. They are in this parallel block https://github.com/HPImaging/sv-mbirct/blob/1c8ec74a46a7afa66faa78c8fdef5fb35c85f654/src/recon3d.c#L307 I also noticed that these two values are initialized every time in a single thread, which means that they should not keep going up. NumUpdates=0; totalValue=0; totalChange=0; Therefore, I change all single threads in this parallel block to critical threads, after that nans disappear. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#212 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGEJN3MILG5VATEBQ665NITULCE3DANCNFSM5HOWI5OQ .

I don't know how to push to the c code. There is just 3 line modification. Two #pragma omp single -> #pragma omp critical and uncomment a printf.

diff --git a/src/recon3d.c b/src/recon3d.c
index afe7c97..e2a05e2 100644
--- a/src/recon3d.c
+++ b/src/recon3d.c
@@ -308,7 +308,7 @@ void MBIRReconstruct(
     {
         while(stop_FLAG==0 && equits<MaxIterations && iter<100*MaxIterations)
index afe7c97..e2a05e2 100644
--- a/src/recon3d.c
+++ b/src/recon3d.c
@@ -308,7 +308,7 @@ void MBIRReconstruct(
     {
         while(stop_FLAG==0 && equits<MaxIterations && iter<100*MaxIterations)
         {
-            #pragma omp single
+            #pragma omp critical
             {
                 if(iter==0)
                 {
@@ -354,14 +354,14 @@ void MBIRReconstruct(
                             voxelsBuffer1,voxelsBuffer2,&group_id_list[0][0],group);
             }

-            #pragma omp single
+            #pragma omp critical
             {
                 avg_update=avg_update_rel=0.0;
                 if(NumUpdates>0) {
                     avg_update = totalChange/NumUpdates;
                     float avg_value = totalValue/NumUpdates;
                     avg_update_rel = avg_update/avg_value * 100;
-                    //printf("avg_update %f, avg_value %f, avg_update_rel %f\n",avg_update,avg_value,avg_update_rel);
+                    printf("avg_update %f, avg_value %f, avg_update_rel %f\n",avg_update,avg_value,avg_update_rel);
                 }
                 #ifdef COMP_COST
                 float cost = MAPCostFunction3D(image,sinoerr,weight,imgparams,sinoparams,reconparams,param_ext);
tbalke commented 3 years ago

Kind of difficult to decipher the diff.

We don't have permission to write to sv-mbir. Jordan, do you want to give me or Wenrui permission to push branches?

Also, in the mean time, Wenrui, can you send out the modified file by email?

Thilo

On Mon, Nov 8, 2021 at 8:08 PM Wenrui Li @.***> wrote:

Wenrui, can you push a branch so we can try that out? Thilo … <#m-6206245650028605481> On Mon, Nov 8, 2021 at 7:57 PM Wenrui Li @.***> wrote: Actually, I found a possible reason for Soumendu's nans' issue. I printed totalValue and totalChange in the code. Then, I noticed that they just kept going up and finally became nans. They are in this parallel block https://github.com/HPImaging/sv-mbirct/blob/1c8ec74a46a7afa66faa78c8fdef5fb35c85f654/src/recon3d.c#L307 I also noticed that these two values are initialized every time in a single thread, which means that they should not keep going up. NumUpdates=0; totalValue=0; totalChange=0; Therefore, I change all single threads in this parallel block to critical threads, after that nans disappear. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#212 (comment) https://github.com/cabouman/svmbir/issues/212#issuecomment-963767953>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGEJN3MILG5VATEBQ665NITULCE3DANCNFSM5HOWI5OQ .

I don't know how to push to the c code. There is just 3 line modification.

diff --git a/src/recon3d.c b/src/recon3d.c

index afe7c97..e2a05e2 100644

--- a/src/recon3d.c

+++ b/src/recon3d.c

@@ -308,7 +308,7 @@ void MBIRReconstruct(

 {

     while(stop_FLAG==0 && equits<MaxIterations && iter<100*MaxIterations)

index afe7c97..e2a05e2 100644

--- a/src/recon3d.c

+++ b/src/recon3d.c

@@ -308,7 +308,7 @@ void MBIRReconstruct(

 {

     while(stop_FLAG==0 && equits<MaxIterations && iter<100*MaxIterations)

     {
  • pragma omp single

  • pragma omp critical

         {
    
             if(iter==0)
    
             {

@@ -354,14 +354,14 @@ void MBIRReconstruct(

                         voxelsBuffer1,voxelsBuffer2,&group_id_list[0][0],group);

         }
  • pragma omp single

  • pragma omp critical

         {
    
             avg_update=avg_update_rel=0.0;
    
             if(NumUpdates>0) {
    
                 avg_update = totalChange/NumUpdates;
    
                 float avg_value = totalValue/NumUpdates;
    
                 avg_update_rel = avg_update/avg_value * 100;
  • //printf("avg_update %f, avg_value %f, avg_update_rel %f\n",avg_update,avg_value,avg_update_rel);

  • printf("avg_update %f, avg_value %f, avg_update_rel %f\n",avg_update,avg_value,avg_update_rel);

             }
    
             #ifdef COMP_COST
    
             float cost = MAPCostFunction3D(image,sinoerr,weight,imgparams,sinoparams,reconparams,param_ext);

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/cabouman/svmbir/issues/212#issuecomment-963772713, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGEJN3KVDFEKW2XI6LXZM4DULCGCFANCNFSM5HOWI5OQ .

gbuzzard commented 3 years ago

If this is correct, then this will be worth another bug-fix prize!

Greg


From: Wenrui Li @.> Sent: Monday, November 8, 2021 9:57 PM To: cabouman/svmbir @.> Cc: Buzzard, Gregery T @.>; Assign @.> Subject: Re: [cabouman/svmbir] Nans generated by recon (Issue #212)

Actually, I found a possible reason for Soumendu's nans' issue. I printed totalValue and totalChange in the code. Then, I noticed that they just kept going up and finally became nans.

They are in this parallel block https://github.com/HPImaging/sv-mbirct/blob/1c8ec74a46a7afa66faa78c8fdef5fb35c85f654/src/recon3d.c#L307

I also noticed that these two values are initialized every time in a single thread, which means that they should not keep going up. ` NumUpdates=0;

totalValue=0;

totalChange=0; `

Therefore, I change all single threads in this parallel block to critical threads, after that nans disappear.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHubhttps://github.com/cabouman/svmbir/issues/212#issuecomment-963767953, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AM4YSVA734ZKDLHWAY6YDH3ULCE3DANCNFSM5HOWI5OQ. Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

gbuzzard commented 3 years ago

[cid:359aaf00-bd8e-4336-8754-383b3b6e77bd] He chose ... poorly.


Before the fix it was handled poorly with an if((float val) == 0.0)

sjkisner commented 3 years ago

It's not the fix we want though. It's basically just removing the parallelization. What we don't know is why the solution can start diverging, under only certain conditions, when it's working on multiple SVs at a time.

sjkisner commented 3 years ago

..also the solution diverging is why the "totalValue" and "totalChange" increase.

tbalke commented 3 years ago

Wenrui, just tried your code, however I get this: (crashes with seg-fault)

ipython -i minimum_fail_thilo.py 
Python 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:23:19) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.29.0 -- An enhanced Interactive Python. Type '?' for help.
NAN v0 :  0
NAN v :  0
NAN y :  0
NAN weights :  0
Reconstructing axial size (rows,cols)=(256,256).
Found system matrix: /Users/thilobalke/.cache/svmbir/sysmatrix/949995d599e1aa9700d6.2Dsvmatrix
Reading system matrix...
Projecting image...
Reconstructing...
avg_update 6.952815, avg_value 3.182556, avg_update_rel 218.466400
    iteration 1, average change 218.4664 %
avg_update 3.619208, avg_value 5.596809, avg_update_rel 64.665565
avg_update 3.621838, avg_value 6.076057, avg_update_rel 59.608356
avg_update 2.889775, avg_value 4.935551, avg_update_rel 58.550190
    iteration 2, average change 58.5502 %
avg_update 2.071183, avg_value 4.510779, avg_update_rel 45.916306
avg_update 1.818080, avg_value 4.223907, avg_update_rel 43.042618
avg_update 1.597894, avg_value 4.097488, avg_update_rel 38.996910
avg_update 2.318483, avg_value 4.597587, avg_update_rel 50.428265
    iteration 3, average change 50.4283 %
avg_update 1.915801, avg_value 3.969465, avg_update_rel 48.263447
avg_update 2.389617, avg_value 4.456242, avg_update_rel 53.624050
avg_update 1.313544, avg_value 3.248059, avg_update_rel 40.440907
avg_update 1.599080, avg_value 3.596593, avg_update_rel 44.460976
avg_update 1.357422, avg_value 2.812212, avg_update_rel 48.268829
    iteration 4, average change 48.2688 %
avg_update 1.387779, avg_value 3.052405, avg_update_rel 45.465107
avg_update 1.333724, avg_value 3.200379, avg_update_rel 41.673939
avg_update 1.107692, avg_value 2.928550, avg_update_rel 37.823914
avg_update 1.138388, avg_value 2.967253, avg_update_rel 38.365063
    iteration 5, average change 38.3651 %
avg_update 1.078777, avg_value 3.151988, avg_update_rel 34.225273
avg_update 0.858733, avg_value 2.446620, avg_update_rel 35.098763
avg_update 0.928493, avg_value 2.885725, avg_update_rel 32.175396
avg_update 0.738898, avg_value 2.472178, avg_update_rel 29.888546
    iteration 6, average change 29.8885 %
avg_update 0.756748, avg_value 2.250475, avg_update_rel 33.626129
avg_update 0.747336, avg_value 2.396343, avg_update_rel 31.186508
avg_update 0.553686, avg_value 2.145542, avg_update_rel 25.806347
avg_update 0.468487, avg_value 1.906464, avg_update_rel 24.573587
avg_update 0.500960, avg_value 2.186624, avg_update_rel 22.910219
    iteration 7, average change 22.9102 %
avg_update 0.443356, avg_value 2.032089, avg_update_rel 21.817741
avg_update 0.365651, avg_value 1.812467, avg_update_rel 20.174221
avg_update 0.382065, avg_value 2.009985, avg_update_rel 19.008364
avg_update 0.373617, avg_value 1.870456, avg_update_rel 19.974630
    iteration 8, average change 19.9746 %
avg_update 0.253549, avg_value 1.487231, avg_update_rel 17.048386
avg_update 0.392249, avg_value 1.899168, avg_update_rel 20.653730
avg_update 0.527416, avg_value 1.814082, avg_update_rel 29.073446
avg_update 0.534373, avg_value 2.043554, avg_update_rel 26.149195
    iteration 9, average change 26.1492 %
avg_update 0.584181, avg_value 2.035920, avg_update_rel 28.693686
avg_update 0.411650, avg_value 1.755069, avg_update_rel 23.454908
avg_update 0.436988, avg_value 1.867748, avg_update_rel 23.396502
avg_update 0.479135, avg_value 1.772794, avg_update_rel 27.027115
avg_update 0.452267, avg_value 1.892210, avg_update_rel 23.901514
    iteration 10, average change 23.9015 %
avg_update 0.656451, avg_value 2.229850, avg_update_rel 29.439249
avg_update 0.421664, avg_value 1.388003, avg_update_rel 30.379173
avg_update 0.358585, avg_value 1.539606, avg_update_rel 23.290714
avg_update 0.413521, avg_value 1.805688, avg_update_rel 22.901014
    iteration 11, average change 22.9010 %
avg_update 0.268919, avg_value 1.161825, avg_update_rel 23.146305
avg_update 0.274884, avg_value 1.458040, avg_update_rel 18.852983
avg_update 0.300964, avg_value 1.629617, avg_update_rel 18.468403
avg_update 0.209728, avg_value 1.122447, avg_update_rel 18.684889
    iteration 12, average change 18.6849 %
avg_update 0.260653, avg_value 1.546206, avg_update_rel 16.857552
avg_update 0.254298, avg_value 1.452504, avg_update_rel 17.507544
avg_update 0.189676, avg_value 1.078184, avg_update_rel 17.592138
avg_update 0.200906, avg_value 1.256761, avg_update_rel 15.986011
avg_update 0.208253, avg_value 1.342443, avg_update_rel 15.512976
    iteration 13, average change 15.5130 %
avg_update 0.173696, avg_value 0.977794, avg_update_rel 17.764048
avg_update 0.161370, avg_value 1.164673, avg_update_rel 13.855392
avg_update 0.181745, avg_value 1.280187, avg_update_rel 14.196782
avg_update 0.178629, avg_value 1.038818, avg_update_rel 17.195446
    iteration 14, average change 17.1954 %
avg_update 0.152623, avg_value 1.122812, avg_update_rel 13.592895
avg_update 0.152172, avg_value 1.131911, avg_update_rel 13.443795
avg_update 0.355985, avg_value 1.349275, avg_update_rel 26.383457
avg_update 0.378548, avg_value 1.424077, avg_update_rel 26.581993
    iteration 15, average change 26.5820 %
avg_update 0.305564, avg_value 1.160357, avg_update_rel 26.333582
avg_update 0.326786, avg_value 1.276880, avg_update_rel 25.592569
avg_update 0.252568, avg_value 1.086243, avg_update_rel 23.251507
avg_update 0.193913, avg_value 0.934953, avg_update_rel 20.740406
avg_update 0.237044, avg_value 1.244424, avg_update_rel 19.048500
    iteration 16, average change 19.0485 %
avg_update 0.193223, avg_value 1.051725, avg_update_rel 18.372028
avg_update 0.162405, avg_value 0.890415, avg_update_rel 18.239260
avg_update 0.186149, avg_value 1.101740, avg_update_rel 16.895933
avg_update 0.146727, avg_value 0.913011, avg_update_rel 16.070690
    iteration 17, average change 16.0707 %
avg_update 0.123216, avg_value 0.793869, avg_update_rel 15.520918
avg_update 0.145927, avg_value 0.984760, avg_update_rel 14.818515
avg_update 0.139178, avg_value 0.916723, avg_update_rel 15.182084
avg_update 0.184508, avg_value 1.037289, avg_update_rel 17.787544
    iteration 18, average change 17.7875 %
avg_update 0.127176, avg_value 0.894405, avg_update_rel 14.219101
avg_update 0.207916, avg_value 1.120287, avg_update_rel 18.559166
avg_update 0.302538, avg_value 1.122507, avg_update_rel 26.952003
avg_update 0.214566, avg_value 0.951663, avg_update_rel 22.546377
avg_update 0.189372, avg_value 0.914739, avg_update_rel 20.702305
    iteration 19, average change 20.7023 %
avg_update 0.209153, avg_value 0.936906, avg_update_rel 22.323809
avg_update 0.195953, avg_value 0.790079, avg_update_rel 24.801710
avg_update 0.142615, avg_value 0.868678, avg_update_rel 16.417490
avg_update 0.147680, avg_value 0.842458, avg_update_rel 17.529701
avg_update 0.119097, avg_value 0.691803, avg_update_rel 17.215446
    iteration 20, average change 17.2154 %
avg_update 0.106089, avg_value 0.762965, avg_update_rel 13.904783
avg_update 0.104350, avg_value 0.734925, avg_update_rel 14.198666
avg_update 0.091533, avg_value 0.631664, avg_update_rel 14.490784
avg_update 0.183374, avg_value 0.901415, avg_update_rel 20.342878
    iteration 21, average change 20.3429 %
avg_update 0.151360, avg_value 0.721582, avg_update_rel 20.976139
avg_update 0.196627, avg_value 0.896628, avg_update_rel 21.929575
avg_update 0.120269, avg_value 0.722567, avg_update_rel 16.644632
avg_update 0.111216, avg_value 0.678206, avg_update_rel 16.398508
avg_update 0.143519, avg_value 0.814869, avg_update_rel 17.612566
    iteration 22, average change 17.6126 %
avg_update 0.092972, avg_value 0.657475, avg_update_rel 14.140706
avg_update 0.086635, avg_value 0.580660, avg_update_rel 14.920163
avg_update 0.118025, avg_value 0.769056, avg_update_rel 15.346782
avg_update 0.077041, avg_value 0.623433, avg_update_rel 12.357556
    iteration 23, average change 12.3576 %
avg_update 0.067068, avg_value 0.546642, avg_update_rel 12.269043
avg_update 0.097042, avg_value 0.732805, avg_update_rel 13.242599
avg_update 0.095582, avg_value 0.729782, avg_update_rel 13.097350
avg_update 0.076585, avg_value 0.585131, avg_update_rel 13.088444
    iteration 24, average change 13.0884 %
avg_update 0.087880, avg_value 0.676329, avg_update_rel 12.993701
avg_update 0.073787, avg_value 0.637144, avg_update_rel 11.580843
avg_update 0.068991, avg_value 0.580807, avg_update_rel 11.878479
avg_update 0.076197, avg_value 0.635201, avg_update_rel 11.995756
avg_update 0.069437, avg_value 0.627242, avg_update_rel 11.070151
    iteration 25, average change 11.0702 %
avg_update 0.053491, avg_value 0.481822, avg_update_rel 11.101778
avg_update 0.063093, avg_value 0.605987, avg_update_rel 10.411590
avg_update 0.165822, avg_value 0.890669, avg_update_rel 18.617731
avg_update 0.155393, avg_value 0.784139, avg_update_rel 19.816973
    iteration 26, average change 19.8170 %
avg_update 0.128676, avg_value 0.636286, avg_update_rel 20.222965
avg_update 0.139905, avg_value 0.800808, avg_update_rel 17.470432
avg_update 0.126423, avg_value 0.679425, avg_update_rel 18.607418
avg_update 0.119767, avg_value 0.672784, avg_update_rel 17.801718
avg_update 0.120945, avg_value 0.651530, avg_update_rel 18.563263
    iteration 27, average change 18.5633 %
avg_update 0.133739, avg_value 0.625636, avg_update_rel 21.376451
avg_update 0.098955, avg_value 0.642356, avg_update_rel 15.404938
avg_update 0.094186, avg_value 0.586673, avg_update_rel 16.054306
avg_update 0.089358, avg_value 0.542274, avg_update_rel 16.478424
avg_update 0.081995, avg_value 0.592055, avg_update_rel 13.849202
    iteration 28, average change 13.8492 %
avg_update 0.078788, avg_value 0.578357, avg_update_rel 13.622763
avg_update 0.069617, avg_value 0.496433, avg_update_rel 14.023468
avg_update 0.066810, avg_value 0.561185, avg_update_rel 11.905221
avg_update 0.078203, avg_value 0.626456, avg_update_rel 12.483377
    iteration 29, average change 12.4834 %
avg_update 0.066886, avg_value 0.492262, avg_update_rel 13.587436
avg_update 0.060291, avg_value 0.547771, avg_update_rel 11.006598
avg_update 0.063040, avg_value 0.543855, avg_update_rel 11.591328
avg_update 0.059353, avg_value 0.493433, avg_update_rel 12.028558
avg_update 0.113149, avg_value 0.531161, avg_update_rel 21.302212
    iteration 30, average change 21.3022 %
avg_update 0.093592, avg_value 0.549076, avg_update_rel 17.045412
avg_update 0.090871, avg_value 0.504837, avg_update_rel 18.000143
avg_update 0.077087, avg_value 0.474923, avg_update_rel 16.231438
avg_update 0.083077, avg_value 0.522565, avg_update_rel 15.897992
    iteration 31, average change 15.8980 %
avg_update 0.100580, avg_value 0.636973, avg_update_rel 15.790239
avg_update 0.089210, avg_value 0.507186, avg_update_rel 17.589209
avg_update 0.063694, avg_value 0.395739, avg_update_rel 16.095024
Segmentation fault: 11
tbalke commented 3 years ago

On the upside, it crashes before there are any NANs ;)

DamonLee5 commented 3 years ago

It's not the fix we want though. It's basically just removing the parallelization. What we don't know is why the solution can start diverging, under only certain conditions, when it's working on multiple SVs at a time.

I think the solution did not remove the parallelization for super_voxel_recon.

Wenrui, just tried your code, however I get this: (crashes with seg-fault)

ipython -i minimum_fail_thilo.py 
Python 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:23:19) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.29.0 -- An enhanced Interactive Python. Type '?' for help.
NAN v0 :  0
NAN v :  0
NAN y :  0
NAN weights :  0
Reconstructing axial size (rows,cols)=(256,256).
Found system matrix: /Users/thilobalke/.cache/svmbir/sysmatrix/949995d599e1aa9700d6.2Dsvmatrix
Reading system matrix...
Projecting image...
Reconstructing...
avg_update 6.952815, avg_value 3.182556, avg_update_rel 218.466400
  iteration 1, average change 218.4664 %
avg_update 3.619208, avg_value 5.596809, avg_update_rel 64.665565
avg_update 3.621838, avg_value 6.076057, avg_update_rel 59.608356
avg_update 2.889775, avg_value 4.935551, avg_update_rel 58.550190
  iteration 2, average change 58.5502 %
avg_update 2.071183, avg_value 4.510779, avg_update_rel 45.916306
avg_update 1.818080, avg_value 4.223907, avg_update_rel 43.042618
avg_update 1.597894, avg_value 4.097488, avg_update_rel 38.996910
avg_update 2.318483, avg_value 4.597587, avg_update_rel 50.428265
  iteration 3, average change 50.4283 %
avg_update 1.915801, avg_value 3.969465, avg_update_rel 48.263447
avg_update 2.389617, avg_value 4.456242, avg_update_rel 53.624050
avg_update 1.313544, avg_value 3.248059, avg_update_rel 40.440907
avg_update 1.599080, avg_value 3.596593, avg_update_rel 44.460976
avg_update 1.357422, avg_value 2.812212, avg_update_rel 48.268829
  iteration 4, average change 48.2688 %
avg_update 1.387779, avg_value 3.052405, avg_update_rel 45.465107
avg_update 1.333724, avg_value 3.200379, avg_update_rel 41.673939
avg_update 1.107692, avg_value 2.928550, avg_update_rel 37.823914
avg_update 1.138388, avg_value 2.967253, avg_update_rel 38.365063
  iteration 5, average change 38.3651 %
avg_update 1.078777, avg_value 3.151988, avg_update_rel 34.225273
avg_update 0.858733, avg_value 2.446620, avg_update_rel 35.098763
avg_update 0.928493, avg_value 2.885725, avg_update_rel 32.175396
avg_update 0.738898, avg_value 2.472178, avg_update_rel 29.888546
  iteration 6, average change 29.8885 %
avg_update 0.756748, avg_value 2.250475, avg_update_rel 33.626129
avg_update 0.747336, avg_value 2.396343, avg_update_rel 31.186508
avg_update 0.553686, avg_value 2.145542, avg_update_rel 25.806347
avg_update 0.468487, avg_value 1.906464, avg_update_rel 24.573587
avg_update 0.500960, avg_value 2.186624, avg_update_rel 22.910219
  iteration 7, average change 22.9102 %
avg_update 0.443356, avg_value 2.032089, avg_update_rel 21.817741
avg_update 0.365651, avg_value 1.812467, avg_update_rel 20.174221
avg_update 0.382065, avg_value 2.009985, avg_update_rel 19.008364
avg_update 0.373617, avg_value 1.870456, avg_update_rel 19.974630
  iteration 8, average change 19.9746 %
avg_update 0.253549, avg_value 1.487231, avg_update_rel 17.048386
avg_update 0.392249, avg_value 1.899168, avg_update_rel 20.653730
avg_update 0.527416, avg_value 1.814082, avg_update_rel 29.073446
avg_update 0.534373, avg_value 2.043554, avg_update_rel 26.149195
  iteration 9, average change 26.1492 %
avg_update 0.584181, avg_value 2.035920, avg_update_rel 28.693686
avg_update 0.411650, avg_value 1.755069, avg_update_rel 23.454908
avg_update 0.436988, avg_value 1.867748, avg_update_rel 23.396502
avg_update 0.479135, avg_value 1.772794, avg_update_rel 27.027115
avg_update 0.452267, avg_value 1.892210, avg_update_rel 23.901514
  iteration 10, average change 23.9015 %
avg_update 0.656451, avg_value 2.229850, avg_update_rel 29.439249
avg_update 0.421664, avg_value 1.388003, avg_update_rel 30.379173
avg_update 0.358585, avg_value 1.539606, avg_update_rel 23.290714
avg_update 0.413521, avg_value 1.805688, avg_update_rel 22.901014
  iteration 11, average change 22.9010 %
avg_update 0.268919, avg_value 1.161825, avg_update_rel 23.146305
avg_update 0.274884, avg_value 1.458040, avg_update_rel 18.852983
avg_update 0.300964, avg_value 1.629617, avg_update_rel 18.468403
avg_update 0.209728, avg_value 1.122447, avg_update_rel 18.684889
  iteration 12, average change 18.6849 %
avg_update 0.260653, avg_value 1.546206, avg_update_rel 16.857552
avg_update 0.254298, avg_value 1.452504, avg_update_rel 17.507544
avg_update 0.189676, avg_value 1.078184, avg_update_rel 17.592138
avg_update 0.200906, avg_value 1.256761, avg_update_rel 15.986011
avg_update 0.208253, avg_value 1.342443, avg_update_rel 15.512976
  iteration 13, average change 15.5130 %
avg_update 0.173696, avg_value 0.977794, avg_update_rel 17.764048
avg_update 0.161370, avg_value 1.164673, avg_update_rel 13.855392
avg_update 0.181745, avg_value 1.280187, avg_update_rel 14.196782
avg_update 0.178629, avg_value 1.038818, avg_update_rel 17.195446
  iteration 14, average change 17.1954 %
avg_update 0.152623, avg_value 1.122812, avg_update_rel 13.592895
avg_update 0.152172, avg_value 1.131911, avg_update_rel 13.443795
avg_update 0.355985, avg_value 1.349275, avg_update_rel 26.383457
avg_update 0.378548, avg_value 1.424077, avg_update_rel 26.581993
  iteration 15, average change 26.5820 %
avg_update 0.305564, avg_value 1.160357, avg_update_rel 26.333582
avg_update 0.326786, avg_value 1.276880, avg_update_rel 25.592569
avg_update 0.252568, avg_value 1.086243, avg_update_rel 23.251507
avg_update 0.193913, avg_value 0.934953, avg_update_rel 20.740406
avg_update 0.237044, avg_value 1.244424, avg_update_rel 19.048500
  iteration 16, average change 19.0485 %
avg_update 0.193223, avg_value 1.051725, avg_update_rel 18.372028
avg_update 0.162405, avg_value 0.890415, avg_update_rel 18.239260
avg_update 0.186149, avg_value 1.101740, avg_update_rel 16.895933
avg_update 0.146727, avg_value 0.913011, avg_update_rel 16.070690
  iteration 17, average change 16.0707 %
avg_update 0.123216, avg_value 0.793869, avg_update_rel 15.520918
avg_update 0.145927, avg_value 0.984760, avg_update_rel 14.818515
avg_update 0.139178, avg_value 0.916723, avg_update_rel 15.182084
avg_update 0.184508, avg_value 1.037289, avg_update_rel 17.787544
  iteration 18, average change 17.7875 %
avg_update 0.127176, avg_value 0.894405, avg_update_rel 14.219101
avg_update 0.207916, avg_value 1.120287, avg_update_rel 18.559166
avg_update 0.302538, avg_value 1.122507, avg_update_rel 26.952003
avg_update 0.214566, avg_value 0.951663, avg_update_rel 22.546377
avg_update 0.189372, avg_value 0.914739, avg_update_rel 20.702305
  iteration 19, average change 20.7023 %
avg_update 0.209153, avg_value 0.936906, avg_update_rel 22.323809
avg_update 0.195953, avg_value 0.790079, avg_update_rel 24.801710
avg_update 0.142615, avg_value 0.868678, avg_update_rel 16.417490
avg_update 0.147680, avg_value 0.842458, avg_update_rel 17.529701
avg_update 0.119097, avg_value 0.691803, avg_update_rel 17.215446
  iteration 20, average change 17.2154 %
avg_update 0.106089, avg_value 0.762965, avg_update_rel 13.904783
avg_update 0.104350, avg_value 0.734925, avg_update_rel 14.198666
avg_update 0.091533, avg_value 0.631664, avg_update_rel 14.490784
avg_update 0.183374, avg_value 0.901415, avg_update_rel 20.342878
  iteration 21, average change 20.3429 %
avg_update 0.151360, avg_value 0.721582, avg_update_rel 20.976139
avg_update 0.196627, avg_value 0.896628, avg_update_rel 21.929575
avg_update 0.120269, avg_value 0.722567, avg_update_rel 16.644632
avg_update 0.111216, avg_value 0.678206, avg_update_rel 16.398508
avg_update 0.143519, avg_value 0.814869, avg_update_rel 17.612566
  iteration 22, average change 17.6126 %
avg_update 0.092972, avg_value 0.657475, avg_update_rel 14.140706
avg_update 0.086635, avg_value 0.580660, avg_update_rel 14.920163
avg_update 0.118025, avg_value 0.769056, avg_update_rel 15.346782
avg_update 0.077041, avg_value 0.623433, avg_update_rel 12.357556
  iteration 23, average change 12.3576 %
avg_update 0.067068, avg_value 0.546642, avg_update_rel 12.269043
avg_update 0.097042, avg_value 0.732805, avg_update_rel 13.242599
avg_update 0.095582, avg_value 0.729782, avg_update_rel 13.097350
avg_update 0.076585, avg_value 0.585131, avg_update_rel 13.088444
  iteration 24, average change 13.0884 %
avg_update 0.087880, avg_value 0.676329, avg_update_rel 12.993701
avg_update 0.073787, avg_value 0.637144, avg_update_rel 11.580843
avg_update 0.068991, avg_value 0.580807, avg_update_rel 11.878479
avg_update 0.076197, avg_value 0.635201, avg_update_rel 11.995756
avg_update 0.069437, avg_value 0.627242, avg_update_rel 11.070151
  iteration 25, average change 11.0702 %
avg_update 0.053491, avg_value 0.481822, avg_update_rel 11.101778
avg_update 0.063093, avg_value 0.605987, avg_update_rel 10.411590
avg_update 0.165822, avg_value 0.890669, avg_update_rel 18.617731
avg_update 0.155393, avg_value 0.784139, avg_update_rel 19.816973
  iteration 26, average change 19.8170 %
avg_update 0.128676, avg_value 0.636286, avg_update_rel 20.222965
avg_update 0.139905, avg_value 0.800808, avg_update_rel 17.470432
avg_update 0.126423, avg_value 0.679425, avg_update_rel 18.607418
avg_update 0.119767, avg_value 0.672784, avg_update_rel 17.801718
avg_update 0.120945, avg_value 0.651530, avg_update_rel 18.563263
  iteration 27, average change 18.5633 %
avg_update 0.133739, avg_value 0.625636, avg_update_rel 21.376451
avg_update 0.098955, avg_value 0.642356, avg_update_rel 15.404938
avg_update 0.094186, avg_value 0.586673, avg_update_rel 16.054306
avg_update 0.089358, avg_value 0.542274, avg_update_rel 16.478424
avg_update 0.081995, avg_value 0.592055, avg_update_rel 13.849202
  iteration 28, average change 13.8492 %
avg_update 0.078788, avg_value 0.578357, avg_update_rel 13.622763
avg_update 0.069617, avg_value 0.496433, avg_update_rel 14.023468
avg_update 0.066810, avg_value 0.561185, avg_update_rel 11.905221
avg_update 0.078203, avg_value 0.626456, avg_update_rel 12.483377
  iteration 29, average change 12.4834 %
avg_update 0.066886, avg_value 0.492262, avg_update_rel 13.587436
avg_update 0.060291, avg_value 0.547771, avg_update_rel 11.006598
avg_update 0.063040, avg_value 0.543855, avg_update_rel 11.591328
avg_update 0.059353, avg_value 0.493433, avg_update_rel 12.028558
avg_update 0.113149, avg_value 0.531161, avg_update_rel 21.302212
  iteration 30, average change 21.3022 %
avg_update 0.093592, avg_value 0.549076, avg_update_rel 17.045412
avg_update 0.090871, avg_value 0.504837, avg_update_rel 18.000143
avg_update 0.077087, avg_value 0.474923, avg_update_rel 16.231438
avg_update 0.083077, avg_value 0.522565, avg_update_rel 15.897992
  iteration 31, average change 15.8980 %
avg_update 0.100580, avg_value 0.636973, avg_update_rel 15.790239
avg_update 0.089210, avg_value 0.507186, avg_update_rel 17.589209
avg_update 0.063694, avg_value 0.395739, avg_update_rel 16.095024
Segmentation fault: 11

I got this too when I used all threads, but when I used 16 threads out of 24 threads, it works fine.

sjkisner commented 3 years ago

Hold up, I don't think you want to change those #pragma omp single to #pragma omp critical

The "single" is so that block only executes once. With "critical" it will be executed by each thread. You don't want to do that here because that block is setting the SV update order.

DamonLee5 commented 3 years ago

Hold up, I don't think you want to change those #pragma omp single to #pragma omp critical

The "single" is so that block only executes once. With "critical" it will be executed by each thread. I don't think you want to do that here because that block is setting the SV update order.

Right, you are correct. What I actually want to do is to make it omp master.

sjkisner commented 3 years ago

You don't want "omp master" because that doesn't have a barrier, whereas "omp single" does. The other threads would carry on before the heap gets set up.

dyang37 commented 3 years ago

For the zero weights case I'm still getting NaNs when using zero weights. The code I'm testing is the current tip of nanfix branch. Here's the output:

Reading system matrix...
Projecting image...
Reconstructing axial size (rows,cols)=(256,256).
Found system matrix: /home/yang1467/.cache/svmbir/sysmatrix/bc0b0940eb4abd0cc86e.2Dsvmatrix
Reading system matrix...
Projecting image...
Reconstructing...
    iteration 1, average change -nan %
    Reached stopping condition
    Reconstruction time = 158 ms (iterations only)
press Enter

What's different this time is that the change shows -nan instead of nan, and recon stops after 1 iteration.

cabouman commented 3 years ago

Jordan figured out the fundamental problem!

It's not an actual bug, it's an algorithmic instability that occurs when too many super-voxels are updated simultaneously. Each thread forms its own super-voxel, so that happens when then number of slices is small, the number of threads is large, the image is small, and the positivity constraint is off.

Our short-term fix will be to write a routine that limits the number of threads depending on parameter choices.

sjkisner commented 3 years ago

Let me re-emphasize the point that we'll only need to limit the threads when it's a small problem size.

This test case from Soumendu has highlighted some deeply interesting issues with the algorithm.

sjkisner commented 3 years ago

@dyang37 I'm 99 percent sure your zero-weights case that generated this:

Reconstructing...
    iteration 1, average change -nan %
    Reached stopping condition
    Reconstruction time = 158 ms (iterations only)

after the recent bug fixes did not produce nans in the output, but rather zeros. It's just an incorrect reported "average change" because that value was normalized by the image average which was zero. This reporting is fixed in the C-code now.

sjkisner commented 3 years ago

I made a fix to the C-code and updated the nanfix branch here. To everybody's who's been testing this, please give it a try. After pulling nanfix remember to also run git submodule update to pull the new C code. I've run a few thousand trials without the test case diverging.

A little too much detail to go into here, but here's a summary. A technique to speed up convergence assigns a priority to the updates so that more attention is given to regions of the image that appear to need it most. This priority queue wasn't very effectively randomized before scheduling the threads. This particular test case had a couple aspects that made all the highest priority updates structurally aligned in the image, which potentially leads to instability if there's a lot of parallelization with a single slice. (Edit: Note this instability gets smacked down if the positivity constraint is enabled)

The current fix is a quick one but it addresses the problem in 3 layers.

  1. It enlarges the size of the priority queue for the non-homogeneous updates.
  2. It changes the thread scheduling which effectively randomizes these updates. (With just these 2 fixes, running the test case with 20 threads I couldn't produce divergence in 1000 trials).
  3. I limit the number of threads for reconstruction based on the image and SV dimensions, which only has a limiting effect when the problem size is small.

I temporarily disabled the new max_threads() function in this branch so we can test the C code fixes.

smajee commented 3 years ago

I don't get NANs anymore with this fix. Thanks!

DamonLee5 commented 3 years ago

Thanks, Jordan. I tested it 10 times and I don't get NANs on Soumendu's test case with 24 threads.

sjkisner commented 3 years ago

The fixes described above are available from PyPI in svmbir v0.2.6.

cabouman commented 3 years ago

I'm closing this issue for now since I think we have a fix which removes the instability. In the future, we might want to implement some improved fix that allows for more parallel threads, but maybe we can open another issue when we do that.

cabouman commented 2 years ago

Bugzilla has reared its ugly head again. Now it seems to be happening with the 3D Shepp Logan and

sjkisner commented 1 year ago

Fixed as of v0.3.2 #275

sjkisner commented 11 months ago

This was fixed some time ago.