Special field updates for r = 0 and m = 0, ±1 in cylindrical coordinates

oskooi commented 1 year ago

Fixes #2489.

While #2383 fixed a bug for the $z$-PML at $r = 0$ for $m = \pm 1$, it seems to have introduced a separate bug for the $m = 0$ case. This is because #2383 actually never tested the $m = 0$ case which is why it went undetected until #2489.

This PR reverts the change to update_eh.cpp in #2383 but only for $m = 0$. It is not yet clear to me why this fixes the issue reported in #2489 but the test results suggests it does.

A new unit test for $m = 0$ (currently missing) based on the results in #2489 will also be added to this PR in a subsequent commit.

Note: two test cases in test_adjoint_cyl.py involving $m = 0$ required adjusting the tolerance of their error check.

oskooi commented 1 year ago

Added a new function step_update_EDHB0 in step_generic.cpp based on the suggestion from @stevengj. This seems to fix the problem with one caveat.

As shown below, this fix produces the correct results whereby the DFT fields for $E_r$ and $E_z$ match the near-to-far fields for three test cases of $m = 0, 1, 2$. Results shown for $E_r$ only.

src_er_mon_er_dft_vs_n2f_m0_dair1 0_res100

src_er_mon_er_dft_vs_n2f_m1_dair1 0_res100

src_er_mon_er_dft_vs_n2f_m2_dair1 0_res100

The caveat is there is still one case which does show a discrepancy: $H_r$ at $r = 0$ for $|m| = 1$. This case is not necessarily a bug because the $H_r$ from $B_r$ update is not affected by step_update_EDHB0 since $H_r = B_r$ (i.e., no PML for $H_r$ at $r = 0$). Fixing this would require storing $H_r$ as a separate array but the additional storage over the entire cell is probably not worth it just to fix the output at $r = 0$.

src_er_mon_hr_dft_vs_n2f_m1_dair1 0_res100

stevengj commented 1 year ago

The caveat is there is still one case which does show a discrepancy: Hr at r=0 for m=1.

I suspect that this is an unrelated problem with interpolation.

stevengj commented 1 year ago

I assume it still fixes #2182?

oskooi commented 1 year ago

I assume it still fixes https://github.com/NanoComp/meep/issues/2182?

test_pml_cyl.py added in #2383 to verify that the $z$-PML is working correctly is failing for $|m| = 1$. The fields at $r = 0$ are oscillating in time long after the source as turned off as they were prior to #2383. The change introduced in #2383 to make the $z$-PML work correctly was to remove the step-update at $r = 0$ in update_eh.cpp.

The $z$-PML at $r = 0$ is working correctly for $m = 0$ but that is not part of the existing tests in test_pml_cyl.py. This test is to be added in this PR.

oskooi commented 1 year ago

Perhaps there is separate bug in the step-curl update for $B_r$ (= $H_r$) at $r = 0$ for $|m| = 1$ in fields_chunk::step_db of step_db.cpp:

https://github.com/NanoComp/meep/blob/e4504ed58eec75eea57aebc342d5b63683c89edf/src/step_db.cpp#L313-L347

stevengj commented 1 year ago

Would be good to see Ep (before and after this PR) for m=1. Without this PR, it seems like Ep at r=0 would never be updated, which seems like it would screw up the Hr update at r=0, so I'm confused about why removing the Ep update would help. (Unless Ep is getting updated somewhere else?)

stevengj commented 1 year ago

@mochen4 should also check to make sure that he didn't change what portion of the array is "owned" for cylindrical coordinates when he was updating loop_in_chunks (#1895), since that would affect the step_db code. (Also check whether #2182 was present before #1895.)

oskooi commented 1 year ago

with 2538

src_er_mon_ep_dft_vs_n2f_m1_dair1 0_res50

before 2538

src_er_mon_ep_dft_vs_n2f_m1_dair1 0_res50

src_er_mon_hr_dft_vs_n2f_m1_dair1 0_res50

stevengj commented 1 year ago

This is confusing to me because it doesn't seem like Ep should be updated at all at r=0 without this patch. (The previous update_EDHB line skips r=0 for Ep because of the little_owned_corner0.)

Maybe put in some printf calls to see where the Ep is being changed for r=0?

stevengj commented 1 year ago

Another quick check is to try setting μ = 1 + 1e-14 (i.e. force it to store and update H separately), in case the missing H update at r=0 (which affects the r=0 PML) is actually the real source of #2182.

oskooi commented 1 year ago

This is confusing to me because it doesn't seem like Ep should be updated at all at r=0 without this patch. (The previous update_EDHB line skips r=0 for Ep because of the little_owned_corner0.)

Maybe put in some printf calls to see where the Ep is being changed for r=0?

Looks like the reason the DFT and N2F fields for $E_\phi$ and $H_r$ do not exactly match at $r = 0$ for $|m| = 1$ (as shown in the figures above) is not because of the changes introduced in this PR but rather because get_dft_array is interpolating the fields to the center of the voxel. This means that ep and hr which are defined on the Yee grid at $r = 0$ are being interpolated to $r = \Delta / 2$. The key point is that we should not be using get_dft_array at $r = 0$ as part of the validation check.

I inspected the time-domain (and not the DFT) fields at $r = 0$ by printing out the raw values (see code snippet below). The results are consistent with the expected values at $r = 0$ for $|m| = 1$ for which $E_\phi$ and $H_r$ can be nonzero.

Summary of Results

$\varepsilon = 1$ everywhere
- pre 2538: ep is zero at $r = 0$ whereas hr is nonzero at $r = 0$
- 2538: ep and hr are nonzero at $r = 0$
$\varepsilon = 1$, $\mu = 1.01$ everywhere via default_material=mp.Medium(mu=1.01)
- pre 2538: ep and hr are zero at $r = 0$
- 2538: ep and hr are nonzero at $r = 0$

Notes

for $\varepsilon = 1$ and pre 2538:
- dp at $r = 0$ is nonzero due to step-curl at $r = 0$ while ep is zero (never changed from its initial value)
- hr is nonzero at $r = 0$ is expected because hr = br and br is nonzero due to step-curl at $r = 0$

diff --git a/src/update_eh.cpp b/src/update_eh.cpp
index 31e1c9e6..9b6d0c88 100644
--- a/src/update_eh.cpp
+++ b/src/update_eh.cpp
@@ -207,6 +207,20 @@ bool fields_chunk::update_eh(field_type ft, bool skip_w_components) {
             }
           }
         }
+
+        if (ec == Ep || ec == Hr) {
+          realnum *E = f[ec][cmp];
+          const int yee_idx = gv.yee_index(ec);
+          const int sR = gv.stride(R), nZ = gv.num_direction(Z);
+          bool zero_everywhere = true;
+          for (int iZ = 0; iZ < nZ; iZ++) {
+            const int i = yee_idx + iZ - sR;
+            if (E[i] != 0)
+              zero_everywhere = false;
+          }
+          master_printf("zero-at-r0?, %s, %d, %d\n", component_name(ec), cmp, zero_everywhere);
+        }
+
       }
     }
   }

It seems therefore that this PR is producing the expected results (in the non-PML region anyway). However, we still need to explain why the $z$-PML at $r = 0$ is not working correctly due to the failing test in test_pml_cyl.py.

mochen4 commented 1 year ago

@mochen4 should also check to make sure that he didn't change what portion of the array is "owned" for cylindrical coordinates when he was updating loop_in_chunks (#1895), since that would affect the step_db code. (Also check whether #2182 was present before #1895.)

I checked and found the ownership isn't changed by #1895, and #2182 was present before #1895.

oskooi commented 1 year ago

A couple of changes:

Added a new test case for m = 0 which fails without this PR.
Modified the m = -1 test case to use a narrowband source similar to the fix used in #2182 (comment) in order to get the unit test to pass.

This PR does fix a bug for the m = 0 but uncovers the bug for m = -1 originally reported in #2182.

There are two options for how to proceed: (1) we merge this PR as-is and then work on a bug fix for |m| = 1 or (2) include the additional bug fix in this PR.

stevengj commented 1 year ago

As discussed today, we went over the r=0 updates at

https://github.com/NanoComp/meep/blob/955dcaa61faa50cec2909a0c090855312b14ecb6/src/step_db.cpp#L313-L346

and so far they look correct, but we didn't check the PML terms yet. Would be good to compare those to the ones (for r≠0) in step_generic.cpp, since the PML equations should be the same (they don't involve spatial derivatives).

oskooi commented 1 year ago

I also verified that we are not adding the i*m/r term twice at $r = 0$ during the update for d(Br)/dt = d(Ep)/dz - i*m*Ez/r. There are two separate blocks where this happens:

https://github.com/NanoComp/meep/blob/eb8ff3aff135f28e5171e2417817b6bbd9c98e7f/src/step_db.cpp#L156-L157

https://github.com/NanoComp/meep/blob/eb8ff3aff135f28e5171e2417817b6bbd9c98e7f/src/step_db.cpp#L281-L282

In the first instance, the loop over the $r$ axis is set up such that it never starts at $r = 0$ for the case in which the field chunk includes $r = 0$:

https://github.com/NanoComp/meep/blob/eb8ff3aff135f28e5171e2417817b6bbd9c98e7f/src/step_db.cpp#L174

https://github.com/NanoComp/meep/blob/eb8ff3aff135f28e5171e2417817b6bbd9c98e7f/src/step_db.cpp#L182

oskooi commented 1 year ago

we didn't check the PML terms yet. Would be good to compare those to the ones (for r≠0) in step_generic.cpp, since the PML equations should be the same (they don't involve spatial derivatives).

The special step-curl update equation with $z$-PML for Dp and |m| = 1 at $r = 0$ in step_db.cpp does not seem to match the step-curl update equations for $r \neq 0$ in step_generic.cpp.

Specifically, in step_db.cpp we have cc = Dp, d_c = P, f_p = f[Hr][cmp], f_m = f[Hz][cmp], dsig = Z, siginv = s->siginv[Z], dsigu = R, siginvu = 0, fu = 0, the_f = f[Dp][cmp], s_d = +1, f_m_mult = 2, and fcnd = 0. Using these parameters, the update equations are:

https://github.com/NanoComp/meep/blob/eb8ff3aff135f28e5171e2417817b6bbd9c98e7f/src/step_db.cpp#L334-L341

The key line number is 339 involving updating the_f at some $z$ position along $r = 0$.

For comparison, in step_generic.cpp we have:

https://github.com/NanoComp/meep/blob/eb8ff3aff135f28e5171e2417817b6bbd9c98e7f/src/step_generic.cpp#L197-L204

Only the second term matches step_db.cpp (although the sign is flipped). The first term in this equation does not appear in step_db.cpp.

(The step-curl equation for Br contains dsig = P which means it does not involve $z$-PML. Br is therefore omitted from this analysis.)

oskooi commented 1 year ago

I have tried to add the correct step-db update equations for $z$-PML and the absorber at $r = 0$ for $|m| = 1$ to step_db.cpp by simply copying and over the update equations from step_generic.cpp (with some modifications).

However, there seems to be a bug somewhere in this implementation since the fields blow up for the unit test for m = -1 in test_pml_cyl.py.

oskooi commented 1 year ago

step_db.cpp: In member function 'bool meep::fields_chunk::step_db(meep::field_type)':
step_db.cpp:349:29: error: operands to '?:' have different types 'const meep::realnum*' {aka 'const float*'} and 'meep::realnum' {aka 'float'}
  349 |           the_f[iz] = ((kap ? : kap[k] - sig[k] : 1) * the_f[iz] + dfcnd) * (siginv ? siginv[k] : 1);
      |                         ~~~~^~~~~~~~~~~~~~~~~~~
step_db.cpp:349:48: error: expected ')' before ':' token
  349 |           the_f[iz] = ((kap ? : kap[k] - sig[k] : 1) * the_f[iz] + dfcnd) * (siginv ? siginv[k] : 1);
      |                        ~                       ^~
      |                                                )
step_db.cpp:349:101: error: expected ')' before ';' token
  349 |           the_f[iz] = ((kap ? : kap[k] - sig[k] : 1) * the_f[iz] + dfcnd) * (siginv ? siginv[k] : 1);
      |                       ~                                                                             ^
      |                                                                                                     )
step_db.cpp:350:81: error: 'i' was not declared in this scope; did you mean 'pi'?
  350 |           if (fu) fu[iz] = siginvu[ku] * ((kapu ? kapu[ku] - sigu[ku] : 1) * fu[i] + the_f[iz] - fprev);
      |                                                                                 ^
      |                                                                                 pi
step_db.cpp:325:24: warning: unused variable 'siginv' [-Wunused-variable]
  325 |         const realnum *siginv = s->sigsize[dsig] > 1 ? s->siginv[dsig] : 0;

oskooi commented 1 year ago

There is a bug in the special step-db update equations at $r = 0$ for $|m| = 1$ introduced by the latest commit because the unit test (TestLDOS.test_ldos_ext_eff of test_ldos.py) which verifies that the extraction efficiency of an LED is identical in 3d and cylindrical coordinates is failing:

extraction efficiency (cyl):, 3.440227
extraction efficiency (3D):, 0.327121

Clearly, the results for the cylindrical coordinates is wrong (the extraction efficiency cannot be larger than one).

I went over the update equations again and compared them to the "most general case" of step_generic.cpp from which they were adapted. The update equations seem correct, however, and it is not obvious where the bug could be.

https://github.com/NanoComp/meep/blob/95ddacdda9754b0c6613b2a25a1154e87ab427e9/src/step_generic.cpp#L218-L230

oskooi commented 1 year ago

There was a bug in the sign of the curl terms and strides which involved a simple fix:

@@ -346,7 +346,7 @@ bool fields_chunk::step_db(field_type ft) {
         const int dku = gv.iyee_shift(cc).in_direction(dsigu);
         realnum *fu = siginvu && f_u[cc][cmp] ? f[cc][cmp] : 0;
         realnum *the_f = fu ? f_u[cc][cmp] : f[cc][cmp];
-        int sd = ft == D_stuff ? +1 : -1;
+        int sd = ft == D_stuff ? -1 : +1;
         realnum f_m_mult = ft == D_stuff ? 2 : (1 - 2 * cmp) * m;
         realnum dt2 = dt * 0.5;

stevengj commented 1 year ago

I think the old sd was correct, as we just verified by looking at the update equations and Yee grid.

If a test is failing now and was passing with the old code in step_db, what I would suggest is simply changing the r=0 update equations back into the old ones term-by-term. At some point the test will pass, and that will let us zoom in on the new term that is causing the problem.

oskooi commented 1 year ago

It seems there was a bug in the Dp update whereby the loop over the $z$ coordinates involving the index iz was one larger than it needed to be:

--- a/src/step_db.cpp
+++ b/src/step_db.cpp
@@ -351,7 +350,7 @@ bool fields_chunk::step_db(field_type ft) {
         realnum f_m_mult = ft == D_stuff ? 2 : (1 - 2 * cmp) * m;
         realnum dt2 = dt * 0.5;

-        for (int iz = (ft == D_stuff); iz < nz + (ft == D_stuff); ++iz) {
+        for (int iz = (ft == D_stuff); iz < nz; ++iz) {
           realnum fprev = the_f[iz];
           realnum dfcnd = (sd * Courant) * (f_p[iz] - f_p[iz - sd] - f_m_mult * f_m[iz]);
           if (fcnd) {

oskooi commented 1 year ago

With the latest commit, this PR finally fixes #2489.

The test for $m = 1$ involves an $E_r$ point source at $r = 0$ within a dielectric disc and two different sets of near-field monitors. The computed far fields are the same regardless of the near-field monitor configuration.

led_disc_cyl_discR1_L10 0_dpmlr1 0_dpmlz1 0_plot2D

led_disc_cyl_radpattern_rp0_res50 0_L10 0_dpmlr1 0_dpmlz1 0

oskooi commented 1 year ago

As a demonstration that this PR produces the correct result for $\textbf{m = 0}$, here are the results from a test which compares the radiated fields in air computed using two different methods of (1) DFT monitor and (2) near-to-far field transformation, from an $E_r$ point source at $r = 0.1$ within a dielectric disc above a perfect-metal ground plane. (1) and (2) should produce identical results.

Results are shown below for the master branch and this PR.

disc_n2f_dair1 0_plot2D

1. master branch (5a1dd2c)

led_disc_er_dft_vs_n2f_rpos0 1_m0_dair1 0_res100_master

PR 2538

led_disc_er_dft_vs_n2f_rpos0 1_m0_dair1 0_res100_PR2538

stevengj commented 1 year ago

What concerns me here is that it seems like the loop has to be for (iz = 1; iz < nz+1; ++iz) to get all of the owned points for the Dp component. So, we are intentionally introducing a bug by not updating the iz = nz owned point (albeit maybe a second-order error), based on the empirical observation that it somehow works around some other problem (we hope … we can't be sure because we don't understand the underlying problem). This is an unreliable approach …

You say that changing the loop to nz+1 messes up the LED extraction simulation. Could we see some snapshots of the time domain fields with and without the +1 to see why the update rule at a single grid point is causing such a big difference? e.g. is something blowing up in the PML?

Also might be good to check the step_boundaries code — add a printf to see what grid points and components are being set by boundary conditions, to make sure that this "owned" point is not accidentally being overwritten by a boundary condition.

oskooi commented 1 year ago

Could we see some snapshots of the time domain fields with and without the +1 to see why the update rule at a single grid point is causing such a big difference? e.g. is something blowing up in the PML?

I plotted the DFT fields for Dp vs. $z$ at $r = 0$ without (blue plot) and with (red plot) the +1 term applied to the ending index of the Dp field array. There is a large difference in the DFT fields. The fields without the +1 term seem correct. The fields with the +1 seem clearly wrong.

In this setup, the $E_r$ point source is at $z = -1.0$ and the $z$-PML starts at $z = 0.15$.

One obvious difference in these results is that the $z$-PML in the +1 case does not seem to be working properly: the fields only start to decay well into the PML at $z > 0.15$.

Dp_dft_vs_z_r0_PR2538

Dp_dft_vs_z_r0_PR2538_plus

mochen4 commented 1 year ago

I plotted the DFT fields for Dp vs. z at r=0 without (blue plot) and with (red plot) the +1 term applied to the ending index of the Dp field array. There is a large difference in the DFT fields. The fields without the +1 term seem correct. The fields with the +1 seem clearly wrong.

Both plots seem incorrect to me. I agree that fields should start decaying at 0.15, but they should also decay slowly as in the second plot, not stop abruptly as in the first plot. Moreover, the bottom boundary is metallic right? Shouldn't the fields go to zero there?

I have been looking at the indices that are looped over in step_db, and I noticed some inconsistencies (if I understand correctly): There are nz+1 (from iz=0 to iz=nz) field points at ir=0. And indeed, earlier in the step_db code, the script handling PML (and other stuff) loops from iz=0 to iz=nz to update the_f for all cases, e.g.

https://github.com/NanoComp/meep/blob/133570ccbd5ba622a407e31d74b512a7c8e459f3/src/step_db.cpp#L184

On the other hand, when handling the Dz, Dp, Hr fields at r=0 for m=-1,0,1, (if I keep the +1 for Dp,) the loop only goes through nz points (either iz=0 to nz-1 or iz=1 to nz). And I am confused how these start and end points were chosen. As Ardavan noted, only nz points are owned at ir=0, so I guess we should indeed only update nz points? And I think it should be the first nz points iz=0 to nz-1?

stevengj commented 1 year ago

Regarding @mochen4's comments:

Remember that we aren't outputting the Yee points, but by default the DFT is interpolated to the center of the voxels. So at z=0 we don't see the Yee point where Ep is exactly zero, but an interpolation to a nearby point, which is nonzero. It looks like it is going to zero, so it looks okay.
In the case where it looks like it is suddenly going to zero in the PML, this is where I think @oskooi's loop is going to nz-1 instead of all the way to nz. So effectively it is not updating one of the "owned" points, which effectively puts a pixel of PEC at the boundary between the PML and non-PML region. This would make it suddenly zero. But I think it would be informative to have a full 2d image (either of a DFT field or a time-domain movie), to see why what should be just a single pixel seems to matter so much.
The loop that goes unconditionally from iz=0 to iz=nz is in the code adding the i*m/r terms. Indeed, this is updating both owned and not-owned points. But updating the not-owned points here should be harmless because they will just get overwritten by step_boundaries. Doing it this way makes the logic simpler, but it might be a nice exercise to compute the correct owned bounds using LOOP_OVER_VOL_OWNED, or by calculating them from the little_owned_corner and big_owned_corner.
Note that for m=0 and the Dz component, it is owned from iz=0 to iz=nz-1. Interestingly, in principle we could also timestep the curl term for the iz=nz point for Dz, since it turns out to not use any data from iz=nz+1. However, we instead get this point from boundary conditions, which is fine. In general, Meep should be very consistent here — every component has the same number (nx,ny,nz) of owned points in ever direction, but for some components the first (0th) point is not-owned and for some the last (n-th) is owned in a given direction, depending on what touches the boundary.

oskooi commented 1 year ago

Regarding @stevengj's first comment, this is indeed what I observed in #2383 (comment).

mochen4 commented 1 year ago

I added print statement after these lines to check the non-owned points. https://github.com/NanoComp/meep/blob/133570ccbd5ba622a407e31d74b512a7c8e459f3/src/boundaries.cpp#L461-L462

In my setup, at r=0: the two grid_volume owned points from -112 to 14 and 16 to 116 for Dp, and -113 to 13 and 15 to 115 for Br; the non-owned points are (0, -114) and (0, 14) for Dp, and (0,15) and (0,117) for Br. It looks correct that the first of Dp and last of Br are not owned.

oskooi commented 1 year ago

Following @stevengj's third suggestion, I tried to rewrite the $+im/r$ update loop in step_db.cpp using LOOP_OVER_VOL_OWNED which replaces the manual loops over the $r$ and $z$ grid points. The setup required using the macros DEF_k and DEF_ku to determine the array indices for the PMLs in the $r$ and $z$ grid points.

This is what it looks like for the most general case:

diff --git a/src/step_db.cpp b/src/step_db.cpp
index 794d7ab6..99a91ae6 100644
--- a/src/step_db.cpp
+++ b/src/step_db.cpp
@@ -179,97 +179,95 @@ bool fields_chunk::step_db(field_type ft) {
           if (siginvu) { // PML + fu
             if (cndinv)  // PML + fu + conductivity
               //////////////////// MOST GENERAL CASE //////////////////////
-              for (int ir = ir0 == 0; ir <= gv.nr(); ++ir) {
+              LOOP_OVER_VOL_OWNED(gv, cc, i) {
+                const ptrdiff_t ir = loop_i1;
                 realnum rinv = the_m / (ir + ir0);
-                for (int iz = 0; iz <= gv.nz(); ++iz) {
-                  ptrdiff_t idx = ir * sr + iz;
-                  int k = dk + 2 * (dsig == Z ? iz : ir);
-                  int ku = dku + 2 * (dsigu == Z ? iz : ir);
-                  realnum df, dfcnd = rinv * g[idx] * cndinv[idx];
-                  fcnd[idx] += dfcnd;
-                  fu[idx] += (df = dfcnd * siginv[k]);
-                  the_f[idx] += siginvu[ku] * df;
-                }
+                DEF_k;
+                DEF_ku;
+                realnum df, dfcnd = rinv * g[i] * cndinv[i];
+                fcnd[i] += dfcnd;
+                fu[i] += (df = dfcnd * siginv[k]);
+                the_f[i] += siginvu[ku] * df;
               }

Unfortunately, these changes are causing the fields to blow up for a cylindrical-coordinate simulation involving PMLs.

oskooi commented 1 year ago

Regarding @stevengj 's second comment, I plotted the DFT fields for $|D_\phi|$ over the entire $(r,z)$ cell excluding the high-index layer. There does not seem to be a noticeable difference in the fields for the two Dp custom loops with termination (1) iz < nz and (2) iz < nz + (ft == D_stuff).

Even though the fields are similar in the two cases, the extraction efficiency is very different. Case 1 is correct, case 2 is not.

led_cyl_debug_plot2D

Dp_dft_PR2538

Dp_dft_PR2538_plus

oskooi commented 1 year ago

After rebasing to master following #2585 and replacing the custom loops for $r = 0$ with LOOP_OVER_IVECS for $m = 0, \pm 1$, there is now just a single failing test: TestLDOS.test_ldos_ext_eff in python/tests/test_ldos.py.

The thing that is causing this test to fail is the flux computed in the $z$ direction which is different than the value computed by the master branch. The LDOS (used to compute the total flux) and the flux in the $r$ direction are unchanged. This is an important finding which suggests one thing: the field updates are correct but there is a bug in the $z$-flux calculation at $r = 0$ probably because of the way the fields $(Er, E\phi, Hr, H\phi)$ are all interpolated to the center of the voxel at $r = 0.5\Delta r$ which violates the special field values for |m| = 1. Note that $E_\phi$ and $H_r$ are defined at $r = 0$ in the Yee grid and can have nonzero values.

@mochen4, any thoughts?

mochen4 commented 1 year ago

After rebasing to master following #2585 and replacing the custom loops for r=0 with LOOP_OVER_IVECS for m=0,±1, there is now just a single failing test: TestLDOS.test_ldos_ext_eff in python/tests/test_ldos.py.

If I understand correctly, replacing with LOOP_OVER_IVECS in #2585 actually made a difference? So the original loop incorrectly updated some points that actually were not overridden? If that is the case, maybe something similar might happen somewhere else?

oskooi commented 1 year ago

Replacing the custom loops with LOOP_OVER_IVECS did make a difference. Everything seems to be working fine now except for the $z$-flux calculation at $r = 0$. I confirmed this by repeating the calculation in the extraction efficiency unit test but modifying the $z$-flux line monitor to exclude $r = 0$. The results are identical for this branch and master. Since the $z$-flux value is different for this branch and master only when the line monitor includes $r = 0$, this must mean the problem is with the $z$-flux calculation at $r = 0$.

I suspect there is a bug in the field-interpolation step of dft_chunk::update_dft because of the way the $r = 0$ case is being handled:

https://github.com/NanoComp/meep/blob/ee3bd80a419ceead4f1cd4c1fd5ae7ed82c6de46/src/dft.cpp#L274-L287

The problem is that we need $P_z$ exactly at $r = 0$. However, in these lines the fields are always interpolated to the center of the voxel (i.e., $r = 0.5\Delta r$). It seems we need to add a special case here just for $r = 0$?

mochen4 commented 1 year ago

Replacing the custom loops with LOOP_OVER_IVECS did make a difference. Everything seems to be working fine now except for the z-flux calculation at r=0. I confirmed this by repeating the calculation in the extraction efficiency unit test but modifying the z-flux line monitor to exclude r=0. The results are identical for this branch and master. Since the z-flux value is different for this branch and master only when the line monitor includes r=0, this must mean the problem is with the z-flux calculation at r=0.

I agree. My point was that, since we actually didn't expect replacing LOOP_OVER_IVECS would make a difference based on @stevengj 's comment, there might be something that we missed. Thankfully it is fixed now, but we might want to keep it in mind when we encounter another bug involving custom loops in the future.

The problem is that we need Pz exactly at r=0. However, in these lines the fields are always interpolated to the center of the voxel (i.e., r=Δr). It seems we need to add a special case here just for r=0?

That sounds plausible. If the error comes from interpolation, can we confirm it by checking the error vs. resolution?

mochen4 commented 1 year ago

I checked and found that, if r=0 was included, the flux increased (roughly) quadratically with respect to the resolution, which means a factor of resolution for each field component at r=0?

stevengj commented 1 year ago

It would still be nice to see what the time-domain fields look like (either a movie or just a few typical snapshots)

mochen4 commented 1 year ago

DFT fields of Ep and Hr at r=0 position of the flux monitor indeed grow linearly with respect to the resolution.

Using sim.run(until=10) and sim.run(until=100), and sim.plot2D, I plotted Ep field at time=10: at_time10

and at time=100 at_time100

The fields at r=0 at the start of simulation don't make sense to me, but it looks fine later at time=100, so I guess the time evolution probably is correct, but the initialization is wrong somehow?

mochen4 commented 1 year ago

In comparison, on master branch,

At time = 10: master_at_time10

And at time = 100: master_at_time100

stevengj commented 1 year ago

The t=10 plot shows something weird at r=0. Would be good to check:

Does this increase with resolution? YES ☑
Is it only the fields at r=0, not any other pixel? YES ☑
Does it show up in both Ep and Dp, or just Ep? If it's just Ep, maybe chi1inv is screwed up at r=0? BOTH ☑
Does it show up if you add a printf to output the raw r=0 Yee-grid values, or does it only show up after interpolation?

The only way I can think of for the fields to be screwed up at r=0 and not be screwed up everywhere is for the problem to be in Ez and maybe Ep, since there was a comment (added in https://github.com/NanoComp/meep/commit/07b25dd86b95a3f92b61848422794e45c11576a7, removed in #2383) that these E field components at r=0 don't actually affect the H = curl E timestepping, they only affect field output and DFTs where E is specifically requested. Note that the present PR puts back the E-from-D updates at r=0, albeit in a different way.

oskooi commented 1 year ago

In the failing unit test for the extraction efficiency which involves computing the radiated flux using add_flux (which is based on interpolating the DFT fields to the center of the voxel that we have identified as containing a potential bug at $r = 0$), computing the radiated flux instead from the radiation pattern of the far fields via add_near2far (which does not involve interpolating the DFT fields to the center of the voxel) the test passes: the extraction efficiency computed in cylindrical coordinates matches the 3d Cartesian result.

I think the change from add_flux to add_near2far provides further evidence of the interpolation bug for the DFT fields at $r = 0$.

We can probably go ahead and merge this PR now and then file a bug report that we can address with a separate PR.

mochen4 commented 1 year ago

It looks like that the raw Dp fields at r=0 actually do increase with resolution. At t=5, the fields are

In addition, the fields oscillates twice as fast. For the same period of time=5, I have 500 snapshots of fields at resolution=25 and 1000 snapshots of fields at resolution=50. Therefore, frames 498-500 at resolution 25 should correspond to frames 996-1000 at resolution 50. However, plotting those frames, it looks like that the fields are oscillating at the rate of time stepping:

mochen4 commented 1 year ago

@oskooi pointed out that, when resolution is doubled, pixel size is halved, so time-domain fields getting doubled would make sense. The evolution of fields in time is reasonable and consistent with what we expect from a correct time stepping. The next step is to check that there is no jump in the raw field as r approaches 0.

oskooi commented 1 year ago

when resolution is doubled, pixel size is halved, so time-domain fields getting doubled

For reference, due to the finite discretization a $\delta$-function current source has an amplitude of $1 / \Delta x$ where $\Delta x$ is the voxel dimension. This is implemented in:

https://github.com/NanoComp/meep/blob/dad3e5273e92f45f66991397e05e65c821cee55a/src/sources.cpp#L480-L484

Doubling the grid resolution should therefore result in a doubling of the field amplitudes in the time domain. This is what we are observing here.

mochen4 commented 1 year ago

In addition, the fields oscillates twice as fast. For the same period of time=5, I have 500 snapshots of fields at resolution=25 and 1000 snapshots of fields at resolution=50. Therefore, frames 498-500 at resolution 25 should correspond to frames 996-1000 at resolution 50. However, plotting those frames, it looks like that the fields are oscillating at the rate of time stepping:

I realized that I was saving both real and imaginary parts of the fields together. If I focus on the real part (i.e. frames with odd number index), the fields evolution would make sense. Here is an animation for the real part of Dp at r=0 for t=0 to 5:

https://github.com/NanoComp/meep/assets/25192039/e235a709-f0e5-479f-a7a5-daf5524d268d

mochen4 commented 1 year ago

On the other hand, using get_array with snap=True to get the fields on Yee grid, I did find a discontinuity along r at r=0, consistent with the previous plot in https://github.com/NanoComp/meep/pull/2538#issuecomment-1663226841 Dp_along_r

oskooi commented 1 year ago

Instead of a bug in the $z$-flux calculation at $r = 0$, I think the bug is in the "restriction" procedure used to define an $E_r$ point source at $r = 0$ since $E_r$ is not defined at $r = 0$ on the Yee grid. An $E_r$ point source defined at $r = 0$ should be "restricted" to its nearest Yee-grid locations at $r = 0.5 Δr$ but this does not seem to be happening correctly.

In the failing unit test of the extraction efficiency, if the position of the source is shifted by a voxel away from $r = 0$ the test passes using the flux calculation via add_flux. I modified this test in the latest commit and added this note in the comments:

    # Because (1) Er is not defined at r=0 on the Yee grid, and (2) there                  
    # seems to be a bug in the restriction of an Er point source at r = 0,                 
    # the source is placed at r = ~Δr (just outside the first voxel).                      
    # This incurs a small error which decreases linearly with resolution.                  
    src_pt = mp.Vector3(1.5 / self.resolution, 0, -0.5 * sz + h * dmat)

Note that this hack only seems to work if the source position in $r$ is placed outside of the first voxel. In this example, it is placed in the center of the second voxel (i.e., 1.5 * Δr).

stevengj commented 1 year ago

Might be worth trying an integrated source, since that adds the current in a completely different place. Might help to determine whether this is problem with the Er field update.

stevengj commented 12 months ago

One thing to make sure of is that we are doing the transformation across r=0 correctly when interpolating the source in src_vol_chunkloop. Two things to try:

Around this line add a printf statement to see (1) the amplitude amps_array[idx_vol], (2) the coordinates iloc and loc, and (3) the shift_phase . Do it for a source at r=0 and at r=1.5Δr. In both cases the point source should be mapped to 4 grid points, at ± 1 pixel in Δz and Δr, but in the r=0 case the sources should be at ±Δr/2, which should both get mapped via r_to_minus_r symmetry to +Δr/2 (hopefully with the same sign).
I have a suspicion that we may be subtracting currents from r < 0 instead of adding. You could check that forcing it to always add, by changing this line to remove the shift_phase factor (just to complex<double> amp = data->amp;) whether that helps. (This should have no effect for sources that are not near r = 0, i.e. for r > Δr, since they shouldn't get any contribution from r < 0 points.)

NanoComp / meep

Special field updates for r = 0 and m = 0, ±1 in cylindrical coordinates #2538