dfm / tess-atlas

eccentricity post-processing failing due to memory issues #145

Closed avivajpeyi closed 2 years ago

avivajpeyi commented 2 years ago

To reproduce

run_toi 103
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
/tmp/ipykernel_3797/2497259523.py in <module>
      1 if star.density_data_present:
----> 2     ecc_samples = calculate_eccentricity_weights(tic_entry, inference_data)
      3     ecc_samples.to_csv(
      4         os.path.join(tic_entry.outdir, "eccentricity_samples.csv")
      5     )

/fred/oz200/avajpeyi/projects/tess-atlas/src/tess_atlas/analysis/eccenticity_reweighting.py in calculate_eccentricity_weights(tic_entry, inference_data)
     30     # density that implies
     31     g = (1 + ecc * np.sin(omega)) / np.sqrt(1 - ecc ** 2)
---> 32     rho = rho_circ / g[:, None] ** 3
     33
     34     # Re-weight these samples to get weighted posterior samples

MemoryError: Unable to allocate 29.1 TiB for an array with shape (2000000, 2000000) and data type float64

avivajpeyi commented 2 years ago

29.1 TiB is crazy large!! Surely it shouldn't be this big...

dfm commented 2 years ago

That's also the wrong dimensions, right? What are the shapes of rho_circ and g?
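For anyone reading along, here is a minimal sketch of why the traceback reports a square array. It assumes `g` and `rho_circ` are both 1-D with the same length, which is what the reported shape (2000000, 2000000) suggests; the small arrays and the helper `outer_nbytes` are made up for illustration and are not part of tess-atlas.

```python
import numpy as np

# Small stand-ins for the real arrays (hypothetical sizes, illustration only).
n = 5
rho_circ = np.ones(n)    # shape (n,)
g = np.full(n, 2.0)      # shape (n,)

# g[:, None] has shape (n, 1); dividing an (n,) array by it broadcasts
# the result to shape (n, n), one row per element of g.
rho = rho_circ / g[:, None] ** 3
print(rho.shape)


def outer_nbytes(n_rows, n_cols, dtype=np.float64):
    """Bytes needed for a dense (n_rows, n_cols) array of the given dtype."""
    return n_rows * n_cols * np.dtype(dtype).itemsize


# Same calculation at the size reported in the MemoryError.
tib = outer_nbytes(2_000_000, 2_000_000) / 2**40
print(f"{tib:.1f} TiB")
```

With 2,000,000 entries on each axis and 8 bytes per float64, this comes to the 29.1 TiB in the error message, so the allocation size follows directly from the input lengths.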

avivajpeyi commented 2 years ago

Ah, yeah, there is a silly bug: np.repeat makes the arrays 500x longer instead of downsampling them:

post = inference_data.posterior
rho_circ = post.rho_circ.values.flatten()
period = post.p.values.flatten()
print(f"rho_circ shape: {rho_circ.shape}")
print(f"period shape: {period.shape}")

rho_circ = np.repeat(rho_circ, 500, axis=0)
period = np.repeat(period, 500, axis=0)
print("After downsampling")
print(f"rho_circ shape: {rho_circ.shape}")
print(f"period shape: {period.shape}")

Output:

rho_circ shape: (4000,)
period shape: (4000,)
After downsampling
rho_circ shape: (2000000,)
period shape: (2000000,)

Surprised that this didn't cause issues earlier!
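The fix itself is not shown in this thread. As a rough sketch of the difference between repeating and downsampling, assuming the intent was to keep a random subset of the posterior samples, something like the following would shrink the arrays rather than grow them. The names `samples` and `n_keep` and the subset size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for flattened posterior samples, e.g. post.rho_circ.values.flatten().
samples = rng.normal(size=4000)

# np.repeat repeats every element, so the array grows by a factor of 500.
repeated = np.repeat(samples, 500, axis=0)
print(repeated.shape)

# Downsampling: draw n_keep distinct indices and keep only those samples.
n_keep = 500  # hypothetical subset size
idx = rng.choice(samples.size, size=n_keep, replace=False)
downsampled = samples[idx]
print(downsampled.shape)
```

If several parameters are thinned this way (for example `rho_circ` and `period`), indexing all of them with the same `idx` keeps each retained sample's values paired.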