asteca / ASteCA

Code for the ASteCA package.
http://asteca.github.io/
MIT License

Interpolate IMF masses into the isochrones, not the other way around #503

Closed Gabriel-p closed 3 years ago

Gabriel-p commented 3 years ago

Current state

  1. Read the metallicity files to generate the (z, a) grid
  2. interpolate extra points (1500), taking advantage of the fact that the stars are ordered by initial mass
  3. generate binary data
  4. sample the selected IMF using the given mass range

  ptemcee loop:
  5. the (z, a) point is (weighted) averaged from the isochrones grid
  6. the sampled masses in the IMF are used to replicate stars in each track
  7. ....

Each isochrone has a different mass distribution, since isochrones of the same metallicity but different ages or same age but different metallicity have different M_ini values. This generates several issues:

  1. each (weighted-averaged) isochrone needs to have the sampled IMF from point 4 interpolated into it every time it is requested
  2. when the isochrone weighted averaging happens in point 5 (through zaWAverage()), the isochrones are not mass-aligned
  3. since the masses are interpolated from the sampled IMF using the closest M_ini values in the isochrone, some masses end up repeated and others land at a considerable distance from the sampled value (see the short illustration after this list)
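
To make issue 3 concrete, here is a minimal illustration (toy arrays, not ASteCA code) of how matching IMF-sampled masses to the closest available M_ini values both repeats isochrone masses and can land far from the sampled value:

```python
import numpy as np

# Hypothetical coarse M_ini grid of one isochrone, and a few IMF-sampled masses
m_ini_isoch = np.array([0.10, 0.50, 1.00, 2.00, 5.00])
m_sampled = np.array([0.12, 0.45, 0.55, 3.20, 4.90])

# Index of the closest isochrone mass for every sampled mass
idx = np.abs(m_ini_isoch[None, :] - m_sampled[:, None]).argmin(axis=1)
m_matched = m_ini_isoch[idx]

print(m_matched)                      # [0.1 0.5 0.5 2.  5. ] -> 0.5 is repeated
print(np.abs(m_matched - m_sampled))  # the 3.20 sample is matched 1.2 away
```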

A different approach to address those issues

  1. Read the metallicity files to generate the (z, a) grid
  2. sample the selected IMF using the given mass range
  3. interpolate the sampled masses into the isochrones (a sketch of this ordering is given after the list)
  4. generate binary data

  ptemcee loop:
  5. the (z, a) point is (weighted) averaged from the isochrones grid
  6. ...
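
A minimal sketch of this ordering, assuming a simplified layout where each isochrone is a 2-D array with M_ini in the first row and the photometric columns in the remaining rows (the function names, the dict layout, and the toy single-slope IMF are illustrative, not ASteCA's internal structures):

```python
import numpy as np
from scipy.interpolate import interp1d


def sample_imf(n_stars, m_min=0.08, m_max=20., alpha=2.35, seed=None):
    """Toy single-slope (Salpeter-like) IMF sampled by inverse transform."""
    rng = np.random.default_rng(seed)
    u = rng.random(n_stars)
    a1 = 1. - alpha
    return (m_min**a1 + u * (m_max**a1 - m_min**a1)) ** (1. / a1)


def align_isochrones(isochs, m_sampled):
    """Interpolate every isochrone onto the same IMF-sampled masses, so all
    tracks share one mass axis before entering the ptemcee loop."""
    aligned = {}
    for za, data in isochs.items():
        m_ini, cols = data[0], data[1:]
        # interp1d along the last axis interpolates all columns at once;
        # sampled masses outside the isochrone range are flagged with NaN
        f = interp1d(m_ini, cols, axis=-1, bounds_error=False,
                     fill_value=np.nan)
        aligned[za] = np.vstack([m_sampled, f(m_sampled)])
    return aligned


# Usage: sample the IMF once, then mass-align every (z, a) isochrone with it
# m_sampled = sample_imf(5000)
# isochs_aligned = align_isochrones(isochs, m_sampled)
```

With all isochrones sharing the same sampled-mass axis, the weighted average inside the loop operates on rows that already correspond star-by-star, which is the point of interpolating the IMF masses into the isochrones rather than the other way around.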
Gabriel-p commented 3 years ago

What I learnt testing the approach mentioned above:

  1. Interpolating the mass distribution prior to the best-fit process consumes a lot of memory, since each isochrone needs to be interpolated up to the maximum mass defined. This can reach unreasonable values for large masses, as the number of stars sampled for a given total mass is approximately M_T/0.4. For a mass of 10000 this gives ~25000 stars; for a GC with a mass of up to 500000 the number is ~1.25e6. This is the number of stars sampled into each isochrone, so with just a few isochrones defined the memory usage can grow enormously. For this reason, I did not apply these changes.
  2. The cut_max_mag function can be made simpler and faster using a numpy mask (see the cut_max_mag sketch after this list).
  3. Dragging the binary probabilities around costs performance, as does updating the binary masses and the IDs separately. Instead, I could put the binary probabilities into a separate large array (loaded into memory once, never modified, the same for all, and the same one used for the differential reddening), store the secondary masses (not the combined masses), and update their values using bin_indxs so that single systems have a nan value (see the binary bookkeeping sketch after this list).
  4. The sampled IMF has a large impact on the final estimated mass. For the exact same cluster it can fluctuate by around 4000-5000 M between different runs, a 20-25% range of variation. I could sample the IMF once per metallicity file to alleviate this issue.
  5. There is no need to interpolate the isochrones a priori with a large number of points (N=1500 currently); I can just interpolate the array in mass_interp using scipy.interpolate.interp1d().
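
A minimal sketch of the mask-based cut from point 2, assuming the synthetic cluster is stored as a 2-D array with one star per column and the magnitude in a known row (the layout and the mag_idx argument are assumptions, not ASteCA's actual signature):

```python
import numpy as np


def cut_max_mag(isoch, max_mag, mag_idx=0):
    """Drop stars fainter than max_mag with a single boolean mask."""
    msk = isoch[mag_idx] <= max_mag
    return isoch[:, msk]
```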
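
And a sketch of the binary bookkeeping from point 3, assuming b_prob is the large, pre-generated probability array that is loaded once and never modified (the names b_prob, m2_sampled, bin_frac and bin_indxs are illustrative):

```python
import numpy as np


def flag_binaries(b_prob, m2_sampled, bin_frac):
    """Mark binary systems and keep only their secondary masses; single
    systems get NaN, so no combined masses need to be tracked."""
    bin_indxs = b_prob <= bin_frac            # b_prob: fixed, never modified
    m2 = np.where(bin_indxs, m2_sampled, np.nan)
    return bin_indxs, m2
```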