TSFitPy is a pipeline designed to determine stellar abundances and atmospheric parameters through the use of Nelder-Mead (simplex algorithm) minimization. It calculates model spectra "on the fly" while fitting instead of using a more sophisticated method that relies on training neural networks (such as the method used by the full SAPP used for much larger datasets). Using this method allows the pipeline to gain flexibility in the stellar physics used at the expense of computation time, which makes it useful for small datasets of ~100 stars and fewer.
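The idea of fitting by simplex minimisation can be sketched with a toy example. The block below is not TSFitPy code: it uses a simplified, hand-written Nelder-Mead loop and a Gaussian absorption line in place of a Turbospectrum synthetic spectrum. The names nelder_mead, line_model and chi_squared, the line centre and width, and the noise level are all assumptions made for illustration.

```python
import math

def nelder_mead(f, x0, step=0.1, tol=1e-12, max_iter=500):
    """Simplified Nelder-Mead simplex minimiser (illustrative only)."""
    n = len(x0)
    # Initial simplex: x0 plus one point displaced along each axis.
    simplex = [list(x0)] + [
        [x + (step if i == j else 0.0) for j, x in enumerate(x0)] for i in range(n)
    ]
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        # Centroid of all points except the worst one.
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]

        def along(t):
            # Point on the line through the worst point and the centroid.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if f(reflected) < f(best):
            expanded = along(2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(-0.5)
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every point towards the best one.
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, p)] for p in simplex[1:]
                ]
    return min(simplex, key=f)

# Toy spectrum: depth stands in for abundance, shift for radial velocity.
WAVE = [4999.5 + 0.01 * i for i in range(101)]
SIGMA = 0.01  # assumed flux error

def line_model(params):
    depth, shift = params
    return [1.0 - depth * math.exp(-0.5 * ((w - 5000.0 - shift) / 0.1) ** 2) for w in WAVE]

OBSERVED = line_model((0.4, 0.02))  # noise-free "observation"

def chi_squared(params):
    # The model is recomputed on every call, i.e. "on the fly".
    return sum(((o - m) / SIGMA) ** 2 for o, m in zip(OBSERVED, line_model(params)))

best_depth, best_shift = nelder_mead(chi_squared, [0.2, 0.05])
print(f"depth={best_depth:.3f} shift={best_shift:.3f}")
```

Every chi-squared evaluation recomputes the model, which is why this kind of fitting is flexible but slow compared with a pre-trained emulator.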
To use TSFitPy, you will need a working Turbospectrum (TS) installation of the latest version, which has the capability to compute NLTE line profiles as well as calculate specified spectral windows instead of a full spectrum for a given range. TSFitPy has not been tested on older versions of TS. The latest version of TS can be found here: https://github.com/bertrandplez/Turbospectrum_NLTE
The code requires at least Python 3.7. It also makes use of Fortran programs, which need to be compiled on the user's machine (the Intel Fortran compiler, ifort, is highly recommended). The Python packages needed are as follows (they should all be installable via "pip install"):
pip install dask[complete]
Also, Windows is not supported (?).
There is a work-in-progress GUI for TSFitPy (currently developed by NS only), which at least covers plotting the results. Try it to see whether you like it. It is available here.
If you use this code, please acknowledge the authors of the code and the Turbospectrum code. Please reference the following papers:
If you make use of the teff or vmic fitting methods, please acknowledge the following paper with the description of the method:
If you make use of the NLTE data, please acknowledge the appropriate papers for the NLTE data used (different one for each element!). See docs in the TurboSpectrum GitHub page for most sources. If you use Y, Eu, Al, Na (qmh), here are the updated references (might not be inside TurboSpectrum documentation just yet):
This is a short version of the installation + running the code just to test that it works (not all inputs are downloaded). Please read the full version when you actually want to use the code.
git clone https://github.com/TSFitPy-developers/TSFitPy.git
cd TSFitPy/turbospectrum/
rm readme.txt
cd ..
git clone https://github.com/bertrandplez/Turbospectrum_NLTE.git turbospectrum
cd turbospectrum/exec/ (or cd turbospectrum/exec-gf/ if using gnu compiler)
mcmodel=medium in the makefile on line 11
make
cd ../../
Download atmospheres/marcs_standard_comp.zip and put the unzipped files in TSFitPy/input_files/model_atmospheres/1D/
Download nlte_ges_linelist_jmgXX20XX_I_II and put it in TSFitPy/input_files/linelists/linelist_for_fitting/
pwd -> TSFitPy/
cp -r input_files/sample_spectrum/* input_files/observed_spectra/
cp -r turbospectrum/interpolator/* scripts/model_interpolators/
cd scripts/
python3 compile_fortran_codes.py (choose GNU, IFORT or IFX compiler)
cd ..
python3 main.py ./input_files/tsfitpy_input_configuration.cfg
[Fe/H]=-0.3180 rv= 0.3268 vmic= 1.0516 vmac= 2.4664 rotation= 0.0000 chisqr= 0.55214633
Converged: Fe: -0.32 Number of iterations: 14
[Fe/H]= 0.0282 rv= 0.2282 vmic= 1.0679 vmac= 3.7423 rotation= 0.0000 chisqr= 0.33176552
Converged: Fe: 0.03 Number of iterations: 10
Total runtime was XXXX minutes.
TSFitPy had normal termination
Fitting completed
End of the fitting: XXX-XX-20XX-XX-XX-XX
cd plotting_tools/
cp plot_output.ipynb plot_output_test.ipynb
Open plot_output_test.ipynb and run the first cell
Change output_folder_location to the folder where the output is saved (e.g. you will find a folder TSFitPy/output_files/XXX-XX-20XX-XX-XX-XX_0.XXXXXXXXXXXXXXXX_LTE_Fe_1D/, so copy XXX-XX-20XX-XX-XX-XX_0.XXXXXXXXXXXXXXXX_LTE_Fe_1D and paste it instead of OUTPUTCHANGEHERE)
Put Turbospectrum into TSFitPy/turbospectrum/
Compile Turbospectrum in turbospectrum/exec/ (or in turbospectrum/exec-gf/ if using the gnu compiler)
mcmodel=medium in the makefile (linux only?)
Copy the contents of TSFitPy/turbospectrum/interpolator/ to TSFitPy/scripts/model_interpolators/
Run TSFitPy/scripts/compile_fortran_codes.py using python to compile the model interpolators
Download all desired linelists and put them into TSFitPy/input_files/linelists/linelist_for_fitting/
Example VALD lines are included in TSFitPy/input_files/linelists/linelist_vald/, which you can move to TSFitPy/input_files/linelists/linelist_for_fitting/ (they are LTE ONLY)
Alternatively (and required for NLTE), the Gaia-ESO linelists are provided here in the file nlte_ges_linelist (wavelength range: 4200-9200 Å)
Additional linelists to include are the VALD ones (3700-3800, 3800-4200, 9200-9300, 9300-9800) that extend the wavelength coverage of the Gaia-ESO linelist
Example of how your linelist folder should look:
TSFitPy/input_files/linelists/linelist_for_fitting/
|-> nlte_ges_linelist_jmgDATE_I_II
|-> vald-3700-3800-for-grid-nlte-DATE
|-> vald-3800-4200-for-grid-nlte-DATE
|-> vald-9200-9300-for-grid-nlte-DATE
|-> vald-9300-9800-for-grid-nlte-DATE
Molecular linelists may also be important. They are found at the same link as the Gaia-ESO linelist, in the folder molecules-420-920nm, and should be put alongside other molecules (UNZIP them; they should have the extension .bsyn)
IMPORTANT: ALL files in TSFitPy/input_files/linelists/linelist_for_fitting/ are used, so do NOT use BOTH Gaia-ESO and VALD data for the same wavelength ranges (i.e. don't use both the downloaded Gaia-ESO files AND the files from the example /linelist_vald/)
Put the model atmospheres into TSFitPy/input_files/model_atmospheres/ in either the 1D or 3D folder and unzip them
atmospheres/marcs_standard_comp.zip
3D folder as well (same link, atmospheres/average_stagger_grid_forTSv20.zip)
p2500_g+3.0_m0.0_t00_st_z+0.00_a+0.00_c+0.00_n+0.00_o+0.00_r+0.00_s+0.00.mod) are in respectively:
TSFitPy/input_files/model_atmospheres/1D/
TSFitPy/input_files/model_atmospheres/3D/
dep-grids folder.
_marcs_names ! for aux files) for average STAGGER models
TSFitPy/input_files/nlte_data/Ba/)
TSFitPy/input_files/nlte_data/model_atoms/
TSFitPy/input_files/linemask_files/ and separated into individual folders
ELEMENT-lmask.txt
; (either on any line or after the line's values)
TSFitPy/input_files/sample_spectrum/ and put it into desired folder, such as TSFitPy/input_files/observed_spectra/
Copy TSFitPy/input_files/tsfitpy_input_configuration.cfg and call the copy something like TSFitPy/input_files/tsfitpy_input_configuration_sun_test.cfg, so that git pull cannot interfere with the default configuration
compiler specifies the compiler (ifort, ifx or gnu). Location of turbospectrum is expected at TSFitPy/turbospectrum/
TSFitPy/scripts/TSFitPy.py, but it is possible to change paths if you want to keep your data in a separate folder (e.g. it can be useful if sharing data on a cluster)
atmosphere_type 1D or 3D: MARCS is 1D, STAGGER average are 3D models
mode specifies fitting mode
all: fits all lines within the linemask at the same time. Advantage: faster. Disadvantage: cannot know whether any specific line has a good or bad fit. Not recommended
lbl: fits all lines within the linemask one line at a time. Advantage: get full info for each line with separate abundance, macroturbulence etc. Can also fit microturbulence (not very well though?). Disadvantage: slower
teff: fits the specified line by changing temperature, not abundance. Recommended use: use element H and include NLTE for H and Fe
vmic: changes vmic for each abundance line. Very slow, but can get a good vmic. Recommended use: use element Fe
include_molecules is whether you want molecules in your spectra. Fitting can be faster without them (useful when testing?). Recommended: yes, unless molecules are not expected in the spectra.
nlte whether you want to have NLTE or not. Elements to include with NLTE are written below
fit_vmic, fit_vmac, fit_rotation Yes/No/Input depending on whether you want to fit them or not. If "no", then microturbulence is calculated from an empirical relation (based on teff, logg, [Fe/H]), which works rather well for FGK-type stars. If "Input", it is possible to input microturbulence in the fitlist later. If macroturbulence/rotation is "no", then a constant one will be applied to all stars (chosen below). If "Input", then each star can be given one in the fitlist later on. For vmic, "Input" or "No" is recommended. Use the 'vmic' fitting mode to fit vmic instead of using this option
element_to_fit which element to fit. Normally one would fit one element at a time, but it is possible to fit several elements at once using the same linemask (e.g. blended line). If you want to fit abundance for different lines, then you need to fit one element at a time
nlte_elements which elements to include NLTE for (ignored if nlte = False)
linemask_file is the path in the linemasks_path from where the linemask is taken
wavelength_delta is the wavelength step of the generated synthetic spectra. Try not to have it less than that of the observed spectra, but too small a value will result in slow fitting. Recommended as a start: 0.005
segment_size is the size of the generated segment around the line. Recommended as a start: 4. Not very important, but can be useful to change if nearby lines are very strong and affect the fit (note: H is always generated whether it is in the segment or not)
debug_mode can be used for debugging code. 0 is best for normal fits, 1 outputs some extra information during the Python fitting, 2 outputs full TS Fortran information (a lot of info and a much slower fit)
number_of_cpus is the number of CPUs to use for the fitting. 1 is best for debugging, but can be increased for faster fitting
experimental_parallelisation parallelises based on each line (not just spectra) for lbl mode. Much faster, but if it crashes, then try setting it to False (I would recommend keeping it True)
cluster_name is the name of the cluster, used just for printing. Honestly not very important
input_filename name of the used fitlist
output_filename name of the output file (usually output and no need to change)
resolution is the resolution of the spectra. 0 means no convolution based on the resolution
vmac is default macroturbulence for all stars if fit_macroturb = No
rotation is default rotation for all stars if fit_rotation = No
init_guess_elements are elements to use for initial guess. Only important if you fit several elements at once (e.g. blended line). Can be several elements: init_guess_elements = Mg Ti Ca
init_guess_elements_path is the path to the linelist for the initial guess elements. E.g. it can look like this: each line is name of spectra and abundance for the guess [X/Fe]: HD000001 0.2. Order of elements should be the same as in init_guess_elements
input_elements_abundance are elements to use for input abundance. This allows you to specify the abundance of the star for each element. If not specified, then solar scaled abundances are used. Can be several elements: input_elements_abundance = Mg Ti Ca
input_elements_abundance_path is the path to the linelist for the input abundance elements. E.g. it can look like this: each line is name of spectra and abundance [X/Fe]: HD000001 0.2. Order of elements should be the same as in input_elements_abundance
wavelength_minimum and wavelength_maximum specify the range of the fitted spectra
bounds_vmic are the bounds for microturbulence (HARD BOUNDS)
guess_range_vmic is the range of microturbulence for the initial guess
find_upper_limit after the fit is done, it is possible to find an upper limit for the abundance. This is done by increasing the abundance until the fitted chi-squared increases by the given upper_limit_sigma (e.g. 3 sigma). This is done for each line separately. Doubles the time of the fit, but can be useful for finding an upper limit or for error estimation
bounds_teff are the bounds for temperature (HARD BOUNDS)
guess_range_teff is the range of temperature for the initial guess deviated from the input temperature
fitlist file is added as well:
name_of_spectrum_to_fit rv teff logg [Fe/H] Input_vmicroturb Input_vmacroturb is first row
name_of_spectrum_to_fit rv teff logg [Fe/H] vmic vmac Mg/Fe Ti/H A(Ca)
HD000001 0.0 5000.0 2.0 0.0 1.0 1.0 0.2 0.1 0.3
; for comments
Mg would presume that you want A(Mg) and not [Mg/Fe]
snr, which will be used to estimate an error using formula sigma = 1/snr, if no error is provided in the spectra
0 to use default error of 0.01
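As a rough illustration of the fitlist layout and the snr rule described above, here is a minimal sketch. It is not the parser TSFitPy uses: read_fitlist and sigma_from_snr are hypothetical helper names, the snr column in the example text is an assumed placement, and treating everything after a ; as a comment is an assumption based on the description above.

```python
def read_fitlist(text):
    """Split fitlist text into a header and a list of row dictionaries."""
    header, rows = None, []
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop anything after ';'
        if not line:
            continue
        fields = line.split()
        if header is None:
            header = fields  # the first non-comment row is the header
            continue
        rows.append(dict(zip(header, fields)))
    return header, rows

def sigma_from_snr(snr, default=0.01):
    """sigma = 1/snr, with snr = 0 meaning 'use the default error'."""
    return default if snr == 0 else 1.0 / snr

example = """\
name_of_spectrum_to_fit rv teff logg [Fe/H] vmic vmac snr
HD000001 0.0 5000.0 2.0 0.0 1.0 1.0 50  ; hypothetical snr value
; this whole line is a comment
"""

header, rows = read_fitlist(example)
sigma = sigma_from_snr(float(rows[0]["snr"]))
print(rows[0]["name_of_spectrum_to_fit"], sigma)
```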
Run the main.py script. You can do so by running:
python3 main.py ./input_files/tsfitpy_input_configuration.cfg
- this will run the fitting on your local computer
The output is saved in ./output_files/
The output contains flag_error and flag_warning. Each is an 8-bit number. 00000000 means no issues were found. Any bit equal to 1 means some issue was discovered. Not all flags exist yet, so this may be updated later.
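A flag like this can be checked programmatically. The sketch below is only an illustration: set_bits is a hypothetical helper, counting positions from the leftmost character is an arbitrary convention chosen here, and the padding step assumes the flag may have lost its leading zeros if it was read as an integer.

```python
def set_bits(flag):
    """Return positions (0 = leftmost character) of the bits equal to 1."""
    text = str(flag).strip().zfill(8)  # restore leading zeros if needed
    if len(text) != 8 or set(text) - {"0", "1"}:
        raise ValueError(f"not an 8-bit flag: {flag!r}")
    return [i for i, bit in enumerate(text) if bit == "1"]

def is_clean(flag):
    """True when no bit is set, i.e. the flag is 00000000."""
    return not set_bits(flag)

print(set_bits("01000010"), is_clean("00000000"))
```

For a large sample, rows where either flag is not clean could be dropped; for a small one, they can be inspected by eye.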
flag_error: if any bits are 1, then you almost certainly should not believe the output
compute_blend_spectra and sensitivity_abundance_offset), the chi squared should be higher than the chi squared of the fit with abundance. If it is not, then it is likely that the minimum was not found, and this triggers the flag
flag_warning: if any bits are 1, then it could be that the fit is bad, but that is not guaranteed. For a big sample, you might want to remove any of these objects as well. For a small one, maybe check them by eye.
Open ./plotting_tools/plot_output.ipynb (I would create a copy of it first so that it doesn't interfere with git pull later on)
Change output_folder_location to the folder where the output is saved in the second cell
plot_one_star is a function to plot the results for one star, but it can take extra arguments:
save_figure is a string; it takes the name of the figure to save WITH the file extension, e.g. save_figure = 'HD000001.pdf'. The actual filename will also include the line wavelength
xlim and ylim are the limits of the plot
font_size is the font size of the plot
resolution, macro and rotation in plot_synthetic_data function to convolve the synthetic spectra
verbose=True to see Fortran output (doesn't work on Mac? Linux only?)
synthetic_spectra_generation_configuration.cfg file based on the one provided in ./input_files/
save_unnormalised_spectra is a boolean; if True, then it will save the unnormalised synthetic spectra (each file would be 30% larger)
fitlist (see example in ./input_files/synthetic_spectra_parameters)
teff logg [Fe/H]
vmic, vmac and rotation
[X/Fe], [X/H] and A(X) (and their combinations) are allowed
[X/Fe] internally basically
Mg would presume that you want A(Mg) and not [Mg/Fe]
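The three abundance notations are related by [X/H] = A(X) - A(X)_sun and [X/Fe] = [X/H] - [Fe/H]. The sketch below shows one way such column labels could be reduced to [X/Fe]; it is not TSFitPy's own conversion. to_x_fe is a hypothetical helper, and the solar values in SOLAR_A are illustrative placeholders that may differ from the solar scale the code actually uses.

```python
import re

# Placeholder solar abundances A(X)_sun; illustrative values only.
SOLAR_A = {"Mg": 7.60, "Ti": 4.95, "Ca": 6.34}

def to_x_fe(label, value, feh, solar_a=SOLAR_A):
    """Convert a column such as 'Mg/Fe', '[Ti/H]', 'A(Ca)' or 'Mg' to [X/Fe]."""
    label = label.strip()
    ratio = re.fullmatch(r"\[?(\w+)/(Fe|H)\]?", label)
    if ratio:
        element, reference = ratio.groups()
        # [X/Fe] is kept as is; [X/H] is shifted by the star's [Fe/H].
        return element, (value if reference == "Fe" else value - feh)
    absolute = re.fullmatch(r"A\((\w+)\)", label)
    # A bare element name such as 'Mg' is read as A(Mg).
    element = absolute.group(1) if absolute else label
    return element, value - solar_a[element] - feh

for column, number in [("Mg/Fe", 0.2), ("Ti/H", 0.1), ("A(Ca)", 6.34), ("Mg", 7.60)]:
    print(column, to_x_fe(column, number, feh=0.0))
```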
python3 generate_synthetic_spectra.py ./input_files/synthetic_spectra_generation_configuration.cfg
index.spec where index is the index of the spectrum (same order as in fitlist)
elements_to_fit), line (linemask) and model atmosphere (fitlist) in the config file
3947.295 9.146 -2.095 -7.957 7.0 4.57E+06 's' 'p' 0.0 1.0 'O I LS:2s2.2p3.(4S*).3s 5S* LS:2s2.2p3.(4S*).4p 5P' 6 26 '3s5S2*' '4p5P3' 'c' 'a'
6 26 '3s5S2*' '4p5P3' 'c' 'a', where 6 and 26 are energy levels in the model atom that correspond to this transition
'3s5S2*' '4p5P3' are the labels of the levels in the model atom
'c' 'a' are not actually used
debug = 2 and search for W A R N I N G B S Y N)
nlte_grids folder and also add it to the /nlte_data/nlte_filenames.cfg file
O I), and check that the levels are correct (at least NLTE should be written there next to the element name)
utilities/convert_lte_to_nlte.py script
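To check the NLTE fields of a linelist line such as the O I example above, the trailing columns can be pulled out with a quote-aware split. This is a minimal sketch, not part of TSFitPy or Turbospectrum; nlte_fields is a hypothetical helper, and it assumes the NLTE information always occupies the last six columns, as in the example line.

```python
import shlex

EXAMPLE_LINE = (
    "3947.295 9.146 -2.095 -7.957 7.0 4.57E+06 's' 'p' 0.0 1.0 "
    "'O I LS:2s2.2p3.(4S*).3s 5S* LS:2s2.2p3.(4S*).4p 5P' "
    "6 26 '3s5S2*' '4p5P3' 'c' 'a'"
)

def nlte_fields(linelist_line):
    """Extract the wavelength and the NLTE level information from one line."""
    tokens = shlex.split(linelist_line)  # keeps quoted strings together
    return {
        "wavelength": float(tokens[0]),
        # Level numbers in the model atom for this transition.
        "level_indices": (int(tokens[-6]), int(tokens[-5])),
        # Labels of those levels in the model atom.
        "level_labels": (tokens[-4], tokens[-3]),
    }

fields = nlte_fields(EXAMPLE_LINE)
print(fields)
```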
* UP or * RADIATIVE to stop reading the levels
0.0000 2.0 ' Level 1 = 2p6.3s2S ' 1, so space locations are important. 6th or 7th column is the electron configuration, which should not have any spaces:
ifort command, you might need to source the setvars.sh file (see this intel link and especially this intel link)
source <install-dir>/setvars.sh
<install-dir> is probably /opt/intel/oneapi/ ?
.bashrc/.profile file, so that it is sourced automatically (because otherwise you will have to source it every time you open a new terminal)
Regarding the multiprocessing usage with Dask
cluster_type is the type of cluster. Can be slurm or local. local ignores the other parameters
number_of_nodes is the number of nodes to use; CPUs are taken from the number_of_cpus parameter
memory_per_cpu_gb is the memory per CPU in GB (recommended to be at least 3 GB)
script_commands are the commands to run before the script. Each command is separated by a semicolon. The example below purges the modules and loads the ones needed
module purge;module load basic-path;module load intel;module load anaconda3-py3.10
time_limit_hours is the time limit for the job in hours
partition is the partition to use in the cluster, passed to the --partition flag
number_of_cpus cpus, so if you have 4 cpus and 2 nodes, it will use 8 CPUs in total
http://localhost:8787/ in your browser (port might be different, but probably not)
cluster_name in the [ExtraParameters]
ssh -N -L {port}:{host}:{port} {cluster_name}, where cluster_name is taken from your config
host and port where the dask dashboard is running
port
http://localhost:{port}/, replacing {port} with the port above
There are some utilities in the utilities folder. They are not used in the fitting, but can be useful for other things.
print_interesting_loggf.py
read_nlte_grid.py
convert_lte_to_nlte.py
lte_linelist is the LTE linelist
output_file is the NLTE linelist
nlte_models are the NLTE model atoms for each element (list of models)
transition_names are the names of the transitions in the linelist (2D list, where each list is for each element and ionisation stage)
ion_energy is the ionisation energy of each element and ionisation stage (in the same order as transition_names)
Here is the Trello board for the project: https://trello.com/b/2xe7T6qH/tsfitpy-todo
How to debug the code:
Set debug=2
Search for forr, because Fortran errors usually start with forrtl
TurboSpectrum has a limit of 100 linelists. So if you have too many linelists, it will crash.
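A quick way to guard against this is to count the files in the linelist folder before starting a run. The sketch below is an assumption-laden illustration: check_linelist_count is a hypothetical helper, it assumes every non-hidden file in the folder counts as one linelist, and it treats exactly 100 files as still allowed. The demo uses a temporary folder; in practice the path would be something like TSFitPy/input_files/linelists/linelist_for_fitting/.

```python
import tempfile
from pathlib import Path

TS_MAX_LINELISTS = 100  # limit quoted above

def check_linelist_count(folder, limit=TS_MAX_LINELISTS):
    """Return (number of linelist files, whether the count is within the limit)."""
    files = [
        p for p in Path(folder).iterdir()
        if p.is_file() and not p.name.startswith(".")  # skip hidden files
    ]
    return len(files), len(files) <= limit

# Demo on a temporary folder with three empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        (Path(tmp) / f"linelist_{i}.bsyn").write_text("")
    n_files, within_limit = check_linelist_count(tmp)

print(n_files, within_limit)
```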
If you get a Fortran error forrtl: severe (24): end-of-file during read, unit -5, file Internal List-Directed Read in the bsyn_lu 000000000041DB92 getlele_ 38 getlele.f just after it tries starting scan of linelist with some molecular name, then the issue is probably the following one: