Chandra-MARX / marx

Chandra X-ray Observatory ray-trace simulator
http://space.mit.edu/cxc/marx/

Error "illegal hardware instruction" with Rosetta / M1 #52

Open adonath opened 2 years ago

adonath commented 2 years ago

I'm using a conda osx-64 environment on an M1 machine, which runs in the command line using Rosetta. I created the ciao environment as described here https://cxc.cfa.harvard.edu/ciao/download/ and everything works fine. I'm following https://cxc.cfa.harvard.edu/ciao/threads/marx/ to finally create a PSF image. For this I installed marx using the install_marx script. However when running simulate_psf I receive:

# simulate_psf (08 March 2021): ERROR Problem running marx2fits --pixadj=RANDOMIZE psf_i0000_marx.dir psf_i0000_marx.fits

And when running marx2fits --pixadj=RANDOMIZE psf_i0000_marx.dir psf_i0000_marx.fits independently I get the following error:

zsh: illegal hardware instruction  marx2fits --pixadj=RANDOMIZE psf_i0000_marx.dir psf_i0000_marx.fits

The error suggests interference with some other libraries built for osx-arm64 or similar. What confuses me is that marx2fits without any arguments runs just fine. I presume the solution is to somehow rebuild marx from source and link to the correct libraries; however, any help is appreciated here!

hamogu commented 2 years ago

If you followed the CIAO download instructions that you linked, then you probably executed something like this (maybe with some additional Python packages thrown in; I usually also install astropy, scipy, etc.):

conda create -n ciao-4.14 \
  -c https://cxc.cfa.harvard.edu/conda/ciao \
  -c conda-forge \
  ciao sherpa ds9 ciao-contrib caldb_main marx

In other words, you probably already installed marx as a conda package, so there should be no need to run install_marx at all. What do you get if you type which marx and which marx2fits? I wonder whether the conda version or the version that you built through the install_marx script is earlier in your path.
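A minimal sketch for checking which copy comes first, assuming a POSIX sh or bash session (zsh does not split `$PATH` this way by default). `list_path_matches` is a hypothetical helper written for illustration and is not part of marx or CIAO:

```shell
# list_path_matches is a hypothetical helper, not part of marx or CIAO.
# It prints every executable called NAME found in $PATH, in search order,
# so the first line printed is the copy the shell would actually run.
list_path_matches() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            echo "$dir/$name"
        fi
    done
    IFS=$old_ifs
}

list_path_matches marx
list_path_matches marx2fits
```

If more than one line is printed for a name, a second installation (for example one built by install_marx) is on the path alongside the conda package.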

I'm also somewhat surprised that it does not work, because marx does not depend on any external library (e.g. instead of using cfitsio, marx includes its own jdfits library; the reason is historical: when marx was written, cfitsio was not available). However, I also can't debug that without any output, and install_marx hides that output from the user. If you really do want to install marx from source, you can find full instructions; however, that should not be necessary at all since we provide marx binaries through the conda channel.

adonath commented 2 years ago

Thanks @hamogu, indeed marx was already included in my conda environment. However when I first ran simulate_psf, I received an error like # simulate_psf (08 March 2021): ERROR MARX_ROOT is not provided in parameter marx_root. So I defined export MARX_ROOT=~/software/mambaforge-intel/envs/ciao-4.14/, which fixed the error but then led to the error I described in my initial post. Then I tried install_marx, but the error remained.

which marx and which marx2fits point to:

So it seems they point to the conda installed binaries. Just to make sure I haven't screwed up my environment with the install_marx command, I'll set it up again and let you know, whether the error persists.

adonath commented 2 years ago

Even in a newly installed environment the error persists. This is the environment I'm using: https://github.com/astrostat/pylira-extra/blob/main/datasets/chandra-arlac/environment.yaml

hamogu commented 2 years ago

I ordered an M1 machine for myself a few weeks ago just to prepare for this case, to give me at least some chance of debugging M1 issues... I'll look into it, but (since I received that machine only 2 weeks ago) I have no experience with this weird double-architecture yet, and it would be great if I could take some time (plus, the holidays are coming). Do you need an urgent solution, or can you work around it for now (e.g. use a different computer for running marx)?

hamogu commented 2 years ago

Also, if possible, it would be great if you could post a complete example here. You say above that marx2fits with no (=default) arguments works, while running it in your case crashed it. Can you let me know the exact parameters you are passing to simulate_psf and what event file you are using (if you give me, e.g. the OBSID, I can download that myself assuming it's a public dataset)?

adonath commented 2 years ago

Thanks a lot @hamogu! I'm running the following script, which defines the analysis steps: https://github.com/astrostat/pylira-extra/blob/main/datasets/chandra-arlac/make.py. So you could basically clone https://github.com/astrostat/pylira-extra/, create the environment and run make.py, except that there is the additional manual step of running chart via the web interface (https://cxc.cfa.harvard.edu/ciao/PSFs/chart2/runchart.html). If needed, I can also provide you directly with the chart files or the intermediate files created by marx before running marx2fits.

Meanwhile I'll check for a workaround...

adonath commented 2 years ago

Here is the output of conda info:

     active environment : ciao-4.14
    active env location : /Users/adonath/software/mambaforge-intel/envs/ciao-4.14
            shell level : 2
       user config file : /Users/adonath/.condarc
 populated config files : /Users/adonath/software/mambaforge-intel/.condarc
          conda version : 4.11.0
    conda-build version : not installed
         python version : 3.9.7.final.0
       virtual packages : __osx=11.6=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /Users/adonath/software/mambaforge-intel  (writable)
      conda av data dir : /Users/adonath/software/mambaforge-intel/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/osx-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /Users/adonath/software/mambaforge-intel/pkgs
                          /Users/adonath/.conda/pkgs
       envs directories : /Users/adonath/software/mambaforge-intel/envs
                          /Users/adonath/.conda/envs
               platform : osx-64
             user-agent : conda/4.11.0 requests/2.26.0 CPython/3.9.7 Darwin/20.6.0 OSX/11.6
                UID:GID : 502:20
             netrc file : None
           offline mode : False

hamogu commented 2 years ago

I'm missing the file oif.fits that's read in line 29 of make.py. I've set up the environment and run a few other invocations of marx2fits with no problem - which matches what you described above. So it looks as if the problem is triggered only in some specific circumstance that's related to your file, e.g. an input that's a double where a float is expected or something like that. Unfortunately, that means I probably need to look at your file to reproduce this and the easiest might be if you are willing to share your intermediate files with me.

Depending on how big the files are, you might not want to attach them to a GitHub issue, but could send them to me by email or point me to a location on the CfA HEAD LAN, e.g. /pool7/xxx/xxx .

hamogu commented 2 years ago

It's late, I should go to bed. That's of course stupid of me: I just need to download the OBSID by hand first so that I get the oif.fits. Then I can run your make.py. Working on that.

hamogu commented 2 years ago

OK, I've run through all your steps and it works for me:

(issue52) guenther@MoritzAirRoseGold issue_52 % python pylira-extra/datasets/chandra-arlac/make.py
INFO:__main__:Skipping download, 1385 already exists.
INFO:__main__:Skipping reprocessing, 1385/repro already exists.
INFO:__main__:Skipping spectral extraction, 1385/spectrum/ArLac.pi already exists.
INFO:__main__:Skipping spectral fit, 1385/spectrum/source-flux-chart-ArLac.dat already exists.
INFO:__main__:Skipping counts image, 1385/lira-input/counts.fits already exists.
INFO:__main__:Executing: simulate_psf infile=1385/repro/hrcf01385_repro_evt2.fits outroot=psf ra=332.17007516 dec=45.74225112 simulator=file rayfile=1385/psf/chart/HRMA_ra332.17008_dec45.74225_source-flux-chart-ArLac.dat_dithered_i0000_rays.fits mode=h
simulate_psf
          infile = 1385/repro/hrcf01385_repro_evt2.fits
         outroot = psf
              ra = 332.17007516
             dec = 45.74225112
    spectrumfile = 
      monoenergy = INDEF
            flux = INDEF
       simulator = file
         rayfile = 1385/psf/chart/HRMA_ra332.17008_dec45.74225_source-flux-chart-ArLac.dat_dithered_i0000_rays.fits
       projector = marx
     random_seed = -1
            blur = 0.07000000000000001
  readout_streak = no
          pileup = no
           ideal = yes
        extended = yes
         binsize = 1
          numsig = 7
         minsize = INDEF
         numiter = 1
        keepiter = no
        asolfile = 
       marx_root = /Users/guenther/mambaforge/envs/issue52/
         verbose = 1
            mode = h

Started check_setup
Finished check_setup
Performing iteration 1 of 1
Started run_marx
Finished run_marx
Started create_psf_image
Finished create_psf_image
Started create_average_image
Finished create_average_image

Final output PSF image is : psf.psf[PSF]

That makes me think that it's your setup, not marx. I see above that you use mambaforge (like I do) and also have the same conda version that I use, so that can't be it. However, your path indicates that you have a (separate?) mamba for Intel, "/Users/adonath/software/mambaforge-intel/envs/ciao-4.14", while I'm using only the ARM mamba and then install environments that require Rosetta2 with specific settings per environment. Also, I don't recompile marx with "install_marx"; I simply set MARX_ROOT appropriately to use the marx conda package, but I guess that's what you tried, too, right?

Some notes on what I did are below, but I did not test whether installing an Intel-specific mamba and using that for your environment can trigger the problem; I only know that my setup does not trigger it. Also, I wonder why this happens, of all things, in marx2fits. I have no idea, but it's probably more important to make it work than to find the actual cause.


Since conda-forge now offers arm64 (M1, Apple Silicon) native builds for miniforge/mambaforge (https://github.com/conda-forge/miniforge#download) and for essentially all packages on conda-forge, my conda installation is ARM-native. So, if I naively put in the commands from the CIAO install threads

$ conda create -n ciao-4.14 \
  -c https://cxc.cfa.harvard.edu/conda/ciao \
  -c conda-forge \
  ciao sherpa ds9 ciao-contrib caldb_main marx

conda fails to find the packages since it's only looking into the noarch and osx-arm64 directories of each channel, but we only have osx-64 builds for CIAO and our other binary packages. That is true even if the terminal where I type that command itself runs under Rosetta2.

However, I can tell conda which subdir in each channel to look at specifically:

CONDA_SUBDIR=osx-64 conda create -n ciao-4.14 \
  -c https://cxc.cfa.harvard.edu/conda/ciao \
  -c conda-forge \
  ciao sherpa ds9 ciao-contrib caldb_main marx

Again, for this command it does not matter whether I'm running it in a "normal" terminal or a "Rosetta2" terminal, since it's just downloading and installing the packages, not using them.

After creating an environment, it makes sense to set the CONDA_SUBDIR variable permanently, in case I later want to add some other conda package (e.g. jupyter):

conda activate ciao-4.14
conda env config vars set CONDA_SUBDIR=osx-64

(Alternatively, I could just create an empty environment and then do the conda env config ... before installing anything.)

That completes the setup. In order to use the CIAO software, I have to open a terminal with Rosetta2 emulation, and then I can do the usual

conda activate ciao-4.14

and go from there.
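The alternative mentioned above (create an empty environment, pin CONDA_SUBDIR, then install) could be sketched roughly as follows. `create_pinned_env` is a hypothetical helper, not something CIAO or conda provides, and it assumes that `conda env config vars set` accepts `-n` so the environment does not have to be activated inside the function:

```shell
# create_pinned_env is a hypothetical helper, not part of conda or CIAO.
# Usage: create_pinned_env ENV_NAME SUBDIR [conda install arguments...]
create_pinned_env() {
    env_name=$1
    subdir=$2
    shift 2
    # 1. Create an empty environment (no packages to resolve yet).
    conda create -y -n "$env_name" || return 1
    # 2. Store CONDA_SUBDIR in the environment so it is set on every activation.
    conda env config vars set -n "$env_name" "CONDA_SUBDIR=$subdir" || return 1
    # 3. Install the packages. The environment is not active here, so
    #    CONDA_SUBDIR is passed explicitly for this one command.
    CONDA_SUBDIR="$subdir" conda install -y -n "$env_name" "$@"
}

# Example call, using the channels and packages from the CIAO instructions:
# create_pinned_env ciao-4.14 osx-64 \
#   -c https://cxc.cfa.harvard.edu/conda/ciao -c conda-forge \
#   ciao sherpa ds9 ciao-contrib caldb_main marx
```

Variables stored with `conda env config vars set` only take effect after the environment is activated again, so an already active environment needs a `conda deactivate` followed by `conda activate`.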

hamogu commented 2 years ago

Can you check your $PATH? Maybe there is something in there from your earlier attempt to compile marx with install_marx? marx builds a few libraries (all part of the marx source code), so even if the marx and marx2fits executables are correct it's still possible to have the wrong libraries in the path.
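One way to look at this, sketched under the assumption that the standard `file` and `otool` (macOS) or `ldd` (Linux) tools are installed. `describe_binary` is a hypothetical helper for illustration, not part of marx:

```shell
# describe_binary is a hypothetical helper, not part of marx or CIAO.
# It shows where an executable lives, what architecture it was built for,
# and which shared libraries it links against.
describe_binary() {
    bin=$(command -v "$1") || { echo "$1: not found in PATH" >&2; return 1; }
    echo "path: $bin"
    # `file` reports the executable format and architecture (e.g. x86_64, arm64).
    if command -v file >/dev/null 2>&1; then
        file "$bin"
    fi
    # otool -L (macOS) or ldd (Linux) lists the linked shared libraries.
    if command -v otool >/dev/null 2>&1; then
        otool -L "$bin"
    elif command -v ldd >/dev/null 2>&1; then
        ldd "$bin"
    fi
}

# Example: describe_binary marx2fits
```

Library paths in the output that point outside the active conda environment would be a hint that something from an earlier build is being picked up.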

adonath commented 2 years ago

Thanks a lot for your help @hamogu!

I didn't know about the CONDA_SUBDIR configuration, that's why I installed a second osx-64 mambaforge on my machine. So far this has worked well for me. However I actually prefer your approach of defining the CONDA_SUBDIR and only having one conda install on my disk. So I deleted my mambaforge-intel and created a new ciao-4.14 environment following your instructions. This worked flawlessly, however I still got the same error when running simulate_psf...

My $PATH looks fine:

$ echo $PATH
/Users/adonath/software/mambaforge/envs/ciao-4.14/bin:/Users/adonath/software/mambaforge/condabin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin

CONDA_SUBDIR is defined correctly:

$ echo $CONDA_SUBDIR
osx-64

MARX_ROOT is defined correctly:

$ echo $MARX_ROOT
/Users/adonath/software/mambaforge/envs/ciao-4.14/

Output of conda info:

     active environment : ciao-4.14
    active env location : /Users/adonath/software/mambaforge/envs/ciao-4.14
            shell level : 2
       user config file : /Users/adonath/.condarc
 populated config files : /Users/adonath/software/mambaforge/.condarc
          conda version : 4.11.0
    conda-build version : not installed
         python version : 3.9.7.final.0
       virtual packages : __osx=11.6=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /Users/adonath/software/mambaforge  (writable)
      conda av data dir : /Users/adonath/software/mambaforge/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/osx-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /Users/adonath/software/mambaforge/pkgs
                          /Users/adonath/.conda/pkgs
       envs directories : /Users/adonath/software/mambaforge/envs
                          /Users/adonath/.conda/envs
               platform : osx-64
             user-agent : conda/4.11.0 requests/2.26.0 CPython/3.9.7 Darwin/20.6.0 OSX/11.6
                UID:GID : 502:20
             netrc file : None
           offline mode : False

The terminal runs correctly in Rosetta mode:

image

What is interesting: when running the terminal without Rosetta2 activated I get exactly the same error (except that I expect it there...). So I'm starting to doubt that my Rosetta2 mode works correctly.
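Whether a given shell is actually translated can be checked from inside it. The sketch below relies on the macOS `sysctl.proc_translated` key, which reports 1 for a process running under Rosetta 2 and 0 for a native one on Apple Silicon, and does not exist on other systems. `rosetta_status` is a hypothetical helper name:

```shell
# rosetta_status is a hypothetical helper, not part of macOS or CIAO.
# It prints "translated", "native", or "unknown" for the current shell.
rosetta_status() {
    # The key is only present on Apple Silicon macOS; anywhere else the
    # lookup fails and the status is reported as unknown.
    translated=$(sysctl -n sysctl.proc_translated 2>/dev/null) || translated=""
    case "$translated" in
        1) echo "translated" ;;
        0) echo "native" ;;
        *) echo "unknown" ;;
    esac
}

echo "machine: $(uname -m), rosetta: $(rosetta_status)"
```

A terminal running under Rosetta2 would be expected to show x86_64 and translated, while a native terminal on the same machine would show arm64 and native.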

hamogu commented 2 years ago

Unfortunately, I can't help with debugging why Rosetta2 might not do the right thing. I'm still very new to this M1 system myself... What's odd to me is that your script makes it all the way through to marx2fits, while chandra_repro, extract_spectra and marx - all of which are binaries and not just Python scripts - work. But then, you get marx2fits to work, too, for some settings of the parameters. If the terminal was not doing the right thing with Rosetta2, I would expect it to fail much earlier. Now, those thoughts might not be much help for you, but it does not look to me as if this is something that I can debug and fix on the marx side: your script works on my M1.

adonath commented 2 years ago

What's odd to me is that your script makes it all the way through to marx2fits, while chandra_repro, extract_spectra and marx - all of which are binaries and not just Python scripts - work. But then, you get marx2fits to work, too, for some settings of the parameters. If the terminal was not doing the right thing with Rosetta2, I would expect it to fail much earlier.

I completely agree here, the behavior doesn't make any sense to me...

I have access to another M1 MacBook Air so I will now try to make it work there and report.

adonath commented 2 years ago

For completeness: I set up the environment on the M1 MacBook Air I have access to and it runs through without any issue. So I cannot reproduce the issue on the M1 MacBook Air either. However, the issue remains on the M1 MacBook Pro I have. For now I'll continue with my working setup. If I have time I could check the differences in environment between the M1 Pro and Air I have...

hamogu commented 8 months ago

Coming back to this old issue... Is this still an issue? If so, is there anything I can do about it? I'm a little lost how to move this forward, since I can't reproduce it on my Mac. Or can I close this issue?