bcalden / ClusterPyXT

The Galaxy Cluster ‘Pypeline’ for X-ray Temperature Maps
BSD 3-Clause "New" or "Revised" License

Pipeline freezes during Stage 1 download #36

Open AstroRipples opened 1 year ago

AstroRipples commented 1 year ago

New user here... getting to grips with ClusterPyXT for data processing. I've set everything up according to the instructions on the main page, and when I set up a cluster for processing, the pipeline freezes during the download.

For context, I'm running CIAO-4.15 via the instructions on https://cxc.cfa.harvard.edu/ciao/download/ with the full CALDB installation on OSX 10.15.7 (Catalina), and CIAO runs successfully (I ran a different pipeline yesterday). After launching the virtualenv and running python3 clusterpyxt.py the GUI boots up correctly, but after entering information for my cluster target and clicking "Run Stage 1" I get the rainbow wheel and the Download message doesn't shift from 0%, even after waiting upwards of half an hour.

The download stage of the other pipeline only took a matter of a few minutes for each dataset, so I don't understand what the problem is. Any help would be appreciated!

AstroRipples commented 1 year ago

Update: I've installed a fresh environment with CIAO-4.14 and I encounter the same problem so it seems to be something more than a CIAO version issue.

AstroRipples commented 1 year ago

Additional information: I've re-downloaded ClusterPyXT and tried re-running using both the GUI and via the command line, as well as running additional tests on other obsids. In all cases, the pipeline creates directories for each obsid and then freezes when attempting to download data.

I end up with directories following the nomenclature ${cluster_name}/${obsid}/analysis, but this analysis directory is empty; none of the expected primary or secondary directories are created. At this point I'm largely out of ideas, any thoughts @bcalden ?

AstroRipples commented 1 year ago

Updates: I've updated my OS to Ventura, rebuilt my virtual environments, and tried to run ClusterPyXT again. Whether I run using the GUI or via the command line, the pipeline doesn't even start the data download. The GUI shows the rainbow wheel and the command line doesn't progress beyond 0% even after leaving for several tens of minutes.

Downloading 1 observations:   0%|                                                 | 0/1 [00:00<?, ?observations/s]

Any suggestions what might be going on here, @bcalden or perhaps @botteon ?

bcalden commented 1 year ago

After you start Stage 1, can you go to the folder where the cluster you are working on is being stored to check for folders named after the observation ids you are working with?

Within those folders, you should see files being created and downloaded. If nothing is happening, it will require further diagnosis, as I just tried with no errors. Are you still on CIAO 4.14? (That is what I just tested it on.) If so, can I ask what cluster/ObsIDs you are working with so I can try to recreate your exact issue?

AstroRipples commented 1 year ago

Hi Brian! Okay, first up can confirm I'm still running CIAO 4.14, created using the specific versions substituted into the standard ciao build command as below:

conda create -n ciao-4.14 \
  -c https://cxc.cfa.harvard.edu/conda/ciao \
  -c conda-forge \
  "numpy<1.24" \
  ciao=4.14.0=py38h4384d24_0 sherpa ds9 ciao-contrib=4.14.3=py_0 caldb=4.9.8 marx

When I'm running clusterpyxt.py with the --continue flag (or click "continue cluster" in the GUI) no new directories get created in the Data directory and nothing gets downloaded. If I create a new cluster, a new directory with the cluster name gets created under /Data, as does a directory for the obsid, which contains an analysis subdirectory (i.e. I end up with Data/<cluster>/<obsid>/analysis) but nothing else.
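The symptom described above (ObsID directories that contain only an empty analysis subdirectory) can be checked with a short script rather than by eye. The following is a minimal sketch, not part of ClusterPyXT: the function name `count_files_per_obsid` and the example path `Data/A115` are assumptions made for illustration.

```python
from pathlib import Path


def count_files_per_obsid(cluster_dir):
    """Return {obsid_dir_name: number of regular files found anywhere below it}."""
    counts = {}
    for entry in sorted(Path(cluster_dir).iterdir()):
        if entry.is_dir():
            # rglob("*") walks the whole subtree; empty directories contribute 0.
            counts[entry.name] = sum(1 for p in entry.rglob("*") if p.is_file())
    return counts


# Hypothetical location; substitute your own Data/<cluster> directory.
cluster_dir = Path("Data") / "A115"
if cluster_dir.is_dir():
    for obsid, n_files in count_files_per_obsid(cluster_dir).items():
        status = "nothing downloaded" if n_files == 0 else f"{n_files} files"
        print(f"{obsid}: {status}")
```

A count of zero for an ObsID matches the stalled state reported here; any non-zero count means at least some files arrived.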

One thing that makes this very hard to track down is that a colleague of mine has processed the same obsids using ClusterPyXT without any problems. We both run Macs with OSX Ventura and I've specifically built my ciao virtualenv to match theirs, yet we get these different outcomes.

I totally understand this sounds like a "me" problem, but I've exhausted about every avenue of investigation and experimentation I can think of to figure this out.

bcalden commented 1 year ago

I just tried on Ventura and it got through stage one. Can you (in the GUI) start a new cluster?

Try A115 with ObsIDs 3233 and 13458. I just ran it this morning with a fresh download from GitHub on Ventura and got through stage 1 without issues.

bcalden commented 1 year ago

Also, running ciaover, it looks like I have a different version of ciao-contrib than you which may be the source of the error.

acis_bkg_evt              4.9.7                         0    https://cxc.cfa.harvard.edu/conda/ciao
caldb                     4.10.2                        0    https://cxc.cfa.harvard.edu/conda/ciao
caldb_main                4.10.2                        0    https://cxc.cfa.harvard.edu/conda/ciao
ciao                      4.14.0           py38h4384d24_0    https://cxc.cfa.harvard.edu/conda/ciao
ciao-contrib              4.14.4                     py_0    https://cxc.cfa.harvard.edu/conda/ciao
ds9                       8.3                           1    https://cxc.cfa.harvard.edu/conda/ciao
hrc_bkg_evt               4.7.7                         1    https://cxc.cfa.harvard.edu/conda/ciao
sherpa                    4.14.0           py38h75233e6_0    https://cxc.cfa.harvard.ed
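Comparing two environments by eye is error-prone, so a small script can do the comparison. This is only a sketch: the helper names `parse_conda_listing` and `diff_versions` are made up for illustration, the `pinned` values are copied from the `conda create` command earlier in the thread, and the listing reuses three rows from the output above.

```python
def parse_conda_listing(text):
    """Parse `conda list`-style rows (name, version, ...) into {name: version}."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split()
        if len(fields) >= 2:
            versions[fields[0]] = fields[1]
    return versions


def diff_versions(mine, theirs):
    """Return {name: (my_version, their_version)} wherever the two differ."""
    return {
        name: (mine.get(name), theirs.get(name))
        for name in sorted(set(mine) | set(theirs))
        if mine.get(name) != theirs.get(name)
    }


# Versions pinned in the `conda create` command quoted earlier in this thread.
pinned = {"ciao": "4.14.0", "ciao-contrib": "4.14.3", "caldb": "4.9.8"}

# Three rows taken from the listing above.
listing = """
caldb                     4.10.2                        0    https://cxc.cfa.harvard.edu/conda/ciao
ciao                      4.14.0           py38h4384d24_0    https://cxc.cfa.harvard.edu/conda/ciao
ciao-contrib              4.14.4                     py_0    https://cxc.cfa.harvard.edu/conda/ciao
"""
installed = parse_conda_listing(listing)

for name, (a, b) in diff_versions(pinned, installed).items():
    print(f"{name}: {a} vs {b}")
```

In practice both sides would come from saving each person's `ciaover` output to a file and passing the file contents to `parse_conda_listing`.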

AstroRipples commented 1 year ago

Okay, I tried out the obsids you suggested and encountered the same issue I was finding previously, so this is definitely a me problem.

I've just rebuilt my CIAO environment with ciao-contrib 4.14.4 and likewise I encounter the problem. ciaover tells me I'm using the following:

$ ciaover
# packages in environment at /Users/cjriseley/miniconda3/envs/ciao-4.14:
#
# Name                    Version                   Build  Channel
acis_bkg_evt              4.9.7                         0    https://cxc.cfa.harvard.edu/conda/ciao
caldb                     4.10.2                        0    https://cxc.cfa.harvard.edu/conda/ciao
caldb_main                4.10.2                        0    https://cxc.cfa.harvard.edu/conda/ciao
ciao                      4.14.0           py38h4384d24_0    https://cxc.cfa.harvard.edu/conda/ciao
ciao-contrib              4.14.4                     py_0    https://cxc.cfa.harvard.edu/conda/ciao
ds9                       8.4.1                         0    https://cxc.cfa.harvard.edu/conda/ciao
hrc_bkg_evt               4.7.7                         1    https://cxc.cfa.harvard.edu/conda/ciao
sherpa                    4.14.0           py38h75233e6_0    https://cxc.cfa.harvard.edu/conda/ciao

bcalden commented 1 year ago

Well let's try running the command directly to see if that works. From the terminal, go to the folder where your cluster is (or A115) and run the following command within the ciao environment: (If you are doing your own cluster, make sure to change 3233 to one of your ObsIDs).

python3 -c "from ciao_contrib.cda.data import download_chandra_obsids; download_chandra_obsids([3233])"

Does anything download? You can check the folder in Finder to see it being populated in real time or wait for the command to complete.
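When testing a download that may hang, it can help to run it in a child process with a time limit so that a stall is reported instead of blocking the terminal indefinitely. The sketch below uses only the Python standard library; `run_with_timeout` is a hypothetical helper, and the only CIAO call it refers to is the one-liner given above.

```python
import subprocess
import sys


def run_with_timeout(snippet, timeout_s):
    """Run a Python one-liner in a child process.

    Returns "ok", "failed" (non-zero exit status) or "timed out".
    """
    try:
        proc = subprocess.run([sys.executable, "-c", snippet], timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return "timed out"
    return "ok" if proc.returncode == 0 else "failed"


# The same download command as above, kept as a string for the child process.
DOWNLOAD_SNIPPET = (
    "from ciao_contrib.cda.data import download_chandra_obsids; "
    "download_chandra_obsids([3233])"
)

# Inside the ciao environment, one could then run (left commented out here
# because it downloads data):
# print(run_with_timeout(DOWNLOAD_SNIPPET, timeout_s=600))
```

The 600-second limit is an arbitrary choice; a "timed out" result with a generous limit would point to a stall rather than a slow connection.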

AstroRipples commented 1 year ago

Now that works! All the data for one of my cluster obsids downloaded in maybe a couple of minutes 🥳

bcalden commented 1 year ago

I updated stage 1 of the pipeline to try to get around the problem you're having. Can you go to your ClusterPyXT code directory and run git pull to get the latest commits? Once you have that, can you try running stage one from the GUI for your cluster? If you do encounter an error, it likely won't be the same one you were getting before.

AstroRipples commented 1 year ago

Okay, after a git pull I re-launched ClusterPyXT from stage one from the guide. I tried using both the GUI and just via the command line python clusterpyxt.py --continue --config_file cluster.ini, and left the task running on a single obsid for about 45 minutes. When running the GUI I still get the rainbow wheel.

In that time, nothing happened aside from the file cluster_pypeline_config.ini getting created in the cluster directory. I checked the time stamps on everything in that directory, and they correspond to when I downloaded the obsids manually earlier; the most recently modified file is cluster_pypeline_config.ini, but that timestamp corresponds to when I launched clusterpyxt. Very strange.

bcalden commented 1 year ago

Alright, I added a new branch, dev-test, with an update to the download code. If this doesn't work, there is one other change I can try, but first let's see if this works. If so, I will incorporate the change into the master branch, as it is likely that multiple people are having this problem and you're just the one to post an issue.

Similar to the git pull you ran, can you run git checkout dev-test in terminal from the ClusterPyXT directory. Once that is complete, run a git pull for good measure (it will likely be up to date already, this is just to be sure).

From there, can you try stage 1 again, from the GUI and/or command line. Does this change anything?

AstroRipples commented 1 year ago

Thanks! Unfortunately still no joy. I left the GUI running on a single obsid for around an hour without any sign that anything was happening. So I killed that and re-launched from the command line. This I (accidentally) left running overnight with the same result: no outputs were created, and none of the files appeared to have been modified after I initially launched ClusterPyXT.
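Since the process hangs without producing any error, one generic way to find out where it is stuck is Python's standard `faulthandler` module, which can dump the stack of every thread after a delay. The sketch below is not ClusterPyXT code: `dump_stack_if_stuck` is a hypothetical wrapper, and the `time.sleep` call merely stands in for a stalled download.

```python
import faulthandler
import tempfile
import time


def dump_stack_if_stuck(func, timeout_s, log_file):
    """Run func(); if it is still running after timeout_s seconds,
    faulthandler writes the stack of every thread to log_file."""
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=log_file)
    try:
        return func()
    finally:
        # Disarm the watchdog once func() has returned or raised.
        faulthandler.cancel_dump_traceback_later()


# Stand-in for a stalled call: sleeps longer than the 1-second watchdog.
with tempfile.TemporaryFile(mode="w+") as log:
    dump_stack_if_stuck(lambda: time.sleep(2), timeout_s=1, log_file=log)
    log.seek(0)
    report = log.read()

print(report)
```

For the real pipeline, a similar effect might be obtained by adding `faulthandler.dump_traceback_later(60, repeat=True)` near the top of clusterpyxt.py, which prints stack traces to stderr every minute while the process is running; the line at the top of each trace shows the call that is blocking.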