emerge-erc / ALminer

ALminer: ALMA archive mining and visualization toolkit
MIT License

download fitsonly fails #2

Closed mctoribio closed 1 year ago

mctoribio commented 1 year ago

Dear Aida, Alvaro, How are you? We are very happy to be using ALminer for a small project to mine the archive. We are experiencing some issues in the download step when using the fitsonly option (the raw data download works well). We are using version 0.1.2. Even though we already specified the NRAO TAP service as indicated, the fitsonly example from the download part of your tutorial fails:

Example 5.2: download only continuum FITS images for the science target
alminer.download_data(selected, fitsonly=True, dryrun=True, location='./data', 
                 filename_must_include=['_sci', '.pbcor', 'cont'], print_urls=True)
================================
This is a dryrun. To begin download, set dryrun=False.
================================
Download location = ./data
Total number of Member OUSs to download = 17
Selected Member OUSs: ['uid://A001/X121/X3d2', 'uid://A001/X121/X3d8', 'uid://A001/X121/X3d5', 'uid://A001/X121/X3cf', 'uid://A001/X2fe/Xcd2', 'uid://A001/X2fe/Xcce', 'uid://A001/X2fe/Xcd6', 'uid://A001/X2fe/X728', 'uid://A001/X2fe/X724', 'uid://A001/X5a4/X155', 'uid://A001/X87d/X207', 'uid://A001/X87d/X1f5', 'uid://A001/X87d/X1fb', 'uid://A001/X1290/X17', 'uid://A001/X1288/X6be', 'uid://A001/X1288/X6c2', 'uid://A001/X1590/Xd85']
Number of files to download = 0
Needed disk space = -- GB
File URLs to download = 
--------------------------------

I tried to run the astroquery part directly by specifying myAlma.archive_url = 'https://almascience.nrao.edu', but it still does not return any results for the list of links. I hope you can make it run again! Thanks.

Best regards, Carmen

aida-ahmadi commented 1 year ago

Hi Carmen, nice to hear from you. I'm doing fine, thanks. How are things with you in the north? Glad to hear you're finding ALminer useful 😊

Thanks for reporting this. I had made some changes to the code back in August that I had not pushed to GitHub, so this was a good reminder. I have now released a new version (0.1.3) on GitHub and also on PyPI, so please try the new version and let me know if you're still encountering issues. Your example works fine for me. The new feature will allow you to select the archive mirror in the download_data function:

alminer.download_data(selected, fitsonly=True, dryrun=True, location='./data',  filename_must_include=['_sci', '.pbcor', 'cont'], print_urls=True, archive_mirror='ESO')

ESO is the default and fastest for us in Europe. Other options are NRAO and NAOJ.
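
For reference, the three mirror names correspond to the regional ALMA Science Portal hosts. The sketch below shows one way such a mapping could look; the dictionary and the helper name `mirror_url` are illustrative assumptions and not necessarily how ALminer implements the option internally.

```python
# Hypothetical helper: maps an archive_mirror name to the base URL of the
# corresponding regional ALMA Science Portal. Not part of ALminer's API.
ARCHIVE_MIRRORS = {
    "ESO": "https://almascience.eso.org",
    "NRAO": "https://almascience.nrao.edu",
    "NAOJ": "https://almascience.nao.ac.jp",
}


def mirror_url(archive_mirror="ESO"):
    """Return the base URL for a mirror name (case-insensitive)."""
    key = archive_mirror.upper()
    if key not in ARCHIVE_MIRRORS:
        raise ValueError(
            f"Unknown mirror {archive_mirror!r}; choose from {sorted(ARCHIVE_MIRRORS)}"
        )
    return ARCHIVE_MIRRORS[key]


print(mirror_url("NRAO"))
```

Rejecting unknown names early makes a typo in archive_mirror visible right away, rather than surfacing later as an empty download list.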

A general note worth making is that the Jupyter notebooks on GitHub may be outdated. It's best to follow the documentation on Read The Docs.

Happy mining!

mctoribio commented 1 year ago

Dear Aida, Unfortunately, after the update we still encounter the same issue (regardless of the archive_mirror). We are using your latest version 0.1.3 and astroquery 0.4.6.

alminer.download_data(selected, fitsonly=True, dryrun=True, location='./data',  filename_must_include=['_sci', '.pbcor', 'cont'], print_urls=True, archive_mirror='ESO')
================================
This is a dryrun. To begin download, set dryrun=False.
================================
Nothing to download.
Note: often only a subset of the observations (e.g. the representative window) is ingested into the archive. In such cases, you may need to download the raw dataset, reproduce the calibrated measurement set, and image the observations of interest. It is also possible to request calibrated measurement sets through a Helpdesk ticket to the European ARC (see https://almascience.eso.org/local-news/requesting-calibrated-measurement-sets-in-europe).
--------------------------------

aida-ahmadi commented 1 year ago

Hi Carmen, could you please provide your query and sub-selection?

If you are doing the following query from ALminer's documentation, it works for me:

observations = alminer.keysearch({'science_keyword':["'Galaxy chemistry'"]}, print_targets=False)
selected = observations[(observations["min_freq_GHz"] > 80.0) & (observations["max_freq_GHz"] < 115.0) & (observations["ang_res_arcsec"] < 0.5)]
alminer.download_data(selected, fitsonly=True, dryrun=True, location='./data',  filename_must_include=['_sci', '.pbcor', 'cont'], print_urls=True, archive_mirror='ESO')

================================
This is a dryrun. To begin download, set dryrun=False.
================================
Download location = ./data
Total number of Member OUSs to download = 20
Selected Member OUSs: ['2013.1.00988.S_uid___A001_X121_X3d2_external_ari_l_001_of_001.tar', '2013.1.00988.S_uid___A001_X121_X3d8_external_ari_l_001_of_001.tar', '2013.1.00988.S_uid___A001_X121_X3d5_external_ari_l_001_of_001.tar', '2013.1.00988.S_uid___A001_X121_X3cf_external_ari_l_001_of_001.tar', '2015.1.00167.S_uid___A001_X2fe_Xcd2_external_ari_l_001_of_001.tar', '2015.1.00167.S_uid___A001_X2fe_Xcce_external_ari_l_001_of_001.tar', '2015.1.00167.S_uid___A001_X2fe_Xcd6_external_ari_l_001_of_001.tar', '2015.1.01439.S_uid___A001_X2fe_X728_001_of_001.tar', '2015.1.01439.S_uid___A001_X2fe_X728_external_ari_l_001_of_001.tar', '2015.1.01439.S_uid___A001_X2fe_X724_001_of_001.tar', '2015.1.01439.S_uid___A001_X2fe_X724_external_ari_l_001_of_001.tar', '2015.1.01487.S_uid___A001_X5a4_X155_001_of_001.tar', '2015.1.01487.S_uid___A001_X5a4_X155_external_ari_l_001_of_001.tar', '2016.1.00387.S_uid___A001_X87d_X207_001_of_001.tar', '2016.1.00387.S_uid___A001_X87d_X1f5_001_of_001.tar', '2016.1.00387.S_uid___A001_X87d_X1fb_001_of_001.tar', '2017.1.00078.S_uid___A001_X1290_X17_001_of_001.tar', '2017.1.01232.S_uid___A001_X1288_X6be_001_of_001.tar', '2017.1.01232.S_uid___A001_X1288_X6c2_001_of_001.tar', '2021.1.01188.S_uid___A001_X1590_Xd85_001_of_001.tar']
Number of files to download = 36
Needed disk space = 496.9 MB
File URLs to download = https://almascience.eso.org/dataPortal/member.uid___A001_X121_X3d2.ari_l.NGC1266_sci.spw0_1_2_3_96620MHz.12m.cont.I.tt0.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X121_X3d2.ari_l.NGC1266_sci.spw0_1_2_3_96620MHz.12m.cont.I.tt1.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X121_X3d8.ari_l.NGC1266_sci.spw0_1_2_3_104122MHz.12m.cont.I.tt0.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X121_X3d8.ari_l.NGC1266_sci.spw0_1_2_3_104122MHz.12m.cont.I.tt1.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X121_X3d5.ari_l.NGC1266_sci.spw0_1_2_3_100384MHz.12m.cont.I.tt0.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X121_X3d5.ari_l.NGC1266_sci.spw0_1_2_3_100384MHz.12m.cont.I.tt1.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X121_X3cf.ari_l.NGC1266_sci.spw0_1_2_3_92906MHz.12m.cont.I.tt0.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X121_X3cf.ari_l.NGC1266_sci.spw0_1_2_3_92906MHz.12m.cont.I.tt1.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X2fe_Xcd2.ari_l.Arp220_sci.spw0_1_2_3_95530MHz.12m.cont.I.tt0.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X2fe_Xcd2.ari_l.Arp220_sci.spw0_1_2_3_95530MHz.12m.cont.I.tt1.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X2fe_Xcce.ari_l.Arp220_sci.spw0_1_2_3_92025MHz.12m.cont.I.tt0.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X2fe_Xcce.ari_l.Arp220_sci.spw0_1_2_3_92025MHz.12m.cont.I.tt1.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X2fe_Xcd6.ari_l.Arp220_sci.spw0_1_2_3_99090MHz.12m.cont.I.tt0.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X2fe_Xcd6.ari_l.Arp220_sci.spw0_1_2_3_99090MHz.12m.cont.I.tt1.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X2fe_X728.ngc6240_sci.spw19_25_27_29.cont.I.tt0.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X2fe_X728.ngc6240_sci.spw19_25_27_29.cont.I.tt1.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X2fe_X728.ari_l.ngc6240_sci.spw0_1_2_3_101029MHz.12m.cont.I.tt0.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X2fe_X728.ari_l.ngc6240_sci.spw0_1_2_3_101029MHz.12m.cont.I.tt1.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X2fe_X724.ngc6240_sci.spw25_27_29_31.cont.I.tt0.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X2fe_X724.ngc6240_sci.spw25_27_29_31.cont.I.tt1.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X2fe_X724.ari_l.ngc6240_sci.spw0_1_2_3_92108MHz.12m.cont.I.tt0.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X2fe_X724.ari_l.ngc6240_sci.spw0_1_2_3_92108MHz.12m.cont.I.tt1.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X5a4_X155.n613_sci.spw25_27_29_31_33.cont.I.tt0.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X5a4_X155.n613_sci.spw25_27_29_31_33.cont.I.tt1.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X5a4_X155.ari_l.n613_sci.spw0_1_2_3_4_103190MHz.12m.cont.I.tt0.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X5a4_X155.ari_l.n613_sci.spw0_1_2_3_4_103190MHz.12m.cont.I.tt1.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X87d_X207.NGC4418_sci.spw25_27_29_31.cont.I.tt0.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X87d_X207.NGC4418_sci.spw25_27_29_31.cont.I.tt1.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X87d_X1f5.NGC4418_sci.spw25_27_29_31.cont.I.tt0.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X87d_X1f5.NGC4418_sci.spw25_27_29_31.cont.I.tt1.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X87d_X1fb.NGC4418_sci.spw25_27_29_31.cont.I.tt0.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X87d_X1fb.NGC4418_sci.spw25_27_29_31.cont.I.tt1.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X1290_X17.NGC7469_sci.spw19_21.cont.I.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X1288_X6be.Cloverleaf_sci.spw22_24_26.cont.I.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X1288_X6c2.Cloverleaf_sci.spw22_24_26.cont.I.pbcor.fits
https://almascience.eso.org/dataPortal/member.uid___A001_X1590_Xd85.GDS-48417_sci.spw16.cont.I.pbcor.fits
--------------------------------
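
Every URL in the listing above contains all three strings passed to filename_must_include. A minimal sketch of that kind of filter follows, assuming a file is kept only when its name contains every listed substring; `filter_urls` is a hypothetical helper, not ALminer's own function, and two of the sample URLs are made up for contrast.

```python
def filter_urls(urls, filename_must_include=()):
    """Keep URLs whose file name contains every required substring."""
    selected = []
    for url in urls:
        filename = url.rsplit("/", 1)[-1]  # part after the last slash
        if all(token in filename for token in filename_must_include):
            selected.append(url)
    return selected


base = "https://almascience.eso.org/dataPortal/"
urls = [
    # Taken from the dry-run output above.
    base + "member.uid___A001_X1290_X17.NGC7469_sci.spw19_21.cont.I.pbcor.fits",
    # Hypothetical file names, included only to show what gets rejected.
    base + "member.uid___A001_X1290_X17.NGC7469_sci.spw19.cube.I.pbcor.fits",
    base + "member.uid___A001_X1290_X17.NGC7469_sci.spw19_21.cont.I.pb.fits.gz",
]

kept = filter_urls(urls, filename_must_include=["_sci", ".pbcor", "cont"])
for url in kept:
    print(url)
```

Under this assumption, an empty result means either that no product matches all of the substrings or that the file list returned by the archive query was empty to begin with, which is what happened in this issue.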

There have been a lot of changes to the development version of astroquery, so it could be that you need to upgrade your astroquery installation. I am using astroquery 0.4.7.dev8038.

mctoribio commented 1 year ago

Dear Aida,

Many thanks. Updating astroquery solved the issue:

pip install --force-reinstall -v "astroquery==0.4.7.dev8038"
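
To check an environment for the same problem before running a download, the installed astroquery version can be compared against 0.4.6, the version that returned no files in this thread. This is a rough sketch using only the standard library; the function names are hypothetical, and the comparison looks only at the numeric release part, so it ignores the ordering of dev or pre-release tags within one release.

```python
import re
from importlib import metadata


def release_tuple(version):
    """Return the leading numeric release part, e.g. '0.4.7.dev8038' -> (0, 4, 7)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Cannot parse version {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_newer_than(installed, reference):
    """True if the release part of `installed` is strictly greater than `reference`."""
    a, b = release_tuple(installed), release_tuple(reference)
    width = max(len(a), len(b))
    # Pad with zeros so that, for example, (0, 4) and (0, 4, 0) compare equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a > b


def check_astroquery(reference="0.4.6"):
    """Report whether the installed astroquery is newer than `reference`."""
    try:
        installed = metadata.version("astroquery")
    except metadata.PackageNotFoundError:
        return "astroquery is not installed"
    if is_newer_than(installed, reference):
        return f"astroquery {installed} is newer than {reference}"
    return f"astroquery {installed} is not newer than {reference}; consider upgrading"


print(check_astroquery())
```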

You can certainly update ALminer's requirements. Thanks again.

Cheers, Carmen

PS: Yes, apologies. I was indeed running the query from the Jupyter Notebook in ALminer's documentation, exactly the one you posted before.

aida-ahmadi commented 1 year ago

Dear Carmen,

No worries at all. Will do.

I'll close this ticket now, but feel free to open new issues if you encounter them.

Cheers, Aida