DOI-USGS / ISIS3

Integrated Software for Imagers and Spectrometers v3. ISIS3 is a digital image processing software package for manipulating imagery collected by current and past NASA and international planetary missions.
https://isis.astrogeology.usgs.gov

Problems with new ISISDATA download utility downloadIsisData #5024

Closed KrisBecker closed 10 months ago

KrisBecker commented 2 years ago

ISIS version(s) affected: 7.1.0_RC1

Description
I have encountered what appears to be a bug when using the new ISISDATA data download utility, downloadIsisData.

Downloading/Updating All ISISDATA

When running the recommended command to update all ISISDATA directories, the following error is produced:

(CometRC710d) zion2[1050]: downloadIsisData --dry-run -v ALL ./data        
2022-08-08 09:10:37 INFO     ---Script starting---
Traceback (most recent call last):
  File "/Users/kbecker/miniconda3/envs/CometRC710d/bin/downloadIsisData", line 208, in <module>
    main(args.mission, args.dest, os.path.expanduser(args.config), args.dry_run, args.num_transfers)
  File "/Users/kbecker/miniconda3/envs/CometRC710d/bin/downloadIsisData", line 159, in main
    raise LookupError(f"{mission} is not in the list of supported missions: {supported_missions.keys()}")
LookupError: ALL is not in the list of supported missions: dict_keys(['apollo15', 'apollo16', 'apollo17', 'base', 'cassini', 'chandrayaan1', 'clementine1', 'dawn', 'hayabusa2', 'hayabusa', 'juno', 'kaguya', 'legacybase', 'lro', 'mariner10', 'messenger', 'mex', 'mgs', 'mro', 'near', 'newhorizons', 'odyssey', 'osirisrex', 'rolo', 'rosetta', 'smart1', 'tgo', 'viking1', 'viking2', 'voyager1', 'voyager2'])

Indications are that ALL is not a supported mission.
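The traceback suggests the lookup raises before any ALL handling occurs. A minimal sketch of one possible fix; the names mirror the traceback but are assumptions, not the actual source:

```python
# Hypothetical sketch: expand ALL to every supported mission before the
# lookup that currently raises LookupError. Not the actual script source.
def resolve_missions(mission, supported_missions):
    """Return the list of mission keys to download."""
    if mission.lower() == "all":
        # Expand ALL instead of failing the dictionary lookup.
        return list(supported_missions)
    if mission not in supported_missions:
        raise LookupError(
            f"{mission} is not in the list of supported missions: "
            f"{list(supported_missions)}")
    return [mission]
```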

Documentation Issues

The utility documentation is inconsistent with ISIS documentation and provides an invalid example.

(CometRC710d) zion2[1047]: downloadIsisData -h
usage: downloadIsisData [-h] [--dry-run] [-v] [-n NUM_TRANSFERS] [--config CONFIG] mission dest

This will allow for a user to download isis data directly to their machine from the USGS S3 buckets as well as public end points

To use the download ISIS Data script you must supply 3 parameters with an optional 4th.

<rclone command> <Mission name> <destination to copy to> <--dry-run (optional)> 
Example of how to run this program:
python downloadIsisData.py tgo ~/isisData/tgo

NOTE: if you would like to download the data for every mission supply the value ALL for the <Mission name> parameter

positional arguments:
  mission               mission for files to be downloaded
  dest                  the destination to download files from source

optional arguments:
  -h, --help            show this help message and exit
  --dry-run             run a dry run for rclone value should be a boolean
  -v, --verbose
  -n NUM_TRANSFERS, --num-transfers NUM_TRANSFERS
  --config CONFIG

When running the example, it produces the following:

(CometRC710d) zion2[1068]: downloadIsisData --dry-run tgo ./data/tgo 
2022-08-08 09:27:05 ERROR : Local file system at /Volumes/KJBData/ISIS/isisdata/data/tgo/tgo/kernels: Ignoring --track-renames as the source and destination do not have a common hash

It looks as though the ./data/tgo should just be ./data.

And the example invokes the command with a python prefix and a .py extension, which is not a viable example given how the script is installed.
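The doubled tgo/tgo in the log above suggests the script appends the mission name to the destination. A minimal sketch of that assumed behavior, with a guard against the doubling (the function name and logic are mine, not from the script):

```python
import os

# Assumed behavior sketch: if the script joins the mission name onto the
# destination, a dest that already ends in the mission name produces a
# doubled path like data/tgo/tgo. Guarding against that:
def dest_for_mission(dest, mission):
    # Avoid doubling when the user already supplied .../<mission>.
    if os.path.basename(os.path.normpath(dest)) == mission:
        return os.path.normpath(dest)
    return os.path.join(dest, mission)
```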

Script Installation

The installation of the utility is inconsistent with runtime scenarios. For example, the CMAKE installation of downloadIsisData is in the CMAKE_INSTALL_PREFIX directory, but the find_conf() Python function looks for it in $CONDA_PREFIX. Perhaps using $ISISROOT would be indicated here for consistency.
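A minimal sketch of the suggestion, assuming find_conf resolves the config under a prefix's etc/ directory (the actual search path in the script may differ):

```python
import os

# Sketch of the suggestion above: check $ISISROOT first, then fall back to
# $CONDA_PREFIX. find_conf is named in the report; the etc/ search path
# here is an assumption.
def find_conf(filename="rclone.conf"):
    for var in ("ISISROOT", "CONDA_PREFIX"):
        prefix = os.environ.get(var)
        if prefix:
            candidate = os.path.join(prefix, "etc", filename)
            if os.path.exists(candidate):
                return candidate
    raise FileNotFoundError(
        f"{filename} not found under $ISISROOT or $CONDA_PREFIX")
```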

This may not be directly related, but I'll mention it anyway. The installation of the $ISISROOT/scripts directory does not preserve permissions as set in the ISIS source tree:

$PROJECT_ROOT/ISIS3/isis/scripts
(CometRC710d) zion2[1096]: ls -al
total 256
drwxr-xr-x  25 kbecker  staff    800 Aug  2 10:18 .
drwxr-xr-x  17 kbecker  staff    544 Aug  2 10:18 ..
-rw-r--r--   1 kbecker  staff   2298 Aug  2 10:18 DeployDarwin.csh
-rw-r--r--   1 kbecker  staff   2147 Aug  2 10:18 IsisInlineDocumentBuild_mod.xsl
-rw-r--r--   1 kbecker  staff    835 Aug  2 10:18 PullLastMod.xsl
-rwxr-xr-x   1 kbecker  staff  23254 Aug  2 10:18 SetRunTimePath
-rwxr-xr-x   1 kbecker  staff   4859 Aug  2 10:18 csvdiff.py
-rw-r--r--   1 kbecker  staff    794 Aug  2 10:18 darwin_IsisDlm_paths.lis
-rw-r--r--   1 kbecker  staff    498 Aug  2 10:18 darwin_bin_paths.lis
-rw-r--r--   1 kbecker  staff    647 Aug  2 10:18 darwin_lib_paths.lis
-rwxr-xr-x   1 kbecker  staff   8310 Aug  2 10:18 downloadIsisData
-rwxr-xr-x   1 kbecker  staff   2625 Aug  2 10:18 isis3Startup.csh
-rwxr-xr-x   1 kbecker  staff   1182 Aug  2 10:18 isis3Startup.py
-rwxr-xr-x   1 kbecker  staff    877 Aug  2 10:18 isis3Startup.sh
-rw-r--r--   1 kbecker  staff   1000 Aug  2 10:18 isisStartup.csh
-rw-r--r--   1 kbecker  staff    844 Aug  2 10:18 isisStartup.sh
-rwxr-xr-x   1 kbecker  staff   5512 Aug  2 10:18 isisVarInit.py
-rw-r--r--   1 kbecker  staff    600 Aug  2 10:18 isis_bins_paths.lis
-rwxr-xr-x   1 kbecker  staff   3698 Aug  2 10:18 makeOutput.py
-rw-r--r--   1 kbecker  staff    774 Aug  2 10:18 qt_paths.lis
-rw-r--r--   1 kbecker  staff   1016 Aug  2 10:18 qt_plugins_paths.lis
-rw-r--r--   1 kbecker  staff    533 Aug  2 10:18 tabCompletion.csh
-rwxr-xr-x   1 kbecker  staff   1733 Aug  2 10:18 unitTester
-rwxr-xr-x   1 kbecker  staff   2265 Aug  2 10:18 zenodo_order.py
-rwxr-xr-x   1 kbecker  staff   2184 Aug  2 10:18 zenodo_to_authors.py

In the installation directory, which happens to be $CONDA_PREFIX:

(CometRC710d) zion2[1062]: pushd $ISISROOT
~/miniconda3/envs/CometRC710d /Volumes/KJBData/ISIS/isisdata
(CometRC710d) zion2[1063]: echo $ISISROOT
/Users/kbecker/miniconda3/envs/CometRC710d
(CometRC710d) zion2[1064]: ls -l ./scripts
total 256
-rw-r--r--  1 kbecker  staff   2298 Aug  2 10:18 DeployDarwin.csh
-rw-r--r--  1 kbecker  staff   2147 Aug  2 10:18 IsisInlineDocumentBuild_mod.xsl
-rw-r--r--  1 kbecker  staff    835 Aug  2 10:18 PullLastMod.xsl
-rw-r--r--  1 kbecker  staff  23254 Aug  2 10:18 SetRunTimePath
-rw-r--r--  1 kbecker  staff   4859 Aug  2 10:18 csvdiff.py
-rw-r--r--  1 kbecker  staff    794 Aug  2 10:18 darwin_IsisDlm_paths.lis
-rw-r--r--  1 kbecker  staff    498 Aug  2 10:18 darwin_bin_paths.lis
-rw-r--r--  1 kbecker  staff    647 Aug  2 10:18 darwin_lib_paths.lis
-rw-r--r--  1 kbecker  staff   8310 Aug  2 10:18 downloadIsisData
-rw-r--r--  1 kbecker  staff   2625 Aug  2 10:18 isis3Startup.csh
-rw-r--r--  1 kbecker  staff   1182 Aug  2 10:18 isis3Startup.py
-rw-r--r--  1 kbecker  staff    877 Aug  2 10:18 isis3Startup.sh
-rw-r--r--  1 kbecker  staff   1000 Aug  2 10:18 isisStartup.csh
-rw-r--r--  1 kbecker  staff    844 Aug  2 10:18 isisStartup.sh
-rw-r--r--  1 kbecker  staff   5512 Aug  2 10:18 isisVarInit.py
-rw-r--r--  1 kbecker  staff    600 Aug  2 10:18 isis_bins_paths.lis
-rw-r--r--  1 kbecker  staff   3698 Aug  2 10:18 makeOutput.py
-rw-r--r--  1 kbecker  staff    774 Aug  2 10:18 qt_paths.lis
-rw-r--r--  1 kbecker  staff   1016 Aug  2 10:18 qt_plugins_paths.lis
-rw-r--r--  1 kbecker  staff    533 Aug  2 10:18 tabCompletion.csh
-rw-r--r--  1 kbecker  staff   1733 Aug  2 10:18 unitTester
-rw-r--r--  1 kbecker  staff   2265 Aug  2 10:18 zenodo_order.py
-rw-r--r--  1 kbecker  staff   2184 Aug  2 10:18 zenodo_to_authors.py

Maybe this is intentional.
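For what it's worth, the lost execute bits match the difference between CMake's install(FILES ...) (installs with default, non-executable permissions) and install(PROGRAMS ...) or install(DIRECTORY ... USE_SOURCE_PERMISSIONS) (which keep them). The same distinction exists in Python's shutil, shown here purely as an illustration:

```python
import os
import shutil

# Illustration of the permissions issue above: shutil.copyfile drops the
# source's mode bits (much as install(FILES) does), while shutil.copy2
# preserves them (as install(PROGRAMS) would).
def install_script(src, dest_dir):
    dest = os.path.join(dest_dir, os.path.basename(src))
    shutil.copy2(src, dest)  # copies permission bits along with contents
    return dest
```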


Kelvinrr commented 2 years ago

@KrisBecker I caught these over the weekend as well. I fixed most of them and can get a PR in a little bit. I think they got through with the haphazard way that PR got merged during a release.

Kelvinrr commented 2 years ago

@KrisBecker Do you want to try redownloading using the instructions at the bottom of dev's readme?

KrisBecker commented 2 years ago

@Kelvinrr I downloaded the script and the rclone.conf files as described at the bottom of dev/README.md. There are a few issues that remain. (Note I am using python 3.9.13.)

There appear to be two sources from which each dataset downloads kernels: the mission SPICE archive source (sometimes NAIF) and the USGS ISIS data. When running the script for a particular mission, say messenger, it first downloads the entire SPICE archive and then the ISIS data from ASC/AWS servers. The net effect is that the complete MESSENGER SPICE archive download is ~41GB. Then, when the USGS data is downloaded, everything but the ISIS data download is deleted. Is this a bug or a feature? Why download the full mission SPICE archive when it winds up being deleted/redundant with the USGS data anyway? This appears to be the behavior for every mission in the config.

python downloadIsisData  --config rclone.conf -v messenger /Volumes/KJBData/ISIS/isisdata/newdata

More concerning is that when the messenger_usgs data completes downloading, almost all the data is deleted, leaving only ~341MB, whereas the rsync download is ~17GB. Here is a portion of the log info:

2022-08-24 12:49:18 INFO  : kernels/ck/msgr_0502.lbl: Deleted
2022-08-24 12:49:18 INFO  : kernels/ck/msgr_0809_v02.lbl: Deleted
2022-08-24 12:49:18 INFO  : kernels/ck/msgr_1210_v01.bc: Deleted
2022-08-24 12:49:18 INFO  : kernels/ck/msgr_1405_v02.lbl: Deleted
2022-08-24 12:49:18 INFO  : kernels/ck/msgr_0801_v02.bc: Deleted
2022-08-24 12:49:18 INFO  : kernels/ck/msgr_0803_v02.lbl: Deleted
2022-08-24 12:49:18 INFO  : kernels/ck/msgr_1402_v01.bc: Deleted
2022-08-24 12:49:18 INFO  : kernels/ck/msgr_1311_v01.lbl: Deleted
2022-08-24 12:49:18 INFO  : kernels/ck/msgr_1409_v01.bc: Deleted
2022-08-24 12:49:18 INFO  : kernels/ck/msgr_0910_v02.lbl: Deleted
2022-08-24 12:49:18 INFO  : kernels/spk: Removing directory
2022-08-24 12:49:18 INFO  : kernels/lsk: Removing directory
2022-08-24 12:49:18 INFO  : kernels/ek: Removing directory
Transferred:      340.914 MiB / 340.914 MiB, 100%, 1.673 MiB/s, ETA 0s
Checks:               894 / 894, 100%
Deleted:              878 (files), 3 (dirs)
Transferred:          210 / 210, 100%
Elapsed time:       1m5.7s
2022/08/24 12:49:18 INFO  : 
Transferred:      340.914 MiB / 340.914 MiB, 100%, 1.673 MiB/s, ETA 0s
Checks:               894 / 894, 100%
Deleted:              878 (files), 3 (dirs)
Transferred:          210 / 210, 100%
Elapsed time:       1m5.7s

Note two critical directories (spk, lsk) are completely removed!
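The deletions above are consistent with rclone sync semantics: sync makes the destination match the source, deleting extraneous files, whereas rclone copy never deletes. A toy model of the two behaviors (plain Python dicts, not rclone itself):

```python
# Toy model of the behavior above: "sync" makes the destination mirror the
# source, so syncing the small USGS tree after the large SPICE archive
# deletes most of what was just downloaded; "copy" merges without deleting.
def sync(source, dest):
    dest.clear()          # remove anything not in the source
    dest.update(source)

def copy(source, dest):
    dest.update(source)   # never deletes existing files
```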

And comparing the sizes of the old download and new:

(base) zion2[1171]: du -sh data/messenger newdata/messenger 
 17G    data/messenger
341M    newdata/messenger

Some mission cases are extreme. Take, for example, OREX: its SPICE archive is 397GB while the final install is 8.7GB, so the archive is nearly 45 times the final download size!

Also, many of the references to mission SPICE archives fail to download at all for some reason. Here is the log for mariner10 data:

(CometsAug17) zion2[1051]: python downloadIsisData  --config rclone.conf -v mariner10 /Volumes/KJBData/ISIS/isisdata/newdata
2022-08-24 12:14:01 INFO     ---Script starting---
2022-08-24 12:14:04 INFO  : .: Copied (new)
2022-08-24 12:14:04 INFO  : .: Copied (Rcat, new)
Transferred:       23.268 KiB / 23.268 KiB, 100%, 0 B/s, ETA -
Transferred:            1 / 1, 100%
Elapsed time:         2.7s
2022/08/24 12:14:04 INFO  : 
Transferred:       23.268 KiB / 23.268 KiB, 100%, 0 B/s, ETA -
Transferred:            1 / 1, 100%
Elapsed time:         2.7s

2022-08-24 12:14:05 INFO  : Local file system at /Volumes/KJBData/ISIS/isisdata/newdata/mariner10: Making map for --track-renames
2022-08-24 12:14:05 INFO  : Local file system at /Volumes/KJBData/ISIS/isisdata/newdata/mariner10: Finished making map for --track-renames
2022-08-24 12:14:05 INFO  : reseaus/mar10bMasterReseaus.pvl: Copied (new)
2022-08-24 12:14:05 INFO  : reseaus/mar10Merc2Nominal.pvl: Copied (new)
2022-08-24 12:14:05 INFO  : reseaus/mar10aMasterReseaus.pvl: Copied (new)
2022-08-24 12:14:05 INFO  : reseaus/mar10MoonNominal.pvl: Copied (new)
2022-08-24 12:14:05 INFO  : reseaus/mar10Merc1Nominal.pvl: Copied (new)
2022-08-24 12:14:05 INFO  : reseaus/mar10Merc3Nominal.pvl: Copied (new)
2022-08-24 12:14:05 INFO  : reseaus/mar10VenusNominal.pvl: Copied (new)
2022-08-24 12:14:05 INFO  : calibration/mariner_10_B_dc.cub: Copied (new)
2022-08-24 12:14:05 INFO  : calibration/mariner_10_blem_A.cub: Copied (new)
2022-08-24 12:14:05 INFO  : reseaus/mar10b.template.cub: Copied (new)
2022-08-24 12:14:05 INFO  : calibration/mariner_10_A_dc.cub: Copied (new)
2022-08-24 12:14:05 INFO  : kernels/spk/kernels.0001.db: Copied (new)
2022-08-24 12:14:05 INFO  : calibration/mariner_10_blem_B.cub: Copied (new)
2022-08-24 12:14:06 INFO  : kernels/spk/MARINER_10_A_gem.bsp: Copied (new)
2022-08-24 12:14:06 INFO  : kernels/spk/MERCURY_MARINER_10_B.bsp: Copied (new)
2022-08-24 12:14:06 INFO  : kernels/spk/MERCURY_MARINER_10_A.bsp: Copied (new)
2022-08-24 12:14:06 INFO  : kernels/fk/kernels.0001.db: Copied (new)
2022-08-24 12:14:06 INFO  : kernels/fk/mariner10.0001.tf: Copied (new)
2022-08-24 12:14:06 INFO  : kernels/iak/mariner10Addendum001.ti: Copied (new)
2022-08-24 12:14:06 INFO  : reseaus/mar10a.template.cub: Copied (new)
2022-08-24 12:14:06 INFO  : kernels/iak/kernels.0001.db: Copied (new)
2022-08-24 12:14:06 INFO  : kernels/spk/MARINER_10_B_gem.bsp: Copied (new)
2022-08-24 12:14:06 INFO  : kernels/ck/kernels.0001.db: Copied (new)
2022-08-24 12:14:06 INFO  : kernels/iak/mariner10Addendum002.ti: Copied (new)
2022-08-24 12:14:06 INFO  : kernels/ck/MERCURY_MARINER_10_A.bc: Copied (new)
2022-08-24 12:14:06 INFO  : kernels/ck/MARINER_10_B_gem.bc: Copied (new)
2022-08-24 12:14:06 INFO  : kernels/ck/MARINER_10_A_gem.bc: Copied (new)
2022-08-24 12:14:06 INFO  : kernels/ck/MERCURY_MARINER_10_B.bc: Copied (new)
2022-08-24 12:14:06 INFO  : kernels/ik/kernels.0001.db: Copied (new)
2022-08-24 12:14:06 INFO  : kernels/sclk/mariner10.0001.tsc: Copied (new)
2022-08-24 12:14:06 INFO  : kernels/sclk/kernels.0001.db: Copied (new)
2022-08-24 12:14:07 INFO  : calibration/mariner_10_CLE_B_coef.cub: Copied (new)
2022-08-24 12:14:08 INFO  : calibration/mariner_10_OR_A_coef.cub: Copied (new)
2022-08-24 12:14:08 INFO  : calibration/mariner_10_CLE_A_coef.cub: Copied (new)
2022-08-24 12:14:08 INFO  : calibration/mariner_10_MUV_A_coef.cub: Copied (new)
2022-08-24 12:14:09 INFO  : calibration/mariner_10_ORA_A_coef.cub: Copied (new)
2022-08-24 12:14:10 INFO  : calibration/mariner_10_OR_B_coef.cub: Copied (new)
2022-08-24 12:14:10 INFO  : calibration/mariner_10_CL_A_coef.cub: Copied (new)
2022-08-24 12:14:10 INFO  : calibration/mariner_10_BLU_B_coef.cub: Copied (new)
2022-08-24 12:14:10 INFO  : calibration/mariner_10_UV_B_coef.cub: Copied (new)
2022-08-24 12:14:11 INFO  : calibration/mariner_10_ORA_B_coef.cub: Copied (new)
2022-08-24 12:14:11 INFO  : calibration/mariner_10_BL_B_coef.cub: Copied (new)
2022-08-24 12:14:12 INFO  : calibration/mariner_10_UV_A_coef.cub: Copied (new)
2022-08-24 12:14:12 INFO  : calibration/mariner_10_UVP_A_coef.cub: Copied (new)
2022-08-24 12:14:14 INFO  : calibration/mariner_10_CL_B_coef.cub: Copied (new)
2022-08-24 12:14:14 INFO  : kernels/.: Deleted
Transferred:      321.338 MiB / 321.338 MiB, 100%, 35.582 MiB/s, ETA 0s
Checks:                 1 / 1, 100%
Deleted:                1 (files), 0 (dirs)
Transferred:           45 / 45, 100%
Elapsed time:         9.7s
2022/08/24 12:14:14 INFO  : 
Transferred:      321.338 MiB / 321.338 MiB, 100%, 35.582 MiB/s, ETA 0s
Checks:                 1 / 1, 100%
Deleted:                1 (files), 0 (dirs)
Transferred:           45 / 45, 100%
Elapsed time:         9.7s

Here is the rclone.conf entries for URLs and then mariner10:

[asc_s3]
type = s3
provider = AWS
region = us-west-2
location_constraint = us-west-2

[esa]
type = http
url = http://spiftp.esac.esa.int/

[naif]
type = http
url = http://naif.jpl.nasa.gov/

[jaxa]
type = http
url = http://www.darts.isas.jaxa.jp/

...

[mariner10_naifKernels]
type = alias
remote = naif:/pub/naif/M10/kernels

[mariner10_usgs]
type = alias
remote = asc_s3:asc-isisdata/usgs_data/mariner10

Note the directory on the NAIF public kernel server, https://naif.jpl.nasa.gov/pub/naif/M10/kernels/, does exist. Maybe it's failing due to the use of http rather than https?
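If the plain-http endpoints are the culprit, one low-cost experiment is rewriting the endpoint URLs in rclone.conf to https before running the script. A sketch (the helper is mine, not part of the tooling):

```python
import configparser
import io

# Hedged experiment: rewrite http:// endpoint URLs in an rclone.conf-style
# file to https://, to test whether plain-http redirects are breaking the
# NAIF downloads.
def upgrade_http_urls(conf_text):
    cp = configparser.ConfigParser()
    cp.read_string(conf_text)
    for section in cp.sections():
        url = cp.get(section, "url", fallback=None)
        if url and url.startswith("http://"):
            cp.set(section, "url", "https://" + url[len("http://"):])
    out = io.StringIO()
    cp.write(out)
    return out.getvalue()
```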

Finally, in the verbose output it would be nice to log the URL/location of the download.

Kelvinrr commented 2 years ago

@KrisBecker The data being deleted is definitely a bug. The public data should be much larger than the USGS data, which is why the result ends up so small when one source gets clobbered by the other. The intent was to sync a union of the two. It should be an easy fix.

You can add two v's in order to get debug output with the URLs:

./downloadIsisData  --config rclone.conf -vv mariner10 efs_prod/isis_data/

.
.
.

2022/08/25 15:43:35 DEBUG : Creating backend with remote "naif:/pub/naif/M10/kernels"
2022/08/25 15:43:35 DEBUG : Assuming path is a file as HEAD response is redirect (302 Found) to a path that does not end with '/': https://naif.jpl.nasa.gov/pub/naif/M10/kernels

I'll add a help string to include the verbose levels.

As far as some of the mariner stuff not downloading, looking at the output it looks like it did copy the files over? It's only 1.8 MB of data, so if it's already there the second pass might not download anything. I think right now there is some redundancy between the USGS data source and the public source that needs to be cleaned out. So if the public source is downloaded first, you don't really download anything on the second pass from USGS.

KrisBecker commented 2 years ago

@Kelvinrr please do not do a union of the NAIF SPICE archive and ISIS data. This creates an absolutely unnecessary burden on users. If anything, an intersection of the two would be sufficient. But even downloading the SPICE archive at this point does not add anything that is not already in the ISIS data. The kernel maintenance scripts are designed to download from the SPICE archive any kernel version updates but only the required ones.

And you do not want to just update any of the kernels without considering the impact of that action. For example, you would not want to simply install a PCK or IK kernel without evaluating the impact on geometry.

As I pointed out, the OREX archive is huge - nearly 400GB. It is unreasonable to burden users with accommodating the disk space required for the entire archive when only 9GB is required to support the mission in the ISIS environment.

Kelvinrr commented 2 years ago

@KrisBecker I think downloading everything and using the kerneldb files to control what ISIS actually uses during spiceinit, while keeping the rest of the kernels available, is fine; the test failures right now seem mostly due to things missing rather than things being present that shouldn't be. I see your point about unused kernels, though, so I don't think it's unreasonable to create whitelists of kernels based on the kerneldb files to avoid bloating things more than necessary, or to disable non-USGS sources for active missions.

I can disable public orex and only use the usgs source.

Edit: After thinking about it more, my first question would be: what are the 400GB of OREX data doing in these stores if most of them are not useful? Is there a way OREX can partition these kernels so that it is easier to download only what is needed, without us maintaining some kind of whitelist?
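The whitelist idea could start from the kerneldb files themselves. A hedged sketch: pull File entries out of an ISIS kernels.????.db file with a regex. The PVL shape assumed here ($mission/kernels/... paths with standard kernel extensions) is an approximation, not a parser for the real format:

```python
import re

# Hedged sketch: extract kernel paths referenced by a kerneldb file, as a
# starting point for an rclone include/whitelist. The path shape is an
# assumption about the PVL contents, not a faithful parser.
def kernel_whitelist(pvl_text):
    pattern = r'\$\w+/kernels/\S+?\.(?:bc|bsp|tls|tpc|tsc|ti|tf)'
    return sorted(set(re.findall(pattern, pvl_text)))
```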

KrisBecker commented 2 years ago

I recommend you disable the SPICE archive download for all missions. Then add a parameter, e.g., --archive-only, that explicitly invokes that download. I can see this option being potentially useful for determining which kernels apply in the ISIS environment for configuration purposes. Otherwise, there is not always a direct correspondence between the SPICE archive and the ISIS data directories.

For example, many missions (rosetta being one) have a tspk directory that contains target position/orientation kernels. The tspk (and other ISIS data) directories typically do not exist in the SPICE archive, so downloading/installing the SPICE archive is insufficient, without explicit intervention (typically by the ISIS kernel maintenance script), to complete a valid mission ISIS SPICE kernel installation.
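The suggested flag could be wired in with a small argparse addition; a minimal sketch (hypothetical, not in the current script):

```python
import argparse

# Sketch of the suggested flag: the SPICE archive source would be skipped
# by default and only downloaded when --archive-only is given. All names
# here are hypothetical.
def build_parser():
    parser = argparse.ArgumentParser(prog="downloadIsisData")
    parser.add_argument("mission")
    parser.add_argument("dest")
    parser.add_argument("--archive-only", action="store_true",
                        help="download only the mission SPICE archive source")
    return parser
```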

KrisBecker commented 2 years ago

Is there a way OREX can partition these kernels such that they are easier to download only what is needed without us maintaining some kind of whitelist instead?

No. This applies generally to all public SPICE kernel archives produced by missions (not just instrument teams).

jlaura commented 2 years ago

To be really explicit here: USGS is no longer able to be the repository of record for these SPICE kernels for the community. We will continue to be the repository of record for those kernels which we produce (e.g., smithed THEMIS kernels). For all other kernels, we will be providing a mechanism to download kernels from a publicly available source. If that source is making 400GB of kernels available to the community with no mechanism to provide only those kernels which the community finds most useful (in whatever context, whether ISIS or otherwise) a multitude of options exist including, but not limited to:

  1. Users download the 400GB of kernels.
  2. Users contact the mission / instrument team / funding agency and request data be made available in a usable format. Any and all of those entities can choose whatever path they find most appropriate to address user requests.
  3. Users make use of the SPICE web service that includes kernels maintained by the USGS.

As stated elsewhere, starting in early November, we are no longer serving a curated subset of kernels due to a number of policy and data release requirements. We are providing the download scripts as a means for users to access kernels from their repositories of record. We will continue to serve supplemental elements that are needed, e.g., IAKs as they are a component of ISIS and not a product generated by a team.

KrisBecker commented 2 years ago

Can you clarify what the contents of the USGS kernel sources will be after USGS discontinues hosting kernels?

Comments in this thread and #5026 indicate that at the end of November 2022, the complete content of all ISISDATA areas will be scrubbed of all SPICE kernels. It appears that the only files that will exist on the AWS servers will be the configuration files and all kernels will come from archive or mission sources.

I am asking this question because current testing indicates the presence of SPICE kernels in the USGS AWS sources. Will they still exist after USGS discontinues hosting kernels?

Kelvinrr commented 2 years ago

@KrisBecker The short answer is that the USGS stores will only have things that are not hosted elsewhere (NAIF, ESA, etc.). So they will still have kernels that we publish (e.g., smithed kernels) and things that are difficult to get elsewhere and easier to just host (some random kernels and data here and there).

KrisBecker commented 1 year ago

I have just completed the download of all ISISDATA. Can you confirm the total size is 1.9TB?

KrisBecker commented 1 year ago

I have a preliminary version of an rclone filter for ISISDATA that greatly reduces the current download size. I will stress this is preliminary and should be used with caution, particularly in an active mission or research situation.

That said, it would be good to get some testing if there is interest in incorporating this as part of the USGS instruction set.

The custom rclone filter file (via Gist), isisdata_rclone_filter_from.lis, can be provided as the optional --filter-from argument to rclone.

The rclone filtering documentation is helpful.

Here are some examples taken from the file isisdata_rclone_filter_from.lis:

## Lists results of filter. Exclude the --filter-from arg for current
##  download comparisons.
rclone ls --config rclone.conf  messenger_naifKernels:
rclone ls --config rclone.conf  messenger_usgs:

rclone ls --config rclone.conf  --filter-from isisdata_rclone_filter_from.lis  messenger_naifKernels:
rclone ls --config rclone.conf  --filter-from isisdata_rclone_filter_from.lis messenger_usgs: | grep spk
rclone ls --config rclone.conf  --filter-from isisdata_rclone_filter_from.lis messenger_naifKernels: | grep spk

##  With downloadIsisData, all data, current state.
mkdir isisdatafull
./downloadIsisData all $PWD/isisdatafull --config=rclone.conf  -vv --log-file=isisdata_unfiltered.log

# Filtered with the contents of this file provided in --filter-from arg.
mkdir isisdatafiltered
./downloadIsisData all $PWD/isisdatafiltered --config=rclone.conf  --filter-from=isisdata_rclone_filter_from.lis -vv --log-file=isisdata_filtered.log

github-actions[bot] commented 1 year ago

Thank you for your contribution!

Unfortunately, this issue hasn't received much attention lately, so it is labeled as 'stale.'

If no additional action is taken, this issue will be automatically closed in 180 days.

antonhibl commented 10 months ago

The actual filtering (blacklisting/whitelisting) of the data area being downloaded is now handled by the --include/--exclude and --filter flags, and that aspect of this discussion seems to be resolved. The size of the data area also looks correct from my testing. I would direct future conversation on filtering issues to issues dealing more directly with that feature, e.g., https://github.com/DOI-USGS/ISIS3/issues/5264 (which deals with kernels being excluded and seems to be more a misunderstanding of how rclone filtering actually works than an actual issue with downloadIsisData).