DOI-USGS / ISIS3

Integrated Software for Imagers and Spectrometers v3. ISIS3 is a digital image processing software package to manipulate imagery collected by current and past NASA and International planetary missions.
https://isis.astrogeology.usgs.gov
Other
195 stars 166 forks source link

Move the application data into the git repo #3727

Closed scsides closed 4 years ago

scsides commented 4 years ago

Description
The ISIS application specific data (e.g., PDS3 to ISIS cube translation) are currently mixed in with the general data (e.g., Cassini/kernels/ck/*). Move the application specific data into the git repository and out of the ISIS data area.

Addition: The changes proposed to have many, not all, text files under revision control include:

Example

KrisBecker commented 4 years ago

Originally in #3728.

While I agree this is technically not API breaking as defined by the current policy, it potentially breaks a lot of stuff. It appears this type of change should be covered by policy. Perhaps a period of deprecation could be specified where both environment variables, ISIS3DATA and ISISDATA, refer to the same value to ease the transition and provide an environment in which users can assess the impact in their environments. It might be a good idea to treat this version as a "breaking change" release.

Here are but a few potential issues I have identified:

ISIS3DATA is hardcode in the ISIS appdata branch

Most of these occurrences are in unit tests, some are in main applications and others are in comments and should probably be changed for consistency. However, in some cases, the change would not be backward compatible (is this a breaking change?). Its likely these issues have already been identified, but here is a list of the occurrences in the appdata branch that directly use $ISIS3DATA:

(isis3/appdata)$ find ./isis/src -name '*.cpp' -exec grep -A1 -C0 -H ISIS3DATA {} \;
./isis/src/voyager/apps/voy2isis/main.cpp:  QString lsk = "$ISIS3DATA/base/kernels/lsk/naif????.tls";
./isis/src/voyager/apps/voy2isis/main.cpp:  QString sclk = "$ISIS3DATA/voyager";
./isis/src/tgo/objs/TgoCassisCamera/TgoCassisDistortionMap.cpp:    // of the CCD. (See $ISIS3DATA/tgo/assets/distortion/DistortionModelA3CorrRoots.jpg).
./isis/src/tgo/objs/TgoCassisCamera/TgoCassisDistortionMap.cpp:    // of the CCD. (See $ISIS3DATA/tgo/assets/distortion/DistortionModelA3DistRoots.jpg).
./isis/src/rosetta/objs/RosettaVirtisCamera/unitTest.cpp:    Cube visCube("$ISIS3DATA/rosetta/testData/V1_00388238556.cub", "r");
./isis/src/rosetta/objs/RosettaVirtisCamera/unitTest.cpp:    Cube irCube("$ISIS3DATA/rosetta/testData/I1_00382172310.cub", "r");
./isis/src/rosetta/apps/rosvirtis2isis/main.cpp:    QString sclk = "$ISIS3DATA/rosetta/kernels/sclk/ROS_??????_STEP.TSC"; 
./isis/src/rosetta/apps/rosvirtis2isis/main.cpp:    QString lsk  = "$ISIS3DATA/base/kernels/lsk/naif????.tls"; 
./isis/src/viking/apps/viknobutter/main.cpp:  QString maskParameter = "$ISIS3DATA/viking" + toString(spn) +
./isis/src/system/apps/dempack/main.cpp:      QString izpackFile = fileList[j].replace("ISIS3DATA", "{ENV[ISIS3DATA]}");
./isis/src/messenger/objs/MdisCamera/MdisCamera.cpp:    // in $ISIS3DATA/messenger/kernels/iak/mdisAddendum???.ti for the
./isis/src/messenger/objs/MdisCamera/MdisCamera.cpp:    // See $ISIS3DATA/messenger/kernels/iak/mdisAddendumXXX.ti or possibly
./isis/src/messenger/objs/MdisCamera/MdisCamera.cpp:    // $ISIS3DATA/messenger/kernels/ik/msgr_mdis_vXXX.ti for the *_OD_K
./isis/src/socet/apps/socetlinescankeywords/main.cpp:  //      of 0.007 mm gotten from $ISIS3DATA/mgs/kernels/ik/moc20.ti.  (The
./isis/src/newhorizons/apps/mvic2isis/main.cpp:  QString lsk = "$ISIS3DATA/base/kernels/lsk/naif????.tls";
./isis/src/newhorizons/apps/mvic2isis/main.cpp:  QString sclk = "$ISIS3DATA/newhorizons/kernels/sclk/new_horizons_???.tsc";
./isis/src/control/apps/cnetstats/cnetstats.cpp:        Pvl pvlTemplate("$ISIS3DATA/base/templates/cnetstats/cnetstats.def");
./isis/src/control/apps/cnetref/main.cpp:        pvlTemplate = Pvl("$ISIS3DATA/base/templates/cnetref/cnetref_operator.def");
./isis/src/control/apps/cnetref/main.cpp:        pvlTemplate = Pvl("$ISIS3DATA/base/templates/cnetref/cnetref_nooperator.def");
./isis/src/control/apps/cnetedit/main.cpp:      Pvl pvlTemplate("$ISIS3DATA/base/templates/cnet_validmeasure/validmeasure.def");
./isis/src/base/objs/Pipeline/unitTest.cpp:  ui.PutFileName("FROM",  "$ISIS3DATA/odyssey/testData/I00831002RDR.even.cub");
./isis/src/base/objs/Pipeline/unitTest.cpp:  ui.PutFileName("FROM2", "$ISIS3DATA/odyssey/testData/I00831002RDR.odd.cub");
./isis/src/base/objs/Pipeline/unitTest.cpp:  ui.PutAsString("FROM", "$ISIS3DATA/odyssey/testData/I00831002RDR.cub");
./isis/src/base/objs/Pipeline/unitTest.cpp:  pc1.SetInputFile(FileName("$ISIS3DATA/mro/testData/PSP_001446_1790_BG12_0.cub"));
./isis/src/base/objs/Pipeline/unitTest.cpp:  pc2.SetInputFile(FileName("$ISIS3DATA/mro/testData/PSP_001446_1790_BG12_0.cub"));
./isis/src/base/objs/EmbreeTargetManager/unitTest.cpp:    QString copyDSKFile = "$ISIS3DATA/base/testData/hay_a_amica_5_itokawashape_v1_0_64q.bds";
./isis/src/base/objs/ShapeModelFactory/unitTest.cpp:    QString inputFile = "$ISIS3DATA/mgs/testData/ab102401.cub";
./isis/src/base/objs/ShapeModelFactory/unitTest.cpp:    inputFile = "$ISIS3DATA/galileo/testData/1213r.cub";
./isis/src/base/objs/ShapeModelFactory/unitTest.cpp:    // inputFile = "$ISIS3DATA/;
./isis/src/base/objs/Sensor/unitTest.cpp:                        "$ISIS3DATA/base/dems/molaMarsPlanetaryRadius0004.cub");
./isis/src/base/objs/Sensor/unitTest.cpp:            "$ISIS3DATA/base/dems/molaMarsPlanetaryRadius0004.cub" << endl;
./isis/src/base/objs/Sensor/unitTest.cpp:            "ShapeModel=$ISIS3DATA/base/dems/molaMarsPlanetaryRadius0004.cub"
./isis/src/base/objs/Sensor/unitTest.cpp:            "ShapeModel=$ISIS3DATA/base/dems/molaMarsPlanetaryRadius0004.cub"
./isis/src/base/objs/Sensor/unitTest.cpp:            "ShapeModel=$ISIS3DATA/base/dems/molaMarsPlanetaryRadius0004.cub"
./isis/src/base/objs/Sensor/unitTest.cpp:            "ShapeModel=$ISIS3DATA/base/dems/molaMarsPlanetaryRadius0004.cub"
./isis/src/base/objs/Gruen/Gruen.cpp:   * @brief Load default Gruen parameter file in $ISIS3DATA/base/templates
./isis/src/base/objs/Gui/GuiFilenameParameter.cpp:    QString file = FileName("$ISIS3DATA/base/icons/view_tree.png").expanded();
./isis/src/base/objs/Application/Application.cpp:    QString env6 = "printenv ISIS3DATA >> " + tempFile;

Example of ISIS3DATA in Code

Here is an example of ISIS3DATA in the API. One could still argue this is not API breaking, but its clear the first example is not backward compatible. The second one can be made so by replacing the occurrences of $ISIS3DATA/ with a $.

By the way, I am not a big fan of renaming all the main applications to main.cpp because in some GUI IDEs, its difficult to tell which main application you are editing. I am a SlickEdit user. Which main.cpp tab goes with which source file? Visual aids require a significant amount of screen real estate. Likely this makes things easier to turn ISIS mains into callable functions, or makes things easier in cmake, but it makes some development tasks that were easy and straightforward much more difficult/inconvenient/frustrating/annoying.

Screen Shot 2020-03-04 at 10 48 16 PM

Kernel File Management Scripts

Eventually, scripts that manage NAIF kernels in the ISIS environment refer to the install directory. There are many mission kernel download/install scripts that use the $ISIS3DATA environment variable to navigate to the appropriate data directory and generate the kernel DB file, typically with a ./makedb command. These type of scripting scenarios will present challenges as they will require modifications. They will not be backward compatible without additional consideration of explicit support for both variables. I know USGS maintains many of these types of scripts that will require modifications. Our kernel maintenance scripts will require these modifications as well.

Pipeline Processing Scripts

There is likely significant development of ISIS processing scripts that makes use of $ISIS3DATA that will have to change. While there are ways (i.e., IsisPreferences) to specify ISIS data directories, its likely many are not using this overloaded form (e.g., $rosetta/kernels/dsk/ROS_CG_M004_OSPGDLR_U_V1.bds) of file specification in scripts. Unfortunately, this form of ISIS file referencing does not translate into the Unix Shell without use of the $ISIS3DATA environment variable. Changes will still be required that are not readily backward compatible. This will initially make regression testing of previous ISIS versions challenging.

Legacy Use

Note that the ISIS rsync servers has an isis3data module that serve up the ancillary data for ISIS3. This will soon be obsolete, but what happens to ISIS2 on the servers? Note that ISIS2 is no longer maintained, but USGS/Astro is still providing the system through the the rsync server.

And I'm gonna say it - ISIS2 uses the $ISISDATA environment variable. That was the main reason for creating the ISIS3DATA environment variable in ISIS3 - to avoid conflict. This issue will likely serve as the opportunity to wipe ISIS2 off the internet. Its probably time for it, but it needs to be addressed.

IsisPreferences

Perhaps not many users do this, but I always have a copy of IsisPreferences in $HOME/.Isis. (MacTip: I always set GuiHelpBrowser = open so that your user preferred browser is used.) And while this is not recommended practice, I retain much of the contents from the original $ISISROOT/IsisPreferences file including the mission directory specifications that use the $ISIS3DATA environment variable. This will lead to breakage and would likely be difficult for users to determine why they are getting missing file errors.

Summary

This is going to be a potentially painful change for many users, particularly for users who have a long history and substantial investment in ISIS. While the current API definition of "breaking change" may not apply here, it is another form of breakage that should be covered by policy. To ease the transition, it would be nice to provide a period of depreciation of the ISIS3DATA environment variable.

I am not saying this should not be done. Changes like this are necessary/inevitable/painful, but I think we can and should do more to ease the transition.

Thank you for your consideration...

lwellerastro commented 4 years ago

Am I correct in thinking that when this change is implemented, a user would potentially have to re-spice all of their data to update path references in the Kernels Group?

I am not explicitly seeing $ISIS3DATA in path names (I believe that went away some time ago), but in looking at some labels for data across just two of the many projects I am currently working on, I do see /usgs/cpkgs/isis3/data/ in the Kernels Group.

Although many data sets would be straight-forward in re-spicing, I have two projects where images underwent special processing including running deltack. It would be quite painful to deal with those as well as costly time sinks.

jessemapel commented 4 years ago

Perhaps a period of deprecation could be specified where both environment variables, ISIS3DATA and ISISDATA, refer to the same value to ease the transition and provide an environment in which users can assess the impact in their environments.

I think this is a good path forward here. We should have some transition period. Something like a getDataDirectoryPath function that could return ISISDATA if it's set and ISIS3DATA otherwise. Then, after a set amount of time remove the ISIS3DATA part.

Am I correct in thinking that when this change is implemented, a user would potentially have to re-spice all of their data to update path references in the Kernels Group?

The only situation where you need to re-spiceinit your data is if you have $ISIS3DATA in your kernel paths and the data from those kernels is not attached. This is likely an exceedingly small amount of data.

I am not explicitly seeing $ISIS3DATA in path names (I believe that went away some time ago), but in looking at some labels for data across just two of the many projects I am currently working on, I do see /usgs/cpkgs/isis3/data/ in the Kernels Group.

This data would not need to be re-spiceinit'd

rbeyer commented 4 years ago

Perhaps a period of deprecation could be specified where both environment variables, ISIS3DATA and ISISDATA, refer to the same value to ease the transition and provide an environment in which users can assess the impact in their environments.

I think this is a good path forward here. We should have some transition period. Something like a getDataDirectoryPath function that could return ISISDATA if it's set and ISIS3DATA otherwise. Then, after a set amount of time remove the ISIS3DATA part.

This seems like a robust solution: bottleneck the need to know the actual name of the environment variable through a function, and then the function can check for the existence of both.

Of course, this leads to the question of why not always have it this way? Why should the explicit name of an environment variable be sprinkled throughout the codebase? Seems like it should always just be in one place.

jessemapel commented 4 years ago

@rbeyer It is already partially behind an interface. There is a function in the Process classes that gets the mission specific data directories without the caller passing $ISIS3DATA. There are a handful of places that don't use this though.

scsides commented 4 years ago

To expand on what @jessemapel said about re-spiceiniting. If the kernels group of a spiceinit-ed cube has any $ISIS3DATA references, then after the change, re-spiceinit-ing would change those to "$ISISDATA". No real change, except for the environment variable. Both areas can be available at the same time.

Note: The vast majority of the ISIS3DATA and ISISDATA files are identical (i.e., all the SPICE, calibration, DEMs). The files being moved to the source area are mostly translation, PVL formatting, icons and other text files that need to be under revision control alongside the source code that uses them. This way they can be changed if necessary without affecting other versions of ISIS. This was difficult in previous versions.

One case where this could be an issue is if the attached SPICE tables were modified through a jigsaw run and ckwriter or spkwriter were then used and the ISIS3DATA environment variable was not set. These programs re-open the SPICE files to get some additional information before converting the table information back into SPICE files.

jessemapel commented 4 years ago

One case where this could be an issue is if the attached SPICE tables were modified through a jigsaw run and ckwriter or spkwriter were then used and the ISIS3DATA environment variable was not set. These programs re-open the SPICE files to get some additional information before converting the table information back into SPICE files.

It should also be emphasized that you can work around this by setting the ISIS3DATA environment variable to the value of your ISISDATA environment variable before running the applications

jessemapel commented 4 years ago

This is in response to some discussion in a separate PR: https://github.com/USGS-Astrogeology/ISIS3/pull/3742#issuecomment-596805576

@KrisBecker

How is it possible to isolate translation files in the source tree and not ship them prematurely?

This functionality is being dropped because the translation files are tightly connected to the logic of the software. Publicly distributing one and privately distributing the other is of no use to the public. If someone does not want to publicly distributing the translation files, then they should not publicly distribute the software by submitting it to the public repo.

@scsides Will the translation files for O-Rex and EIS be added to the repo?

KrisBecker commented 4 years ago

It is a service to mission teams where, in the past, the USGS provided software support. The team may prefer to hold back translation files and kernels so that they have appropriate time to publish their results before these files are released to the scientific community.

scsides commented 4 years ago

@jessemapel, Tightly coupled is putting it mildly. Many programs/classes will not work with even minor changes to the translations used in the ingest programs. They are code, just not C++. To answer your question, yes they should be moved to the code repo. EDIT: We should ask.

scsides commented 4 years ago

Just to be clear: The kernels are fully controlled by the mission/instrument team. They usually have a release schedule dictated by something like a PDS archive schedule. We are not proposing to move SPICE kernels from the data area, only files directly associated with the code that needs revision control.

jlaura commented 4 years ago

More opinion here concerning this thread.

First, I want to note that this issue has been wonderful in highlighting process improvements, particularly in the update cadence for RFCs. This issue has also highlighted just how important engagement is when we have formally adopted a lazy consensus model.

Second, this is not an API breaking change based on how the project is defining the API. It may be that the project should increase the scope of what is considered API breaking. If that is the case, then some contributor needs to open an RFC that articulates what the scope of the API is and what improvements will be effected by changing the current scope.

Based on the above, the downstream changes caused by altering the environment variable are out of scope for concerns we would explicitly address. That seems pretty draconian to simply say 'too bad'. The alternative presented:

It should also be emphasized that you can work around this by setting the ISIS3DATA environment variable to the value of your ISISDATA environment variable before running the applications

solves this issue.

Finally, I am concerned that new issues are being added into this discussion that are out of scope. For example, any discussion about the changes to application filenames (that are not part of the API) should be discussed either on the associated PR (lazy consensus approach; best for non-sweeping changes) or in an issue/question.

I am seeing neither concerns being raised which are outweighed by the benefits of the proposed move nor concerns being raised with associated solutions that effect a similar level of benefit. Therefore, I am in full support of merging these changes prior to the 4.1.0_RC assuming that all working being done is completed and meets all our our current process standards.

KrisBecker commented 4 years ago

@jlaura...

Based on the above, the downstream changes caused by altering the environment variable are out of scope for concerns we would explicitly address. That seems pretty draconian to simply say 'too bad'. The alternative presented:

It should also be emphasized that you can work around this by setting the ISIS3DATA environment variable to the value of your ISISDATA environment variable before running the applications

solves this issue.

I have quickly reread RFC2, particularly the Versioning section which states:

While ISIS3 remains a tightly coupled collection of applications and an API this means that breaking changes to either API signatures (methods, functions) or apps (spiceinit, pds2isis, jigsaw, etc.) would constitute a breakage.

Can you explain how this change does not meet the API breaking definition in RFC2 when application defaults will no longer find default parameter files that were moved from ISIS3DATA/ISISDATA to ISISROOT? I don't see how ISIS3DATA=ISISDATA solves this issue.

As an example, see the MAP parameter in cam2map in the appdata branch. Its default for this parameter is <default><item>$base/templates/maps/sinusoidal.map</item></default>. This overloaded reference selects this file out of the the ISIS3DATA/ISISDATA environment, but it will no longer be there - it is now moved to the source tree requiring this reference to be changed to $ISISROOT/appdata/maps/sinusoidal.map. Doesn't this demonstrate that any application that has these types of references, such as the MAP parameter in map2map, or DEFFILE in coreg, are also broken? Does this meet the definition of API breaking?

Is it also true that pipeline processing scripts that reference map or registration templates from the long established $ISIS3DATA/base/templates (or $base/templates) path are broken as well? Does this meet the definition of API breaking?

The USGS has received complaints from mission teams in the past precisely because changes such as this result in broken pipelines. What concern would be appropriate to reconsider these changes?

Now you could define a new $appdata specification in IsisPreferences to help make the change less vulnerable to environment changes in the future, but this would not fix broken scripts.

Thanks...

jlaura commented 4 years ago

@KrisBecker It does not seem like the arguments that you are making right now are in good faith with the goal of moving the project forward. I understand the desire to keep things the way they are and for a mission team that freezes their version at the start of a mission and then, rightfully so chooses not to update because they need to ensure continuity. I am failing to empathize with calls for deprecation warnings and to not make this change.

I am not aware of any active missions that are making updates to their systems on the same cadence that we are releasing. In fact, I do not know of many that have made the transition to the conda delivery system yet (yourself included) because of strict version freezing requirements in pipelines.

Therefore, I still have not seen a compelling reason to keep these changes from occurring. In fact, these changes will provide a benefit to users as we are getting files that are source (or proxies such as translation tables, icons, etc.) under version control. This further fulfills a primary project goal to reduce the burden for external developers to make changes in the code base with as little friction as possible.

jessemapel commented 4 years ago

Doesn't this demonstrate that any application that has these types of references, such as the MAP parameter in map2map, or DEFFILE in coreg, are also broken? Does this meet the definition of API breaking?

I think this is too strict a reading of the RFC. The values of the default parameters have changed but they still point to the same file. The intent behind defining the application defaults as part of the API was to ensure that commands lines will work the same within a major version. Even though the default value changed, it has no impact on how the application runs for users.

Is it also true that pipeline processing scripts that reference map or registration templates from the long established $ISIS3DATA/base/templates (or $base/templates) path are broken as well? Does this meet the definition of API breaking?

I do not see that as API breaking because the environment variables are not a part of the API. I also think that this type of disruption is unavoidable if we are ever going to move off of ISIS3 either with the major version number or with the name of the package (see #3683 and the SAT review). We have already proposed a way that disruption from this change can be mitigated by setting ISIS3DATA=ISISDATA. If that is insufficient then I'm open to other options, but I don't think using ISIS3DATA, ISIS3TESTDATA, and ISISROOT forever is an option.

Something that is not being addressed here is how these changes benefit the community. These files are tightly coupled to the software logic, and as @scsides they are really a part of the code, and often times bug fixes require updating these files. We have had several issues where people wanted to contribute fixes like this, but couldn't because these files are not available for public changes (#3698 and #2669 come to mind).

This also helps dealing with using old data with new software or visa versa. We get issues and questions related to these problems quite often.

jlaura commented 4 years ago

Just a note that RFC 4 has been updated. We are on track to have this merged prior to the 4.1.0_RC (April 1).

Since version >= 4.1 will use the new data area we are planning to host both the old and new data areas for 6+ months (very likely longer than 6 months, I am just being conservative in my initial estimate). The 6 month window is 3 months greater than the 3.11.0 support window.

AndrewAnnex commented 4 years ago

just to continue some of the conversation about gitlfs in the RFC, has "git-annex" https://git-annex.branchable.com/ been discussed?

here is a good blog post about it vs git lfs: https://lwn.net/Articles/774125/

I have no connection to the project...

Another important consideration before hosting the data in S3 are the bandwidth costs, users should not have to pay for transfer... Also check out the tool rclone https://rclone.org/ for replacing the rsync workflow to whatever is decided upon.

jessemapel commented 4 years ago

Another important consideration before hosting the data in S3 are the bandwidth costs, users should not have to pay for transfer... Also check out the tool rclone https://rclone.org/ for replacing the rsync workflow to whatever is decided upon.

I definitely agree users shouldn't be paying for the transfer costs. I also believe that rclone is being looked into for working with S3.

AndrewAnnex commented 4 years ago

regarding the performance issues with git lfs in the RFC, was that from a test of a prospective user attempting to download the test data using git lfs or just local performance? It could make sense for the usgs folks to user git lfs behind the scenes to manage the data area still and to cut releases of the data that are then synced to buckets or whatever, so even if its slow it could still be useful

jlaura commented 4 years ago

rclone is being looked at.

While I agree that users should not have to pay for data transfer costs, this is not something that we could guarantee one way or another. See the recent NASA OIG report on earth data cloud storage. That said, the current solution is not something that we can continue to pay.

The LFS tests were internal and external. Since we are working to significantly reduce test data volumes and because the data area changes infrequently, we should be good to go using S3 without the need for another layer in between.

jessemapel commented 4 years ago

the data area changes infrequently

I'm not sure what you mean by this. We pull kernel updates from active missions all the time

jlaura commented 4 years ago

I am meaning backwards compatibility related changes, e.g., the bsp change that happened in the last 6 months.

AndrewAnnex commented 4 years ago

just how of curiosity, how much more work will it be to expand the web based spice data and is there a issue tracking those issues? And do the new tech policies have an impact on that service?

jlaura commented 4 years ago

We have a potential project in 2021 to work on migrating additional portions of the code to make use of spice data over the web. This is proposed work. PRs are always solicited!

jessemapel commented 4 years ago

how much more work will it be to expand the web based spice data

We're working towards this goal. Currently the spice web service is just a script on a server that runs spiceinit for you. Next year we are looking to cut down on the number of applications that load kernels and have them read the spice data attached to cubes when available. Then, in the future we can explore changing the spice web service to do more than just run spiceinit.

Improving the spice web service doesn't solve all of these problems, though. There are many non-spice kernel files that are needed such as translation files, calibration files, GUI icons, templates, test files, etc. that we wouldn't want to serve over the web.