CBICA / CaPTk

Cancer Imaging Phenomics Toolkit (CaPTk) is a software platform to perform image analysis and predictive modeling tasks. Documentation: https://cbica.github.io/CaPTk
https://www.cbica.upenn.edu/captk
Other
180 stars 64 forks source link

Decouple the pre-compiled binaries from CaPTk installation and packaging #883

Closed PhucNgo1711 closed 4 years ago

PhucNgo1711 commented 4 years ago

Is your feature request related to a problem? Please describe. Currently, the complexity of the installer mechanisms is too much, which makes things far likely to break and makes maintenance difficult. Having precompiled apps packaged all together at one place also creates space issues for the Azure builds.

Describe the solution you'd like Decoupling all the precompiled binaries (i.e., DeepMedic, ITK-SNAP, LIBRA, etc.) so that the size of the installer reduces and to give the user the option to only get the components they require.

sarthakpati commented 4 years ago

White paper (read only): https://1drv.ms/w/s!AgvRZZtXCbbOnKBuKuZGamU4Kv-Ikw?e=wpEQq0

PhucNgo1711 commented 4 years ago

Not sure if I should take the discussion here in the comment section but here it is:

So far I have added this new tab here (for testing), and whatever you click on should trigger a basic download. Download links are added to the links.yaml file, just like what we have for sample data download. Capture

Now here's where we need to make a decision. Assuming an application has not been downloaded yet. When the user click on that application, a pop up will say something like "This application needs to be downloaded", then the user proceed with the download.

Option 1

A pop up appears letting the user specify the download location. Once the user has confirmed (by clicking Ok), the download starts and CaPTk saves that directory. Will need something to store these paths efficiently.

Option 2

Similar to how IDEs and Text Editors manage packages, when the user selects an app to download, they all go to one pre-defined location. The user does not necessary know about this location. CaPTk will always point there.

sarthakpati commented 4 years ago

Personally, I am fine with either. @MarkBergman-cbica and @dboun wanted something better defined. Perhaps the updated location given in Point 4 of "desired functionality" in the white paper could be possible?

MarkBergman-cbica commented 4 years ago

I'll take what's behind door #2, please.

AlexanderGetka-cbica commented 4 years ago

The way we store the applications will also inform how we check if an application is installed and how we could check its version (if we decide that we should do that, which I personally think is a good idea).

If we choose option 1, then users gain a (small) amount of flexibility with their directory structure (and could possibly reuse their install location when doing work outside of CaPTk... which seems to me not very useful and runs counter to CaPTk's role as a many-in-one solution).

If we choose option 2, things become a little easier to keep track of on our end and things won't break inadvertently due to changes on the external file system. Contrary to the notion that users won't know where their applications are, having them all in one central CaPTk-related location will mean that we will always know where CaPTk is pulling from (helping to avoid confusing issues with redundant or incompatible installations).

Option 2 is probably the way to go for the sake of consistent user experience.

In either case, we should store the data (both applications and preferences/persistent directory associations) in user-space so that it doesn't require admin rights on multi-user systems/clusters.

dboun commented 4 years ago

Hi all,

I have added "## Configuration file(s)" and "## Extension manager directories", please review and let me know if you have any questions

PhucNgo1711 commented 4 years ago

Please let me know that it would be easier to create like a Google Docs for this, but since this topic was discussed earlier, I'd figure I'll continue the discussion here.

This is the foundation that I have implemented

Click on Libra from dropdown menu

if $USER_HOME/.CaPTk/$version/apps/libra not exist
   download libra.zip to $USER_HOME/.CaPTk/$version/apps
   extract libra.zip
   now $USER_HOME/.CaPTk/$version/apps/libra exists
   done
else
   run $USER_HOME/.CaPTk/$version/apps/libra/bin/libra (on Linux for example)

From today's discussion, I've added a few things

Click on Libra from dropdown menu

if $USER_HOME/.CaPTk/$version/apps/libra not exist

   if $USER_HOME/.CaPTk/$version/apps/libra.zip not exist  // no download happened before   
   OR "download done" is not marked  // download started but failed
      download libra.zip to $USER_HOME/.CaPTk/$version/apps
      (1) mark download done - write to log

   (2) mark extraction starts - write to log
   (3) lock bin/libra
    extract libra.zip
   (3) mark extraction done - write to file
   (4) unlock bin/libra

   now $USER_HOME/.CaPTk/$version/apps/libra exists
   done
else
   if "extraction done" is not marked  // no need to worry about download here anymore
      print "Installation in progress"
   else 
      run $USER_HOME/.CaPTk/$version/apps/libra/bin/libra (on Linux for example)
sarthakpati commented 4 years ago

Good idea to continue the discussion here.

My idea was something along the following direction:

if not exist( ${USER_HOME}/.CaPTk/${version}/apps/libra/downloading.txt )

    if exist( ${USER_HOME}/.CaPTk/${version}/apps/libra/ready.txt )
        # everything is as expected and the user can run LIBRA
        system_call( "${USER_HOME}/.CaPTk/${version}/apps/libra/bin/libra" )

    else
        if not exist( ${USER_HOME}/.CaPTk/${version}/apps/libra/downloadFinished.txt )    
            # re-download; write ${USER_HOME}/.CaPTk/${version}/apps/libra/downloading.txt

        else

            if( checksum == reference_checksum )
                delete ${USER_HOME}/.CaPTk/${version}/apps/libra/downloading.txt
                write ${USER_HOME}/.CaPTk/${version}/apps/libra/downloadFinished.txt

            else                
                delete downloaded zip and call function again, which would re-start download

            # at this point, download is done and extraction can proceed

            if( unzip downloaded_file exists with 0)
                delete ${USER_HOME}/.CaPTk/${version}/apps/libra/downloadFinished.txt
                write ${USER_HOME}/.CaPTk/${version}/apps/libra/ready.txt

                # at this point, the application is ready to be launched, 
                # so simply call this function again and it will go to the first if-loop 
                # and call the application as expected;
                # we will have a single point of entry for application
            else
                # extraction failed; re-try
    else
        # another process is downloading the same application, don't do anything

EDIT: added download-lock logic

dboun commented 4 years ago

I think that

@PhucNgo1711 's algorithm can lead to "hard locks", if the computer powers off suddenly during the locks, as it doesn't feature a way to forcefully unlock.

@sarthakpati 's algorithm can raise issues if two processes download the same thing (the first process' downloading will be overwritten and subsequently crash or worse during unzipping), as it doesn't feature locks.

I'm struggling to think of an seamless solution to both that doesn't involve the user. I think a solution could be to:

Hope this helps

MarkBergman-cbica commented 4 years ago

locking.txt

MarkBergman-cbica commented 4 years ago


                        user runs application from GUI or CLI
                                 |                               
                                 |                               
                         --[Y]--> create log entry (ie., application FooBar launched while lock file exists)
                                 |                                                                                      |
                                [N]                                                                           display lock file content 
                                 |                                                                                      |
                                 |                                                                         -[Y]-notify user-> remove lock file --+
                         --[Y]--> run application                                     |                                                                                 |
                                 |                                                                                     [N]->notify user how to remove the lock file if they                               |
                                [N]                                                                                         wish to restart the operation in progress                                     |
                                 |                                                                                                          |                                                             |
                                 |                                                                                                          |                                                             |
                                 |                                                                                                        exit CLI or return to GUI main window                           |
                                 |                                                                                                                                                                        |
                                 |                                                                                                                                                                        |
                                 +<-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
                                 |                                                                                                                                                                              
                application-specific lock file created with content consisting of status message (ie., "Application FooBar download begun at Dec 31 2020 23:59")
                                 |                               
                          log file entry created                       
                                 |                               
                          download started                       
                                 |                               
                          download completed
                                 |                               
                         --[N]--> log entry, notify user via pop-up or CLI error message, remove lock file
                                 |                               
                                [Y]                              
                                 |                               
                           lock file content updated (ie., "Download successful at ...")
                                 |                               
                           log file entry
                                 |                               
                           lock file content updated (ie., "Archive extraction begun at ...")
                                 |                               
                           begin zipfile extraction to a temp directory "adjacent" to the final
                           location (ie., ~/.CaPTk/1.8/FooBar.$RANDOM)
                                 |                               
                         --[N]--> log entry, notify user via pop-up or CLI error message, remove lock file
                                 |                               
                                [Y]                              
                                 |                               
                           lock file content updated (ie., "Zipfile extraction successful at ...")
                                 |                               
                           log file entry
                                 |                               
                           move (rename) the tempdir to the final name
                                 |                               
                                 |                               
                         --[N]--> log entry, notify user via pop-up or CLI error message, remove lock file, remove temp dir
                                 |                               
                                [Y]                              
                                 |                               
                           remove lock file
                                 |                               
                           update log
                                 |                               
                           launch application

sarthakpati commented 4 years ago

@dboun's comment

@sarthakpati 's algorithm can raise issues if two processes download the same thing

Has been addressed. I actually thought that it would part of the downloading function. Anyway, I have made it explicit.

MarkBergman-cbica commented 4 years ago

Good idea to continue the discussion here.

My idea was something along the following direction:


if exist( ${USER_HOME}/.CaPTk/${version}/apps/libra/ready.txt )
    # everything is as expected and the user can run LIBRA

This means that both LIBRA (MCR, executables, etc) and a text file are now required to execute the app. The requirement can be simplified to just a test of the existence of the LIBRA dir (ie., ${USER_HOME}/.CaPTk/${version}/apps/libra) if we can ensure that the presence/absence of that directory alone is a valid test (see my comment about renaming files -- the POSIX rename() system call is an atomic operation).

if not exist( ${USER_HOME}/.CaPTk/${version}/apps/libra/downloadFinished.txt )    
    # re-download
else
    # re-download
    # verify checksum, re-download if not same

In order to verify the checksum, we'd need a way to specify that data, in a separate communication from the download itself. (ie., step 1: download checksum, step 2: verify that the received checksum is itself valid (36 bytes, alpha-numeric, etc), step 3: download zip file...)

I'd say that's not worth the trouble, as the unzip would [likely] fail if the zip file is corrupt.

Possibly replace verify checksum with use zip utility to test zipfile before unzipping -- assuming that action is part of the unzip implementation.

Is this really worth the effort?

    write ${USER_HOME}/.CaPTk/${version}/apps/libra/downloadFinished.txt

I know this is pseudo-code, but any lock file creation should be handled carefully.

Please see 7.11.2 Locking: https://dwheeler.com/secure-programs/Secure-Programs-HOWTO/avoid-race.html It's 2020...which NFS v2 still exists, the chance of running into it with a CaPTk install, with this particular problem with file locking, is pretty slim, so something like this is probably OK (once my pseudo-code is corrected & it's upgraded to C++):

lockfile = "_$USER/.CaPTk/application_.lock"
if ( (fd = open(lockfile, O_CREAT|O_EXCL)) == 0 )
    /** We have the lock file...do something *//

    close(fd);
} else {
  fprintf(stderr, "Could not get lockfile: %s\n",lockfile);

}

    # at this point, download is done and extraction can proceed

    if( unzip downloaded_file exists with 0)
        delete ${USER_HOME}/.CaPTk/${version}/apps/libra/downloadFinished.txt
        write ${USER_HOME}/.CaPTk/${version}/apps/libra/ready.txt

Swap those two lines to reduce the window for the race condition.

MarkBergman-cbica commented 4 years ago

Should really be:


                        user runs application from GUI or CLI
                                 |                               
                         --[Y]--> run application
                                 |                               
                                [N]
                                 |                               
                         --[Y]--> create log entry (ie., application FooBar launched while lock file exists)

ashishsingh18 commented 4 years ago

@MarkBergman-cbica I haven't been able to fully understand the txt file that you shared.

I like @sarthakpati's idea. Although I don't like reading/writing multiple files. All this can be easily done with QSettings which provides a standard way of handling application settings/preferences/configuration. etc. I have already implemented a singleton and QSettings reader/writer for preferences that can be very easily extended to handle this app download use case. @PhucNgo1711 I can help you with this.

Basically we need to store settings for: a) Download Started, b) Download finished, c) Extraction Started, d) Extraction Finished. And based on these settings we handle the behavior of the application.

Let's say a user clicks on application LIBRA, this starts the download, the 'download started' setting is stored. If the application crashes in between and the user restarts the whole process, the setting is read and corresponding action is taken just like the if-else blocks mentioned in @sarthakpati's comment.

Also we handle many things on the UI side itself. For example, if the user has clicked once on LIBRA( this has started the download+extraction process which is still in progress), we don't handle subsequent clicks on the LIBRA until the extraction is done or just throw a message saying the download is already in progress or something along those lines.

@PhucNgo1711 please try all these on the standalone before you integrate any file into captk. It will be much easier to handle.

ashishsingh18 commented 4 years ago

@sarthakpati 's algorithm can raise issues if two processes download the same thing (the first process' downloading will be overwritten and subsequently crash or worse during unzipping), as it doesn't feature locks.

@dboun can you expand on this scenario? I haven't fully understood this. What is the use case?

sarthakpati commented 4 years ago

I like the idea of using QSettings as mentioned by @ashishsingh18.

All this can be easily done with QSettings which provides a standard way of handling application settings

@PhucNgo1711: Can we put this entire mechanism into a separate class so that we can use it after migrating to MITK?

dboun commented 4 years ago

@sarthakpati 's algorithm can raise issues if two processes download the same thing (the first process' downloading will be overwritten and subsequently crash or worse during unzipping), as it doesn't feature locks.

@dboun can you expand on this scenario? I haven't fully understood this. What is the use case?

My comment was about handling multiple processes trying to download the same extension, as it was mentioned in the meeting. For instance, someone trying to download libra both through the CLI and the GUI.

In practice though, I think this is pretty shooting-yourself-in-the-foot behavior and I'm not sure if we need to be concerned about it too much.

Also, locks might introduce scenarios that lead to (accidental) hard locks; I think that is a much, much bigger worry.

MarkBergman-cbica commented 4 years ago

In the message dated: Thu, 30 Apr 2020 09:00:25 -0700, The pithy ruminations from Dimitris Bounias on [[External] Re: [CBICA/CaPTk] Decouple the pre-compiled binaries from CaPTk installation and packaging (#883)] were: => > > @sarthakpati 's algorithm can raise issues if two processes download the same thing (the first process' downloading will be overwritten and subsequently crash or worse during unzipping), as it doesn't feature locks. => > => > @dboun can you expand on this scenario? I haven't fully understood this. What is the use case? => => My comment was about handling multiple processes trying to download the same extension, as it was mentioned in the meeting. For instance, someone trying to download libra both through the CLI and the GUI.

Or, even more likely just through the CLI....picture someong running the following immediately after installing CaPTk:

for image in *DCM
do
    libra $image
done

and there are "N" downloads going on, and "N" attempts to unzip & install LIBRA.

=> => In practice though, I think this is pretty shooting-yourself-in-the-foot behavior and I'm not sure if we need to be concerned about it too much. => => Also, locks might introduce scenarios that lead to (accidental) hard locks; I think that is a much, much bigger worry.

Yes.

PhucNgo1711 commented 4 years ago

@ashishsingh18 @sarthakpati sure I'll take a look at QSetting and put this stuff in a separate class.

ashishsingh18 commented 4 years ago

Done.