FertigLab / pycogaps

python API to the CoGAPS NMF package
https://fertiglab.github.io/CoGAPSGuide/procedureone/
GNU General Public License v2.0

Distributed inoperable in Docker Hub version #65

Open dimalvovs opened 10 months ago

dimalvovs commented 10 months ago

The Docker Hub version of pycogaps fails when run in distributed mode (steps to reproduce below). A newer Docker image is available on GHCR, so it would make sense to keep maintaining only one of the two.

Steps to reproduce:

  1. Pull the image: `docker pull fertiglab/pycogaps`
  2. Run the image: `docker run -it --entrypoint /bin/bash fertiglab/pycogaps`
  3. Validate that the standard version works:

    
    echo "if __name__ == '__main__':
        from PyCoGAPS.parameters import *
        from PyCoGAPS.pycogaps_main import CoGAPS
        import scanpy as sc

        modsimpath = 'data/ModSimData.txt'
        modsim = sc.read_text(modsimpath)

        params = CoParams(path=modsimpath)
        params.printParams()

        setParams(params, {
            'nIterations': 10000,
            'seed': 42,
            'nPatterns': 3
        })

        params.printParams()
        start = time.time()
        result = CoGAPS(modsimpath, params)
        end = time.time()
        print('TIME:', end - start)

        result.write('data/dist_modsim.h5ad')" > test2.py

Then run it: `python3 test2.py`


    [ASCII-art startup banner]

    -- Standard Parameters --
    nPatterns: 3
    nIterations: 1000
    seed: 0
    sparseOptimization: False

    -- Sparsity Parameters --
    alpha: 0.01
    maxGibbsMass: 100.0

    setting distributed parameters - call this again if you change nPatterns
    if you wish to perform genome-wide distributed cogaps, please run setParams(params, "distributed", "genome-wide")

    -- Standard Parameters --
    nPatterns: 3
    nIterations: 10000
    seed: 42
    sparseOptimization: False

    -- Sparsity Parameters --
    alpha: 0.01
    maxGibbsMass: 100.0

    This is pycogaps version 0.0.1
    Running Standard CoGAPS on ModSimData.txt ( 25 genes and 20 samples) with parameters:

    -- Standard Parameters --
    nPatterns: 3
    nIterations: 10000
    seed: 42
    sparseOptimization: False

    -- Sparsity Parameters --
    alpha: 0.01
    maxGibbsMass: 100.0

    Data Model: Dense, Normal
    Sampler Type: Sequential
    Loading Data...Done! (00:00:00)
    -- Equilibration Phase --
    1000 of 10000, Atoms: 64(A), 45(P), ChiSq: 1830, Time: 00:00:00 / 00:00:00
    2000 of 10000, Atoms: 69(A), 42(P), ChiSq: 1466, Time: 00:00:00 / 00:00:00
    3000 of 10000, Atoms: 80(A), 50(P), ChiSq: 1229, Time: 00:00:00 / 00:00:00
    4000 of 10000, Atoms: 73(A), 54(P), ChiSq: 1212, Time: 00:00:00 / 00:00:00
    5000 of 10000, Atoms: 86(A), 52(P), ChiSq: 1151, Time: 00:00:00 / 00:00:00
    6000 of 10000, Atoms: 81(A), 52(P), ChiSq: 1151, Time: 00:00:00 / 00:00:00
    7000 of 10000, Atoms: 75(A), 48(P), ChiSq: 1178, Time: 00:00:00 / 00:00:00
    8000 of 10000, Atoms: 70(A), 57(P), ChiSq: 1155, Time: 00:00:00 / 00:00:00
    9000 of 10000, Atoms: 73(A), 54(P), ChiSq: 1173, Time: 00:00:00 / 00:00:00
    10000 of 10000, Atoms: 79(A), 58(P), ChiSq: 1159, Time: 00:00:00 / 00:00:00
    -- Sampling Phase --
    1000 of 10000, Atoms: 74(A), 51(P), ChiSq: 1125, Time: 00:00:00 / 00:00:00
    2000 of 10000, Atoms: 78(A), 56(P), ChiSq: 1161, Time: 00:00:00 / 00:00:00
    3000 of 10000, Atoms: 79(A), 57(P), ChiSq: 1166, Time: 00:00:00 / 00:00:00
    4000 of 10000, Atoms: 69(A), 55(P), ChiSq: 1176, Time: 00:00:00 / 00:00:00
    5000 of 10000, Atoms: 80(A), 55(P), ChiSq: 1175, Time: 00:00:00 / 00:00:00
    6000 of 10000, Atoms: 81(A), 48(P), ChiSq: 1168, Time: 00:00:00 / 00:00:00
    7000 of 10000, Atoms: 73(A), 56(P), ChiSq: 1151, Time: 00:00:00 / 00:00:00
    8000 of 10000, Atoms: 72(A), 51(P), ChiSq: 1156, Time: 00:00:00 / 00:00:00
    9000 of 10000, Atoms: 75(A), 60(P), ChiSq: 1155, Time: 00:00:00 / 00:00:00
    10000 of 10000, Atoms: 80(A), 50(P), ChiSq: 1179, Time: 00:00:00 / 00:00:00

    GapsResult result object with 25 features and 20 samples
    3 patterns were learned

    TIME: 0.9086663722991943

  4. Provide the distributed config file:

    echo "if __name__ == '__main__':
        from PyCoGAPS.parameters import *
        from PyCoGAPS.pycogaps_main import CoGAPS
        import scanpy as sc

        modsimpath = 'data/ModSimData.txt'
        modsim = sc.read_text(modsimpath)

        params = CoParams(path=modsimpath)
        params.printParams()

        setParams(params, {
            'nIterations': 10000,
            'seed': 42,
            'nPatterns': 3,
            'useSparseOptimization': True,
            'distributed': 'genome-wide'
        })

        params.setDistributedParams(nSets=2)
        params.printParams()
        start = time.time()
        result = CoGAPS(modsimpath, params)
        end = time.time()
        print('TIME:', end - start)

        result.write('data/dist_modsim.h5ad')" > test.py

  5. Run the program: `python3 test.py`
  6. The output contains an error:

    multiprocessing.pool.RemoteTraceback:
    """
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 125, in worker
        result = (True, func(*args, **kwds))
      File "/home/user/pycogaps-docker/PyCoGAPS/pycogaps_main.py", line 313, in callInternalCoGAPS
        gapsresult = standardCoGAPS(adata, params, uncertainty, transposeData=params.coparams["transposeData"])
      File "/home/user/pycogaps-docker/PyCoGAPS/pycogaps_main.py", line 166, in standardCoGAPS
        result = GapsResultToAnnData(gapsresultobj, adata, prm)
      File "/home/user/pycogaps-docker/PyCoGAPS/helper_functions.py", line 434, in GapsResultToAnnData
        Pmean = toNumpy(gapsresult.Pmean)[prm.coparams["subsetIndices"], :]
    IndexError: index 22 is out of bounds for axis 0 with size 20
    """

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "test.py", line 23, in <module>
        result = CoGAPS(modsimpath, params)
      File "/home/user/pycogaps-docker/PyCoGAPS/pycogaps_main.py", line 44, in CoGAPS
        result = distributedCoGAPS(path, params, uncertainty=None)
      File "/home/user/pycogaps-docker/PyCoGAPS/pycogaps_main.py", line 197, in distributedCoGAPS
        result = list(result)
      File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 868, in next
        raise value
    IndexError: index 22 is out of bounds for axis 0 with size 20
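
The failing line indexes `Pmean` along axis 0 with `subsetIndices`. Below is a minimal NumPy sketch of that failure mode. It assumes, without confirmation from the PyCoGAPS source, that the subset indices refer to genes (25 in ModSimData) while the indexed axis has one row per sample (20). The array names, shapes, index values, and the `try_subset` helper are illustrative stand-ins, not PyCoGAPS code.

```python
import numpy as np

# Dimensions reported in the log above: 25 genes, 20 samples, 3 patterns.
n_genes, n_samples, n_patterns = 25, 20, 3

# Hypothetical subset of gene indices, as a genome-wide split might produce.
subset_indices = np.array([1, 4, 7, 22])

# Assumed shapes: one matrix with a row per sample, one with a row per gene.
sample_axis_matrix = np.zeros((n_samples, n_patterns))
gene_axis_matrix = np.zeros((n_genes, n_patterns))


def try_subset(matrix, indices):
    """Select rows by index; return (rows, None) or (None, error message)."""
    try:
        return matrix[indices, :], None
    except IndexError as exc:
        return None, str(exc)


# Gene indices applied to the sample axis: index 22 exceeds the 20 rows.
rows, error = try_subset(sample_axis_matrix, subset_indices)
print("sample axis:", error)

# The same indices applied to the gene axis select four rows without error.
gene_rows, gene_error = try_subset(gene_axis_matrix, subset_indices)
print("gene axis:", gene_rows.shape, gene_error)
```

If this reading is right, any subset containing an index of 20 or higher would fail in the same way on this dataset.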

tomsing1 commented 5 months ago

@dimalvovs Thank you for flagging this bug, and for pointing out the GHCR image. It seems that image doesn't have a functional `vignette_from_args.py` script, though:

Traceback (most recent call last):
  File "/pycogaps/vignette_from_args.py", line 45, in <module>
    setParams(params, prm['run_params'])
  File "/pycogaps/PyCoGAPS/parameters.py", line 168, in setParams
    setParam(paramobj, k, v)
  File "/pycogaps/PyCoGAPS/parameters.py", line 247, in setParam
    setattr(paramobj.gaps, whichParam, value)
AttributeError: 'pycogaps.GapsParameters' object has no attribute 'uncertainty'

Do you have advice on how to build or obtain a Docker image that can process jobs based on a custom YAML file with parameters? Thanks for any pointers!
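
The `AttributeError` above is raised when `setattr` is called with a key (`uncertainty`) that the parameter object does not expose. A minimal sketch of that failure, and of one possible workaround that filters the run parameters before applying them, is shown here. `FakeGapsParameters` and `set_known_params` are hypothetical stand-ins written for illustration; they are not part of PyCoGAPS, and the real `GapsParameters` object may behave differently.

```python
class FakeGapsParameters:
    """Hypothetical stand-in for a parameter object with a fixed attribute set."""

    __slots__ = ("nPatterns", "nIterations", "seed")

    def __init__(self):
        self.nPatterns = 3
        self.nIterations = 1000
        self.seed = 0


def set_known_params(target, run_params):
    """Apply only the keys that exist on target; return the skipped keys."""
    skipped = []
    for key, value in run_params.items():
        if hasattr(target, key):
            setattr(target, key, value)
        else:
            skipped.append(key)
    return skipped


gaps = FakeGapsParameters()

# Setting an unknown attribute directly fails, as in the traceback above.
try:
    setattr(gaps, "uncertainty", None)
    direct_error = None
except AttributeError as exc:
    direct_error = str(exc)
print("direct setattr:", direct_error)

# Illustrative run parameters, e.g. as loaded from a YAML file.
run_params = {"nPatterns": 3, "nIterations": 10000, "seed": 42, "uncertainty": None}
skipped = set_known_params(gaps, run_params)
print("skipped keys:", skipped)
```

Whether silently skipping a key such as `uncertainty` is acceptable depends on the workflow; the sketch only isolates where the error comes from.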