aalto-ics-kepaco / retention_order_prediction

Code, Data and Results of the publication: "Liquid-Chromatography Retention Order Prediction for Metabolite Identification" by Bach et al. 2018

Docker image #2

Closed Danissss closed 4 years ago

Danissss commented 4 years ago

Hi,

Do you think you have time to create a Docker image with everything installed? It's very difficult to recreate the original working environment with conda, as most of the packages have since been updated.

For example, I get AttributeError: module 'scipy' has no attribute 'sparse', which I believe is caused by scikit-learn:

scikit-learn              0.19.0          py36_nomklh752ef42_2  [nomkl]
scipy                     0.19.1          py36_nomklh4b14279_3  [nomkl]
bachi55 commented 4 years ago

Hei Danissss,

I cannot promise a Docker container soon. However, I am very sure the code works with the latest scikit-learn version as well. The same probably holds for the other packages.

SciPy is currently at version 1.4, so there might have been some restructuring relative to 0.19. But I think the required changes to the code, e.g. for accessing the 'sparse' module, should be minor.

I suggest you try with the latest package versions and then post exactly where you get errors when running the code. I have a RankSVM implementation that is compatible with the latest (relevant) package versions, so I guess we can get things running.
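When posting errors as suggested above, it can help to attach the installed package versions. The following is only a sketch: the helper name report_versions is hypothetical, the package list is taken from the conda command mentioned later in this thread, and importlib.metadata is assumed to be available (Python 3.8 or newer, so not the Python 3.6 environment of the original setup).

```python
from importlib import metadata


def report_versions(packages):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed map to None instead of raising.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names as they appear in the conda command from this thread.
    wanted = ["scipy", "numpy", "pandas", "scikit-learn", "networkx", "joblib"]
    for name, version in report_versions(wanted).items():
        print(f"{name:15s} {version or 'not installed'}")
```

The printed table can be pasted next to the traceback so that version-related breakage is easier to spot.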

Sorry, this answer is probably not what you expected, but let's first see where problems with the latest package / Python versions appear and then solve them.

Best regards,

Eric

Danissss commented 4 years ago

I think the issue is about submodule imports in Python; see https://stackoverflow.com/questions/8696322/why-does-this-attributeerror-in-python-occur. I added import scipy.sparse to rank_svm_cls.py. Then I created the conda env with: conda create --name retentionOrderPred python=3.6 scipy=0.19.1 numpy=1.13.1 pandas=0.20.3 scikit-learn=0.19.0 networkx=2.0 joblib

Now it runs successfully with the sample input. OS: macOS Catalina. Conda: 4.8.1.
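The submodule-import behaviour described above can be reproduced without SciPy. The sketch below builds a throwaway package (the name demo_pkg_sparse_issue and both helper functions are hypothetical, made up for illustration) whose __init__.py does not import its sparse submodule, and checks whether the attribute exists before and after the explicit import. For the code in this repository the corresponding fix is the import scipy.sparse line mentioned above.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_demo_package(root, name="demo_pkg_sparse_issue"):
    """Create a tiny package whose __init__.py does NOT import its submodule."""
    pkg_dir = Path(root) / name
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("")
    (pkg_dir / "sparse.py").write_text("VALUE = 42\n")
    return name


def submodule_visible_before_and_after_import():
    """Return (visible_before, visible_after) for the 'sparse' attribute."""
    with tempfile.TemporaryDirectory() as tmp:
        name = make_demo_package(tmp)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            pkg = importlib.import_module(name)      # like "import scipy"
            before = hasattr(pkg, "sparse")          # submodule not loaded yet
            importlib.import_module(name + ".sparse")  # like "import scipy.sparse"
            after = hasattr(pkg, "sparse")           # now bound on the parent
        finally:
            # Clean up so the demo leaves no trace in the interpreter state.
            sys.path.remove(tmp)
            for key in [k for k in sys.modules
                        if k == name or k.startswith(name + ".")]:
                del sys.modules[key]
    return before, after


if __name__ == "__main__":
    print(submodule_visible_before_and_after_import())
```

Whether a plain import of the parent package happens to expose a submodule depends on what the package's __init__.py (or any other already imported module) has loaded, which is why the error can appear or disappear between package versions.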

Danissss commented 4 years ago

How long does python src/evaluation_scenarios_main.py ranksvm baseline_single 10 -1 results/raw/PredRet/v2/config.json 2 False take to finish? It seems to keep repeating the same operation over and over:

Target systems (# = 1): Eawag_XBridgeC18
Training systems (# = 1): Eawag_XBridgeC18
Target set 'Eawag_XBridgeC18' contains 317 examples.
Process target system: Eawag_XBridgeC18 (1/1).
Repetition: 1/10
Training set contains 317 (100.000000%) examples.
    System Eawag_XBridgeC18 contributes 317 (100.000000%) examples.
Outer validation split strategy: KFold
Outer fold: 1/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 30.054sec
   C     target_system  training_systems
0  1  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.348
Outer fold: 2/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 27.626sec
   C     target_system  training_systems
0  1  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.342
Outer fold: 3/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 27.555sec
   C     target_system  training_systems
0  1  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.346
Outer fold: 4/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 26.827sec
    C     target_system  training_systems
0  10  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.338
Outer fold: 5/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 26.826sec
    C     target_system  training_systems
0  10  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.341
Outer fold: 6/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 27.039sec
    C     target_system  training_systems
0  10  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.344
Outer fold: 7/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 26.915sec
    C     target_system  training_systems
0  10  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.349
Outer fold: 8/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 27.140sec
   C     target_system  training_systems
0  1  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.343
Outer fold: 9/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 27.404sec
    C     target_system  training_systems
0  10  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.348
Outer fold: 10/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 26.816sec
    C     target_system  training_systems
0  10  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.327
Repetition: 2/10
Training set contains 317 (100.000000%) examples.
    System Eawag_XBridgeC18 contributes 317 (100.000000%) examples.
Outer validation split strategy: KFold
Outer fold: 1/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 26.669sec
    C     target_system  training_systems
0  10  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.334
Outer fold: 2/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 27.353sec
    C     target_system  training_systems
0  10  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.359
Outer fold: 3/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 26.761sec
    C     target_system  training_systems
0  10  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.347
Outer fold: 4/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 27.234sec
   C     target_system  training_systems
0  1  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.357
Outer fold: 5/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 27.160sec
    C     target_system  training_systems
0  10  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.362
Outer fold: 6/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 27.218sec
    C     target_system  training_systems
0  10  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.361
Outer fold: 7/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 27.713sec
   C     target_system  training_systems
0  1  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.370
Outer fold: 8/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 28.112sec
    C     target_system  training_systems
0  10  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.379
Outer fold: 9/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 27.712sec
   C     target_system  training_systems
0  1  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.364
Outer fold: 10/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 28.527sec
   C     target_system  training_systems
0  1  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.360
Repetition: 3/10
Training set contains 317 (100.000000%) examples.
    System Eawag_XBridgeC18 contributes 317 (100.000000%) examples.
Outer validation split strategy: KFold
Outer fold: 1/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 28.047sec
   C     target_system  training_systems
0  1  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.368
Outer fold: 2/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 26.957sec
   C     target_system  training_systems
0  1  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.353
Outer fold: 3/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 27.171sec
    C     target_system  training_systems
0  10  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.357
Outer fold: 4/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 27.802sec
    C     target_system  training_systems
0  10  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.343
Outer fold: 5/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 26.980sec
   C     target_system  training_systems
0  1  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.336
Outer fold: 6/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 27.140sec
   C     target_system  training_systems
0  1  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.372
Outer fold: 7/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 27.267sec
   C     target_system  training_systems
0  1  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.356
Outer fold: 8/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 27.094sec
    C     target_system  training_systems
0  10  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.365
Outer fold: 9/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 26.773sec
    C     target_system  training_systems
0  10  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.327
Outer fold: 10/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 26.893sec
   C     target_system  training_systems
0  1  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.350
Repetition: 4/10
Training set contains 317 (100.000000%) examples.
    System Eawag_XBridgeC18 contributes 317 (100.000000%) examples.
Outer validation split strategy: KFold
Outer fold: 1/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 27.103sec
   C     target_system  training_systems
0  1  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.339
Outer fold: 2/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 27.219sec
   C     target_system  training_systems
0  1  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.343
Outer fold: 3/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 27.879sec
   C     target_system  training_systems
0  1  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.348
Outer fold: 4/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 27.322sec
    C     target_system  training_systems
0  10  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.358
Outer fold: 5/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 27.038sec
    C     target_system  training_systems
0  10  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.351
Outer fold: 6/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 26.661sec
   C     target_system  training_systems
0  1  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.355
Outer fold: 7/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 26.651sec
    C     target_system  training_systems
0  10  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.337
Outer fold: 8/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9 
[find_hparam_*] 27.160sec
   C     target_system  training_systems
0  1  Eawag_XBridgeC18  Eawag_XBridgeC18
[Get prediction score] Elapsed: 0.348
Outer fold: 9/10
Inner (h-param) validation split strategy: GroupKFold
Get pairs for hparam estimation: 0 1 2 3 4 5 6 7 8 9
bachi55 commented 4 years ago

The evaluation runs a 10-times repeated 10-fold cross-validation. You can reduce the number of repetitions in the main-file. -Eric
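As a rough back-of-the-envelope check of the runtime for the single target system shown in the log, the sketch below multiplies repetitions by outer folds by the per-fold time. The function name is hypothetical, and the per-fold timings (about 27 s for the hyper-parameter search, about 0.35 s for scoring) are approximate readings from the log above, not measured constants; other machines, datasets, or additional target systems will differ.

```python
def estimate_runtime_seconds(n_repetitions, n_outer_folds,
                             hparam_search_sec, scoring_sec):
    """Estimate the total time of a repeated k-fold cross-validation.

    Each outer fold runs one hyper-parameter search plus one scoring step.
    """
    per_fold = hparam_search_sec + scoring_sec
    return n_repetitions * n_outer_folds * per_fold


if __name__ == "__main__":
    # Assumed values, read approximately off the posted log.
    total = estimate_runtime_seconds(
        n_repetitions=10, n_outer_folds=10,
        hparam_search_sec=27.0, scoring_sec=0.35,
    )
    print(f"Outer folds in total: {10 * 10}")
    print(f"Estimated runtime: {total:.0f} s (about {total / 60:.0f} min)")
```

Under these assumptions the full run takes roughly 45 minutes for one target system, and the runtime scales linearly with the number of repetitions.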