YeoLab / flotilla

Reproducible machine learning analysis of gene expression and alternative splicing data
http://yeolab.github.io/flotilla/docs
BSD 3-Clause "New" or "Revised" License

flotilla.embark is broken, possible version incompatibility? #111

Closed boyko- closed 10 years ago

boyko- commented 10 years ago

All of a sudden, flotilla.Study and flotilla.embark stopped working. I fixed flotilla.Study by passing '0.1.0' as its version parameter and moving that argument from the second position to the last position in the call, per the current constructor definition. This leads me to think that the API has been updated (by Olga?). That would explain the following error with flotilla.embark, which I don't know how to fix since embark doesn't take colors as parameters:

TypeError                                 Traceback (most recent call last)
<ipython-input-27-baddcaeec2a2> in <module>()
----> 1 study = flotilla.embark('immune_single_cell_sailfish')

/projects/ps-yeolab/software/anaconda-2.0.1_2014-08-12/lib/python2.7/site-packages/flotilla/__init__.pyc in embark(study_name, load_species_data)
     38             '~/flotilla_projects/{}/datapackage.json'.format(study_name)))
     39         return Study.from_datapackage_file(filename,
---> 40                                            load_species_data=load_species_data)
     41     except IOError:
     42         return Study.from_datapackage_url(study_name,

/projects/ps-yeolab/software/anaconda-2.0.1_2014-08-12/lib/python2.7/site-packages/flotilla/data_model/study.pyc in from_datapackage_file(cls, datapackage_filename, load_species_data, species_datapackage_base_url)
    433             datapackage, datapackage_dir=datapackage_dir,
    434             load_species_data=load_species_data,
--> 435             species_datapackage_base_url=species_datapackage_base_url)
    436 
    437     @classmethod

/projects/ps-yeolab/software/anaconda-2.0.1_2014-08-12/lib/python2.7/site-packages/flotilla/data_model/study.pyc in from_datapackage(cls, datapackage, datapackage_dir, load_species_data, species_datapackage_base_url)
    594             sources=sources,
    595             version=version,
--> 596             **kwargs)
    597         return study
    598 

/projects/ps-yeolab/software/anaconda-2.0.1_2014-08-12/lib/python2.7/site-packages/flotilla/data_model/study.pyc in __init__(self, sample_metadata, expression_data, splicing_data, expression_feature_data, expression_feature_rename_col, splicing_feature_data, splicing_feature_rename_col, mapping_stats_data, mapping_stats_number_mapped_col, mapping_stats_min_reads, spikein_data, spikein_feature_data, drop_outliers, species, gene_ontology_data, expression_log_base, predictor_config_manager, metadata_pooled_col, metadata_phenotype_col, phenotype_order, phenotype_to_color, phenotype_to_marker, license, title, sources, version)
    293             phenotype_to_marker, pooled_col=metadata_pooled_col,
    294             phenotype_col=metadata_phenotype_col,
--> 295             predictor_config_manager=self.predictor_config_manager)
    296         self.phenotype_col = self.metadata.phenotype_col
    297         self.phenotype_order = self.metadata.phenotype_order

/projects/ps-yeolab/software/anaconda-2.0.1_2014-08-12/lib/python2.7/site-packages/flotilla/data_model/metadata.pyc in __init__(self, data, phenotype_order, phenotype_to_color, phenotype_to_marker, phenotype_col, pooled_col, predictor_config_manager)
     56                     color = mpl.colors.rgb2hex(colors.next())
     57                 try:
---> 58                     color = str_to_color[color]
     59                 except KeyError:
     60                     pass

TypeError: unhashable type: 'list'

> /projects/ps-yeolab/software/anaconda-2.0.1_2014-08-12/lib/python2.7/site-packages/flotilla/data_model/metadata.py(60)__init__()
     59                 except KeyError:
---> 60                     pass
     61                 self.phenotype_to_color[phenotype] = color

ipdb> color
[0.30196078431372547, 0.6862745098039216, 0.2901960784313726]

olgabot commented 10 years ago

I know what's wrong and I'll fix it by the end of the day


Olga Botvinnik PhD Program in Bioinformatics and Systems Biology Gene Yeo Laboratory http://yeolab.ucsd.edu/yeolab/Home.html | Sanford Consortium for Regenerative Medicine University of California, San Diego www http://olgabotvinnik.com | blog http://blog.olgabotvinnik.com/ | github http://github.com/olgabot | twitter http://twitter.com/olgabot | linkedin http://www.linkedin.com/in/olgabotvinnik

2014-10-02 11:05 GMT-07:00 Boyko Kakaradov notifications@github.com:

Assigned #111 https://github.com/YeoLab/flotilla/issues/111 to @olgabot https://github.com/olgabot.


olgabot commented 10 years ago

You'll need to re-create your datapackage. It's an issue with the way matplotlib sets colors: "color" is stored as an [r, g, b] list, and it needs to be converted to a hex color string.
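A minimal sketch of why the lookup in the traceback fails and what the conversion looks like. The helper `rgb_to_hex` and the sample `str_to_color` contents are hypothetical and written in plain Python for illustration; flotilla's own code calls `mpl.colors.rgb2hex`, as the traceback shows.

```python
def rgb_to_hex(rgb):
    """Convert an (r, g, b) sequence of floats in [0, 1] to a '#rrggbb' string.

    Hypothetical helper for illustration; not part of flotilla.
    """
    return '#' + ''.join('{:02x}'.format(int(round(c * 255))) for c in rgb)


# Stand-in for the name-to-hex lookup table used in metadata.py (contents assumed).
str_to_color = {'red': '#ff0000'}

# The value ipdb printed for `color` in the traceback above.
color = [0.30196078431372547, 0.6862745098039216, 0.2901960784313726]

lookup_error = None
try:
    # A list cannot be used as a dict key, so this raises TypeError,
    # which the original `except KeyError` clause does not catch.
    str_to_color[color]
except TypeError as err:
    lookup_error = str(err)

# Once converted to a string, a failed lookup is an ordinary missing key.
hex_color = rgb_to_hex(color)
resolved = str_to_color.get(hex_color, hex_color)

print(lookup_error)
print(resolved)
```

A string key that is missing from the table raises `KeyError`, which the existing `try`/`except` already handles, so only the list-valued colors trigger the crash.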



boyko- commented 10 years ago

OK, let me know when this is fixed! Also, how do I re-create the data package when I'm using the data frame version of flotilla.Study(metadata_df, data_df, ...) ?

olgabot commented 10 years ago

Can you try it now, and paste the contents of "datapackage.json" into a comment? Please format it as code by putting 3 backticks "```" on the lines before and after the code block.



olgabot commented 10 years ago

get the dev version with:

git checkout -t origin/dev

and reinstall



boyko- commented 10 years ago

sure, the current "broken" contents of datapackage.json:

{
  "name": "immune_single_cell_sailfish", 
  "title": null, 
  "datapackage_version": "0.1.1", 
  "sources": null, 
  "licenses": null, 
  "species": "mm10", 
  "resources": [
    {
      "name": "expression", 
      "log_base": null, 
      "format": "csv", 
      "path": "/home/bkakarad/flotilla_projects/immune_single_cell_sailfish/expression.csv.gz", 
      "feature_rename_col": "gene_name", 
      "compression": "gzip"
    }, 
    {
      "pooled_col": null, 
      "name": "metadata", 
      "phenotype_to_marker": {
        "Tem": "o", 
        "day7": "o", 
        "Naive": "o", 
        "Tcm": "o", 
        "1div": "o", 
        "day4": "o"
      }, 
      "format": "csv", 
      "phenotype_to_color": {
        "Tem": [
          0.596078431372549, 
          0.3058823529411765, 
          0.6392156862745098
        ], 
        "day7": [
          1.0, 
          1.0, 
          0.2
        ], 
        "Naive": [
          0.21568627450980393,         
          0.49411764705882355,         
          0.7215686274509804
        ], 
        "Tcm": [
          0.30196078431372547, 
          0.6862745098039216, 
          0.2901960784313726
        ], 
        "1div": [
          0.8941176470588236, 
          0.10196078431372549, 
          0.10980392156862745
        ], 
        "day4": [
          1.0, 
          0.4980392156862745, 
          0.0
        ]
      }, 
      "path": "/home/bkakarad/flotilla_projects/immune_single_cell_sailfish/metadata.csv.gz", 
      "phenotype_col": "celltype", 
      "phenotype_order": [
        "1div", 
        "Naive", 
        "Tcm", 
        "Tem", 
        "day4", 
        "day7"
      ], 
      "compression": "gzip"
    }
  ]
}
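The fix suggested in this thread is to re-create the datapackage. Purely as an illustrative alternative, here is a sketch that rewrites list-valued entries of `phenotype_to_color` as hex strings in an already-parsed datapackage. The function names are hypothetical, the sample dict is a trimmed copy of the JSON above, and whether flotilla accepts a hand-edited file is an assumption that the thread does not confirm.

```python
import copy


def rgb_to_hex(rgb):
    """Convert an (r, g, b) sequence of floats in [0, 1] to '#rrggbb' (hypothetical helper)."""
    return '#' + ''.join('{:02x}'.format(int(round(c * 255))) for c in rgb)


def hexify_phenotype_colors(datapackage):
    """Return a copy of the datapackage dict with list colors replaced by hex strings."""
    fixed = copy.deepcopy(datapackage)
    for resource in fixed.get('resources', []):
        colors = resource.get('phenotype_to_color')
        if not colors:
            continue  # e.g. the "expression" resource has no color mapping
        for phenotype, color in colors.items():
            if isinstance(color, (list, tuple)):
                colors[phenotype] = rgb_to_hex(color)
    return fixed


# Trimmed-down version of the datapackage.json shown above.
broken = {
    'name': 'immune_single_cell_sailfish',
    'resources': [
        {'name': 'expression', 'format': 'csv'},
        {
            'name': 'metadata',
            'phenotype_to_color': {
                'day7': [1.0, 1.0, 0.2],
                'day4': [1.0, 0.4980392156862745, 0.0],
            },
        },
    ],
}

repaired = hexify_phenotype_colors(broken)
print(repaired['resources'][1]['phenotype_to_color'])
```

The input dict is left unmodified, so the original file contents can be kept as a backup before anything is written back with `json.dump`.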
boyko- commented 10 years ago

switched to dev version successfully, but it didn't reinstall correctly:

pip install -e .
Obtaining file:///home/bkakarad/git/flotilla
  Running setup.py (path:/home/bkakarad/git/flotilla/setup.py) egg_info for package from file:///home/bkakarad/git/flotilla

Requirement already satisfied (use --upgrade to upgrade): setuptools in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages/setuptools-5.8-py2.7.egg (from flotilla==0.1.0)
Requirement already satisfied (use --upgrade to upgrade): numpy>=1.8.0 in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages (from flotilla==0.1.0)
Requirement already satisfied (use --upgrade to upgrade): scipy>=0.14 in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages (from flotilla==0.1.0)
Requirement already satisfied (use --upgrade to upgrade): matplotlib>=1.3.1 in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages (from flotilla==0.1.0)
Requirement already satisfied (use --upgrade to upgrade): scikit-learn>=0.13.0 in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages (from flotilla==0.1.0)
Requirement already satisfied (use --upgrade to upgrade): gspread in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages (from flotilla==0.1.0)
Requirement already satisfied (use --upgrade to upgrade): brewer2mpl in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages (from flotilla==0.1.0)
Requirement already satisfied (use --upgrade to upgrade): pymongo>=2.7 in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages (from flotilla==0.1.0)
Requirement already satisfied (use --upgrade to upgrade): ipython>=2.0.0 in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages (from flotilla==0.1.0)
Requirement already satisfied (use --upgrade to upgrade): husl in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages (from flotilla==0.1.0)
Requirement already satisfied (use --upgrade to upgrade): patsy>=0.2.1 in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages (from flotilla==0.1.0)
Requirement already satisfied (use --upgrade to upgrade): pandas>=0.13.1 in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages (from flotilla==0.1.0)
Requirement already satisfied (use --upgrade to upgrade): statsmodels>=0.5.0 in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages (from flotilla==0.1.0)
Requirement already satisfied (use --upgrade to upgrade): seaborn>=0.3 in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages (from flotilla==0.1.0)
Requirement already satisfied (use --upgrade to upgrade): networkx in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages (from flotilla==0.1.0)
Requirement already satisfied (use --upgrade to upgrade): tornado>=3.2.1 in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages (from flotilla==0.1.0)
Requirement already satisfied (use --upgrade to upgrade): pyzmq in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages (from flotilla==0.1.0)
Requirement already satisfied (use --upgrade to upgrade): six in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages (from flotilla==0.1.0)
Requirement already satisfied (use --upgrade to upgrade): pytest-cov in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages (from flotilla==0.1.0)
Requirement already satisfied (use --upgrade to upgrade): python-coveralls in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages (from flotilla==0.1.0)
Requirement already satisfied (use --upgrade to upgrade): jinja2 in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages (from flotilla==0.1.0)
Requirement already satisfied (use --upgrade to upgrade): semantic-version in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages (from flotilla==0.1.0)
Requirement already satisfied (use --upgrade to upgrade): py>=1.4.22 in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages (from pytest-cov->flotilla==0.1.0)
Requirement already satisfied (use --upgrade to upgrade): pytest>=2.6.0 in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages (from pytest-cov->flotilla==0.1.0)
Requirement already satisfied (use --upgrade to upgrade): cov-core>=1.14.0 in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages (from pytest-cov->flotilla==0.1.0)
Requirement already satisfied (use --upgrade to upgrade): coverage>=3.6 in /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages (from cov-core>=1.14.0->pytest-cov->flotilla==0.1.0)
Installing collected packages: flotilla
  Running setup.py develop for flotilla

    Creating /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages/flotilla.egg-link (link to .)
    flotilla 0.1.0 is already the active version in easy-install.pth

    Installed /home/bkakarad/git/flotilla
Successfully installed flotilla
Cleaning up...

olgabot commented 10 years ago

That's very strange, the dev version is 0.1.1... I'll try updating it now

olgabot commented 10 years ago

okay I updated it to 0.1.2.

Since you're on the dev branch now, run git pull origin dev. Before you upgrade, remove the flotilla files from your site-packages:

rm -rf /projects/ps-yeolab/software/anaconda2/envs/boyko/lib/python2.7/site-packages/flotilla*

then try to install again
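After reinstalling, it can help to confirm which version the environment actually sees. The sketch below uses the standard-library `importlib.metadata`, which assumes Python 3.8 or newer; the environment in this thread is Python 2.7, where this module is not available. The function name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string for a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


found = installed_version('flotilla')
if found is None:
    print('flotilla is not installed in this environment')
else:
    print('active flotilla version: {}'.format(found))
```

If the reported version is still the old one after reinstalling, a stale copy of the package is probably still on the path.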

olgabot commented 10 years ago

To re-create the dataframe, do:

study = flotilla.Study(metadata=metadata, version='0.1.0', expression=expression, ....)

the docs should be helpful too :)

boyko- commented 10 years ago

Successfully pulled and installed flotilla 0.1.2, and successfully re-created the dataframe. Here is the new datapackage.json, which now has phenotype_to_color as hex strings, but the following screenshot confirms that the colors in the PCA and violin plots are still out of sync:

{
  "name": "immune_single_cell_sailfish", 
  "title": null, 
  "datapackage_version": "0.1.2", 
  "sources": null, 
  "licenses": null, 
  "species": "mm10", 
  "resources": [
    {
      "name": "expression", 
      "log_base": null, 
      "format": "csv", 
      "path": "/home/bkakarad/flotilla_projects/immune_single_cell_sailfish/expression.csv.gz", 
      "feature_rename_col": "gene_name", 
      "compression": "gzip"
    }, 
    {
      "pooled_col": null, 
      "name": "metadata", 
      "phenotype_to_marker": {
        "Tem": "o", 
        "day7": "o", 
        "Naive": "o", 
        "Tcm": "o", 
        "1div": "o", 
        "day4": "o"
      }, 
      "format": "csv", 
      "phenotype_to_color": {
        "Tem": "#377eb8", 
        "day7": "#fdfc33", 
        "Naive": "#4eae4b", 
        "Tcm": "#e41a1c", 
        "1div": "#994fa1", 
        "day4": "#ff8101"
      }, 
      "path": "/home/bkakarad/flotilla_projects/immune_single_cell_sailfish/metadata.csv.gz", 
      "phenotype_col": "celltype", 
      "phenotype_order": [
        "1div", 
        "Naive", 
        "Tcm", 
        "Tem", 
        "day4", 
        "day7"
      ], 
      "compression": "gzip"
    }
  ]
}

[Two screenshots: PCA plot and violin plots with mismatched phenotype colors]

olgabot commented 10 years ago

Great! The colors/phenotype order will be done another day.
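The thread does not say what causes the mismatch between the two plots. As a generic sketch only, one way to keep colors consistent is to derive every plot's palette from the same `phenotype_order` and `phenotype_to_color` pair, so that position `i` always means the same phenotype. The values below are copied from the new datapackage.json; the function `ordered_palette` is hypothetical and not part of flotilla.

```python
phenotype_order = ['1div', 'Naive', 'Tcm', 'Tem', 'day4', 'day7']

phenotype_to_color = {
    'Tem': '#377eb8',
    'day7': '#fdfc33',
    'Naive': '#4eae4b',
    'Tcm': '#e41a1c',
    '1div': '#994fa1',
    'day4': '#ff8101',
}


def ordered_palette(order, color_map):
    """Return the colors listed in the given phenotype order.

    Raises KeyError if any phenotype in `order` has no assigned color.
    """
    missing = [p for p in order if p not in color_map]
    if missing:
        raise KeyError('no color for: {}'.format(', '.join(missing)))
    return [color_map[p] for p in order]


palette = ordered_palette(phenotype_order, phenotype_to_color)
for phenotype, color in zip(phenotype_order, palette):
    print('{:>6}  {}'.format(phenotype, color))
```

Passing the same ordered list to each plotting call avoids relying on dictionary iteration order, which differs from `phenotype_order` in the JSON above.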