cmap / cmapPy

Assorted tools for interacting with .gct, .gctx files and other Connectivity Map (Broad Institute) data/tools
https://clue.io/cmapPy/index.html
BSD 3-Clause "New" or "Revised" License

cmapPy_pandasGEXpress_tutorial.ipynb fails to parse ids #50

Closed maxdrohde closed 5 years ago

maxdrohde commented 5 years ago

When running the tutorial notebook ('cmapPy_pandasGEXpress_tutorial.ipynb'), the following commands raise an error:

from cmapPy.pandasGEXpress.parse import parse
vorinostat_only_gctoo = parse("GSE70138_Broad_LINCS_Level5_COMPZ_n118050x12328_2017-03-06.gctx", cid=vorinostat_ids)

some of the ids being used to subset the data are not present in the metadata for the file being parsed - mismatch_ids:  {'LPROT001_PC3_6H:P12', 'LPROT001_NPC.TAK_6H:O08', 'LJP008_A375_24H:G07', 'LJP008_SKL_24H:G12', 'LJP007_HCC515_24H:A03', 'LJP008_NPC.TAK_24H:G08', 'LJP008_HEPG2_24H:G08', 'LJP008_A375_24H:A03', 'LJP008_NEU_24H:G09', 'LJP008_PC3_24H:G09', 'LJP009_CD34_24H:A03', 'LJP008_ASC.C_24H:G07', 'LJP008_NEU_24H:G12', 'LJP008_ASC.C_24H:G10', 'LJP008_NPC.TAK_24H:G11', 'LPROT002_NPC.TAK_6H:O12', 'LPROT002_NPC.TAK_6H:O08', 'LJP008_HT29_24H:G08', 'LJP008_ASC_24H:G12', 'LJP008_HA1E_24H:G10', 'LJP007_HA1E_24H:A03', 'LJP008_HEPG2_24H:A03', 'LPROT001_A375_6H:P11', 'LPROT001_NPC.TAK_6H:O10', 'LJP009_NEU_24H:A03', 'LJP007_MNEU.E_24H:A03', 'LJP007_SKL.C_24H:A03', 'LJP008_SKL_24H:G10', 'LJP008_NPC.CAS9_24H:G12', 'LJP008_HEPG2_24H:G11', 'LJP008_CD34_24H:G09', 'LJP008_HCC515_24H:A03', 'LJP008_PC3_24H:G11', 'LPROT001_MCF7_6H:O11', 'LJP008_ASC_24H:G09', 'LJP008_NPC.CAS9_24H:G10', 'LJP008_MCF7_24H:G10', 'LJP008_CD34_24H:A03', 'LJP008_HA1E_24H:A03', 'LJP008_SKL_24H:G08', 'LJP008_HME1_24H:G10', 'LJP007_A375_24H:A03', 'LJP008_A375_24H:G08', 'LJP008_ASC.C_24H:A03', 'LJP008_NEU_24H:G11', 'LJP008_HME1_24H:G07', 'LJP008_HEPG2_24H:G10', 'LJP008_HT29_24H:G10', 'LJP008_ASC.C_24H:G11', 'LJP009_HME1_24H:A03', 'LJP008_HA1E_24H:G09', 'LPROT002_NPC.TAK_6H:O10', 'LPROT003_A549_6H:O10', 'LJP007_NPC_24H:A03', 'LPROT003_PC3_6H:O07', 'LJP008_SKL.C_24H:G10', 'LPROT002_MCF7_6H:P08', 'LJP007_CD34_24H:A03', 'LPROT003_NPC_6H:P11', 'LPROT002_MCF7_6H:P10', 'LJP008_PC3_24H:A03', 'LJP008_HUVEC_24H:G07', 'LPROT003_NPC_6H:P09', 'LJP008_HME1_24H:A03', 'LJP008_NPC.TAK_24H:G10', 'LJP008_MCF7_24H:G07', 'LJP008_HME1_24H:G09', 'LJP009_ASC_24H:A03', 'LJP008_ASC_24H:G08', 'LJP008_HT29_24H:G09', 'LJP007_HT29_24H:A03', 'LJP009_HUVEC_24H:A03', 'LPROT002_A549_6H:O09', 'LJP008_SKL_24H:G07', 'LJP008_NPC_24H:G09', 'LJP008_HCC515_24H:G12', 'LJP008_A549_24H:G09', 'LJP008_A549_24H:G08', 'LJP008_HEPG2_24H:G12', 
'LPROT002_MCF7_6H:P12', 'LJP008_HA1E_24H:G07', 'LJP008_HUVEC_24H:G12', 'LJP008_NPC.CAS9_24H:G11', 'LPROT003_A375_6H:P12', 'LJP008_NPC.TAK_24H:A03', 'LPROT003_A549_6H:O12', 'LJP007_HUES3_24H:A03', 'LPROT003_NPC_6H:P07', 'LJP008_ASC.C_24H:G08', 'LPROT001_A375_6H:P07', 'LJP008_HCC515_24H:G09', 'LJP009_HT29_24H:A03', 'LJP008_HT29_24H:G11', 'LJP009_HEPG2_24H:A03', 'LJP008_SKL.C_24H:G11', 'LJP008_A549_24H:G10', 'LJP008_ASC_24H:A03', 'LJP008_A549_24H:A03', 'LJP008_A375_24H:G10', 'LPROT001_NPC.TAK_6H:O12', 'LJP008_MCF7_24H:G08', 'LPROT002_A549_6H:O07', 'LPROT003_A549_6H:O08', 'LJP008_CD34_24H:G07', 'LPROT003_PC3_6H:O09', 'LJP007_SKL_24H:A03', 'LPROT001_PC3_6H:P08', 'LJP008_A375_24H:G09', 'LJP008_HT29_24H:A03', 'LJP008_ASC_24H:G07', 'LJP007_HUVEC_24H:A03', 'LJP008_HUVEC_24H:G10', 'LJP008_HCC515_24H:G10', 'LJP008_ASC_24H:G10', 'LPROT003_PC3_6H:O11', 'LJP008_HT29_24H:G07', 'LJP008_SKL.C_24H:G12', 'LJP008_NPC.CAS9_24H:G09', 'LJP008_MCF7_24H:A03', 'LJP007_HME1_24H:A03', 'LJP007_NPC.CAS9_24H:A03', 'LJP008_HA1E_24H:G12', 'LPROT002_A549_6H:O11', 'LJP007_ASC_24H:A03', 'LJP008_NPC.TAK_24H:G12', 'LJP009_ASC.C_24H:A03', 'LJP008_HEPG2_24H:G09', 'LJP008_NEU_24H:G07', 'LJP008_NPC_24H:G08', 'LPROT001_MCF7_6H:O09', 'LPROT003_A375_6H:P10', 'LPROT003_A375_6H:P08', 'LJP008_CD34_24H:G11', 'LJP009_PC3_24H:A03', 'LJP008_CD34_24H:G12', 'LJP008_A375_24H:G12', 'LJP009_HA1E_24H:A03', 'LJP007_A549_24H:A03', 'LPROT002_A375_6H:P11', 'LJP008_A375_24H:G11', 'LJP007_NPC.TAK_24H:A03', 'LJP008_HT29_24H:G12', 'LJP008_NPC_24H:A03', 'LJP009_NPC_24H:A03', 'LJP008_SKL.C_24H:G07', 'LJP008_HME1_24H:G12', 'LJP009_SKL.C_24H:A03', 'LJP008_NPC_24H:G11', 'LJP008_CD34_24H:G08', 'LJP009_NPC.CAS9_24H:A03', 'LJP008_PC3_24H:G12', 'LJP008_MCF7_24H:G11', 'LJP008_PC3_24H:G10', 'LJP008_ASC.C_24H:G12', 'LPROT001_PC3_6H:P10', 'LJP007_MCF7_24H:A03', 'LJP008_HCC515_24H:G11', 'LJP008_HUVEC_24H:A03', 'LJP009_HCC515_24H:A03', 'LJP007_HEPG2_24H:A03', 'LJP009_A549_24H:A03', 'LJP008_A549_24H:G07', 'LJP008_HA1E_24H:G11', 
'LJP008_PC3_24H:G08', 'LJP008_ASC.C_24H:G09', 'LJP008_SKL.C_24H:G08', 'LJP008_SKL_24H:A03', 'LJP009_A375_24H:A03', 'LJP008_CD34_24H:G10', 'LJP007_JURKAT_24H:A03', 'LJP008_MCF7_24H:G12', 'LJP008_HEPG2_24H:G07', 'LJP008_NPC.TAK_24H:G07', 'LJP007_ASC.C_24H:A03', 'LJP008_SKL_24H:G09', 'LPROT002_A375_6H:P09', 'LPROT001_MCF7_6H:O07', 'LJP008_A549_24H:G11', 'LJP009_SKL_24H:A03', 'LJP008_HME1_24H:G08', 'LJP008_HUVEC_24H:G09', 'LJP008_HME1_24H:G11', 'LJP008_SKL_24H:G11', 'LJP009_MCF7_24H:A03', 'LJP009_NPC.TAK_24H:A03', 'LJP008_SKL.C_24H:G09', 'LJP008_PC3_24H:G07', 'LJP008_HCC515_24H:G08', 'LJP008_NPC.CAS9_24H:G07', 'LJP008_NPC.TAK_24H:G09', 'LPROT001_A375_6H:P09', 'LJP007_NEU_24H:A03', 'LJP008_MCF7_24H:G09', 'LJP008_NPC_24H:G12', 'LJP008_NEU_24H:A03', 'LJP008_NPC.CAS9_24H:A03', 'LJP008_HUVEC_24H:G11', 'LJP008_NPC.CAS9_24H:G08', 'LJP008_HCC515_24H:G07', 'LJP008_NEU_24H:G10', 'LJP008_NEU_24H:G08', 'LJP008_A549_24H:G12', 'LJP008_NPC_24H:G10', 'LJP008_HUVEC_24H:G08', 'LJP008_NPC_24H:G07', 'LPROT002_A375_6H:P07', 'LJP007_PC3_24H:A03', 'LJP008_SKL.C_24H:A03', 'LJP008_HA1E_24H:G08', 'LJP008_ASC_24H:G11'}
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-5-38df3cb1c58c> in <module>()
      1 from cmapPy.pandasGEXpress.parse import parse
----> 2 vorinostat_only_gctoo = parse("GSE70138_Broad_LINCS_Level5_COMPZ_n118050x12328.gctx", cid=vorinostat_ids)

~/miniconda3/lib/python3.6/site-packages/cmapPy/pandasGEXpress/parse.py in parse(file_path, convert_neg_666, rid, cid, ridx, cidx, row_meta_only, col_meta_only, make_multiindex)
     66                               rid=rid, cid=cid, ridx=ridx, cidx=cidx,
     67                               row_meta_only=row_meta_only, col_meta_only=col_meta_only,
---> 68                               make_multiindex=make_multiindex)
     69 
     70     else:

~/miniconda3/lib/python3.6/site-packages/cmapPy/pandasGEXpress/parse_gctx.py in parse(gctx_file_path, convert_neg_666, rid, cid, ridx, cidx, row_meta_only, col_meta_only, make_multiindex)
    105 
    106         # validate optional input ids & get indexes to subset by
--> 107         (sorted_ridx, sorted_cidx) = check_and_order_id_inputs(rid, ridx, cid, cidx, row_meta, col_meta)
    108 
    109         data_dset = gctx_file[data_node]

~/miniconda3/lib/python3.6/site-packages/cmapPy/pandasGEXpress/parse_gctx.py in check_and_order_id_inputs(rid, ridx, cid, cidx, row_meta_df, col_meta_df)
    144     ordered_ridx = get_ordered_idx(row_type, row_ids, row_meta_df)
    145 
--> 146     col_ids = check_and_convert_ids(col_type, col_ids, col_meta_df)
    147     ordered_cidx = get_ordered_idx(col_type, col_ids, col_meta_df)
    148     return (ordered_ridx, ordered_cidx)

~/miniconda3/lib/python3.6/site-packages/cmapPy/pandasGEXpress/parse_gctx.py in check_and_convert_ids(id_type, id_list, meta_df)
    177         if id_type == "id":
    178             id_list = convert_ids_to_meta_type(id_list, meta_df)
--> 179             check_id_validity(id_list, meta_df)
    180         else:
    181             check_idx_validity(id_list, meta_df)

~/miniconda3/lib/python3.6/site-packages/cmapPy/pandasGEXpress/parse_gctx.py in check_id_validity(id_list, meta_df)
    193             mismatch_ids)
    194         logger.error(msg)
--> 195         raise Exception("parse_gctx check_id_validity " + msg)
    196 
    197 

Exception: parse_gctx check_id_validity some of the ids being used to subset the data are not present in the metadata for the file being parsed - mismatch_ids:  {'LPROT001_PC3_6H:P12', 'LPROT001_NPC.TAK_6H:O08', 'LJP008_A375_24H:G07', 'LJP008_SKL_24H:G12', 'LJP007_HCC515_24H:A03', 'LJP008_NPC.TAK_24H:G08', 'LJP008_HEPG2_24H:G08', 'LJP008_A375_24H:A03', 'LJP008_NEU_24H:G09', 'LJP008_PC3_24H:G09', 'LJP009_CD34_24H:A03', 'LJP008_ASC.C_24H:G07', 'LJP008_NEU_24H:G12', 'LJP008_ASC.C_24H:G10', 'LJP008_NPC.TAK_24H:G11', 'LPROT002_NPC.TAK_6H:O12', 'LPROT002_NPC.TAK_6H:O08', 'LJP008_HT29_24H:G08', 'LJP008_ASC_24H:G12', 'LJP008_HA1E_24H:G10', 'LJP007_HA1E_24H:A03', 'LJP008_HEPG2_24H:A03', 'LPROT001_A375_6H:P11', 'LPROT001_NPC.TAK_6H:O10', 'LJP009_NEU_24H:A03', 'LJP007_MNEU.E_24H:A03', 'LJP007_SKL.C_24H:A03', 'LJP008_SKL_24H:G10', 'LJP008_NPC.CAS9_24H:G12', 'LJP008_HEPG2_24H:G11', 'LJP008_CD34_24H:G09', 'LJP008_HCC515_24H:A03', 'LJP008_PC3_24H:G11', 'LPROT001_MCF7_6H:O11', 'LJP008_ASC_24H:G09', 'LJP008_NPC.CAS9_24H:G10', 'LJP008_MCF7_24H:G10', 'LJP008_CD34_24H:A03', 'LJP008_HA1E_24H:A03', 'LJP008_SKL_24H:G08', 'LJP008_HME1_24H:G10', 'LJP007_A375_24H:A03', 'LJP008_A375_24H:G08', 'LJP008_ASC.C_24H:A03', 'LJP008_NEU_24H:G11', 'LJP008_HME1_24H:G07', 'LJP008_HEPG2_24H:G10', 'LJP008_HT29_24H:G10', 'LJP008_ASC.C_24H:G11', 'LJP009_HME1_24H:A03', 'LJP008_HA1E_24H:G09', 'LPROT002_NPC.TAK_6H:O10', 'LPROT003_A549_6H:O10', 'LJP007_NPC_24H:A03', 'LPROT003_PC3_6H:O07', 'LJP008_SKL.C_24H:G10', 'LPROT002_MCF7_6H:P08', 'LJP007_CD34_24H:A03', 'LPROT003_NPC_6H:P11', 'LPROT002_MCF7_6H:P10', 'LJP008_PC3_24H:A03', 'LJP008_HUVEC_24H:G07', 'LPROT003_NPC_6H:P09', 'LJP008_HME1_24H:A03', 'LJP008_NPC.TAK_24H:G10', 'LJP008_MCF7_24H:G07', 'LJP008_HME1_24H:G09', 'LJP009_ASC_24H:A03', 'LJP008_ASC_24H:G08', 'LJP008_HT29_24H:G09', 'LJP007_HT29_24H:A03', 'LJP009_HUVEC_24H:A03', 'LPROT002_A549_6H:O09', 'LJP008_SKL_24H:G07', 'LJP008_NPC_24H:G09', 'LJP008_HCC515_24H:G12', 'LJP008_A549_24H:G09', 
'LJP008_A549_24H:G08', 'LJP008_HEPG2_24H:G12', 'LPROT002_MCF7_6H:P12', 'LJP008_HA1E_24H:G07', 'LJP008_HUVEC_24H:G12', 'LJP008_NPC.CAS9_24H:G11', 'LPROT003_A375_6H:P12', 'LJP008_NPC.TAK_24H:A03', 'LPROT003_A549_6H:O12', 'LJP007_HUES3_24H:A03', 'LPROT003_NPC_6H:P07', 'LJP008_ASC.C_24H:G08', 'LPROT001_A375_6H:P07', 'LJP008_HCC515_24H:G09', 'LJP009_HT29_24H:A03', 'LJP008_HT29_24H:G11', 'LJP009_HEPG2_24H:A03', 'LJP008_SKL.C_24H:G11', 'LJP008_A549_24H:G10', 'LJP008_ASC_24H:A03', 'LJP008_A549_24H:A03', 'LJP008_A375_24H:G10', 'LPROT001_NPC.TAK_6H:O12', 'LJP008_MCF7_24H:G08', 'LPROT002_A549_6H:O07', 'LPROT003_A549_6H:O08', 'LJP008_CD34_24H:G07', 'LPROT003_PC3_6H:O09', 'LJP007_SKL_24H:A03', 'LPROT001_PC3_6H:P08', 'LJP008_A375_24H:G09', 'LJP008_HT29_24H:A03', 'LJP008_ASC_24H:G07', 'LJP007_HUVEC_24H:A03', 'LJP008_HUVEC_24H:G10', 'LJP008_HCC515_24H:G10', 'LJP008_ASC_24H:G10', 'LPROT003_PC3_6H:O11', 'LJP008_HT29_24H:G07', 'LJP008_SKL.C_24H:G12', 'LJP008_NPC.CAS9_24H:G09', 'LJP008_MCF7_24H:A03', 'LJP007_HME1_24H:A03', 'LJP007_NPC.CAS9_24H:A03', 'LJP008_HA1E_24H:G12', 'LPROT002_A549_6H:O11', 'LJP007_ASC_24H:A03', 'LJP008_NPC.TAK_24H:G12', 'LJP009_ASC.C_24H:A03', 'LJP008_HEPG2_24H:G09', 'LJP008_NEU_24H:G07', 'LJP008_NPC_24H:G08', 'LPROT001_MCF7_6H:O09', 'LPROT003_A375_6H:P10', 'LPROT003_A375_6H:P08', 'LJP008_CD34_24H:G11', 'LJP009_PC3_24H:A03', 'LJP008_CD34_24H:G12', 'LJP008_A375_24H:G12', 'LJP009_HA1E_24H:A03', 'LJP007_A549_24H:A03', 'LPROT002_A375_6H:P11', 'LJP008_A375_24H:G11', 'LJP007_NPC.TAK_24H:A03', 'LJP008_HT29_24H:G12', 'LJP008_NPC_24H:A03', 'LJP009_NPC_24H:A03', 'LJP008_SKL.C_24H:G07', 'LJP008_HME1_24H:G12', 'LJP009_SKL.C_24H:A03', 'LJP008_NPC_24H:G11', 'LJP008_CD34_24H:G08', 'LJP009_NPC.CAS9_24H:A03', 'LJP008_PC3_24H:G12', 'LJP008_MCF7_24H:G11', 'LJP008_PC3_24H:G10', 'LJP008_ASC.C_24H:G12', 'LPROT001_PC3_6H:P10', 'LJP007_MCF7_24H:A03', 'LJP008_HCC515_24H:G11', 'LJP008_HUVEC_24H:A03', 'LJP009_HCC515_24H:A03', 'LJP007_HEPG2_24H:A03', 'LJP009_A549_24H:A03', 
'LJP008_A549_24H:G07', 'LJP008_HA1E_24H:G11', 'LJP008_PC3_24H:G08', 'LJP008_ASC.C_24H:G09', 'LJP008_SKL.C_24H:G08', 'LJP008_SKL_24H:A03', 'LJP009_A375_24H:A03', 'LJP008_CD34_24H:G10', 'LJP007_JURKAT_24H:A03', 'LJP008_MCF7_24H:G12', 'LJP008_HEPG2_24H:G07', 'LJP008_NPC.TAK_24H:G07', 'LJP007_ASC.C_24H:A03', 'LJP008_SKL_24H:G09', 'LPROT002_A375_6H:P09', 'LPROT001_MCF7_6H:O07', 'LJP008_A549_24H:G11', 'LJP009_SKL_24H:A03', 'LJP008_HME1_24H:G08', 'LJP008_HUVEC_24H:G09', 'LJP008_HME1_24H:G11', 'LJP008_SKL_24H:G11', 'LJP009_MCF7_24H:A03', 'LJP009_NPC.TAK_24H:A03', 'LJP008_SKL.C_24H:G09', 'LJP008_PC3_24H:G07', 'LJP008_HCC515_24H:G08', 'LJP008_NPC.CAS9_24H:G07', 'LJP008_NPC.TAK_24H:G09', 'LPROT001_A375_6H:P09', 'LJP007_NEU_24H:A03', 'LJP008_MCF7_24H:G09', 'LJP008_NPC_24H:G12', 'LJP008_NEU_24H:A03', 'LJP008_NPC.CAS9_24H:A03', 'LJP008_HUVEC_24H:G11', 'LJP008_NPC.CAS9_24H:G08', 'LJP008_HCC515_24H:G07', 'LJP008_NEU_24H:G10', 'LJP008_NEU_24H:G08', 'LJP008_A549_24H:G12', 'LJP008_NPC_24H:G10', 'LJP008_HUVEC_24H:G08', 'LJP008_NPC_24H:G07', 'LPROT002_A375_6H:P07', 'LJP007_PC3_24H:A03', 'LJP008_SKL.C_24H:A03', 'LJP008_HA1E_24H:G08', 'LJP008_ASC_24H:G11'}
oena commented 5 years ago

Hi @maxdrohde, it looks like you're running this under Python 3. Please note that the package doesn't currently support Python 3; we recommend setting up a conda environment with Python 2.7 instead.
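
For anyone who wants to check why ids are reported as mismatched before switching environments, here is a minimal diagnostic sketch. It rests on an assumption this thread does not confirm: that under Python 3 the ids read from the file's metadata come back as bytes while the requested ids are str, so no comparison ever matches. The helpers `normalize_id` and `split_ids` are hypothetical names invented for this example and are not part of cmapPy, and the sample ids are stand-ins.

```python
def normalize_id(value):
    """Return an id as str, decoding bytes (assumed UTF-8) if necessary."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return str(value)


def split_ids(requested_ids, metadata_ids):
    """Split requested ids into (present, missing) after normalizing both sides.

    Hypothetical helper, not a cmapPy function.
    """
    available = {normalize_id(i) for i in metadata_ids}
    present, missing = [], []
    for raw in requested_ids:
        rid = normalize_id(raw)
        # Keep the caller's ordering within each group.
        (present if rid in available else missing).append(rid)
    return present, missing


# Stand-in data: metadata ids stored as bytes, requested ids as str.
metadata_ids = [b"LJP008_A375_24H:G07", b"LJP008_SKL_24H:G12"]
requested_ids = ["LJP008_A375_24H:G07", "LJP008_SKL_24H:G12", "NOT_A_REAL_ID"]

present, missing = split_ids(requested_ids, metadata_ids)
print("present:", present)
print("missing:", missing)
```

In a real session, `metadata_ids` would be the column ids of the .gctx file. They can presumably be obtained by calling `parse` with `col_meta_only=True` (a parameter visible in the traceback above) and taking the index of the result. If every requested id lands in `present` after normalization, the mismatch is a type issue and the ids are not actually absent from the file.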