cmap / cmapPy

Assorted tools for interacting with .gct, .gctx files and other Connectivity Map (Broad Institute) data/tools
https://clue.io/cmapPy/index.html
BSD 3-Clause "New" or "Revised" License
124 stars 74 forks source link

Fix GCT file parsing in Python 3 #46

Closed JDRomano2 closed 6 years ago

JDRomano2 commented 6 years ago

When trying to parse a GCT file using cmapPy.pandasGEXpress.parse.parse() in Python 3, the following error raises:

In [2]: x = parse('./ps_A375_n8471x8471.gct')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-9ac756de3f82> in <module>()
----> 1 x = parse('./ps_A375_n8471x8471.gct')

~/developer/python/cmappy/cmapPy/pandasGEXpress/parse.py in parse(file_path, convert_neg_666, rid, cid, ridx, cidx, row_meta_only, col_meta_only, make_multiindex)
     60                               rid=rid, cid=cid, ridx=ridx, cidx=cidx,
     61                               row_meta_only=row_meta_only, col_meta_only=col_meta_only,
---> 62                               make_multiindex=make_multiindex)
     63
     64     elif file_path.endswith(".gctx"):

~/developer/python/cmappy/cmapPy/pandasGEXpress/parse_gct.py in parse(file_path, convert_neg_666, rid, cid, ridx, cidx, row_meta_only, col_meta_only, make_multiindex)
    134     # Read version and dimensions
    135     (version, num_data_rows, num_data_cols,
--> 136      num_row_metadata, num_col_metadata) = read_version_and_dims(file_path)
    137
    138     # Read in metadata and data

~/developer/python/cmappy/cmapPy/pandasGEXpress/parse_gct.py in read_version_and_dims(file_path)
    166
    167     # Get version from the first line
--> 168     version = f.readline().strip().lstrip("#")
    169
    170     if version not in ["1.3", "1.2"]:

TypeError: a bytes-like object is required, not 'str'

In Python 3, this error occurs when a file is read as binary (e.g., open('foo.bar', "rb") when it shouldn't be. This patch resolves the issue by testing the python version and using the correct flag within open().

oena commented 6 years ago

Hi @JDRomano2, thanks for your contribution! However, in order to accept your PR and merge to master the test suite config would actually need to be modified to also run all of our tests using Python 3 in addition to Python 2 (which--if you look at the config on Travis CI--it currently isn't); this would require some additional changes.

JDRomano2 commented 6 years ago

Unfortunately, there seem to be other Python 3 compatibility issues that prevent the test suite from passing. I'm going to close the PR and work on Python 3 compatibility on a new local branch (so I don't pollute the commit log on master), and resubmit as a new PR later once everything is running properly.

I've also noticed that #24 contains discussion about outstanding issues with Python 3, so I'll make sure that I'm not just reinventing a broken wheel 😄.

Thanks for your responsiveness!

oena commented 6 years ago

Sounds good, thanks @JDRomano2!