cmap / cmapPy

Assorted tools for interacting with .gct, .gctx files and other Connectivity Map (Broad Institute) data/tools
https://clue.io/cmapPy/index.html
BSD 3-Clause "New" or "Revised" License

reading gctx from a non-fs file object #64

Open idavydov opened 4 years ago

idavydov commented 4 years ago

Hi, in our setting we often need to read a .gctx file from a source that is not a file on the filesystem (i.e. a Python file object). This is currently not possible with cmapPy, because the parse method explicitly checks the filename:

https://github.com/cmap/cmapPy/blob/f3fdf016095bb08d9402ec9b6d3ebf6e603d20a1/cmapPy/pandasGEXpress/parse_gctx.py#L64

h5py, on the other hand, can open Python file-like objects directly.

Would it be possible for the parse function to rely on duck typing instead, so that it accepts other kinds of file objects as input?

idavydov commented 4 years ago

Apparently there are currently problems reading HDF5 files from S3; see https://github.com/h5py/h5py/issues/1530

So for now this depends on upstream functionality in h5py.