Closed by tstoeger 6 years ago
Overlooked the need for a very specific Python 2.7 environment (outlined in https://clue.io/cmapPy/build.html#install). That page goes beyond the information provided in the README and is inconsistent with the tutorial: following it sets up a cmapPy version that requires parse.parse() instead of parse().
To add to the confusion, the file names had changed between the tutorial and the public version of GSE70138 (which could have opened the possibility of a change in the file format).
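For anyone hitting the parse.parse() versus parse() discrepancy mentioned above, here is a minimal sketch of a wrapper that accepts either form. The helper name resolve_parse and the stand-in objects are hypothetical and not part of cmapPy; in a real session you would pass whatever your own import of the parse module or function produced.

```python
import types


def resolve_parse(obj):
    """Return a callable parser from either a function or a module-like object.

    Hypothetical helper: it only papers over the two calling styles,
    parse(...) and parse.parse(...); it does not change what is installed.
    """
    if callable(obj):
        return obj
    candidate = getattr(obj, "parse", None)
    if callable(candidate):
        return candidate
    raise TypeError("expected a parse function or a module exposing parse()")


# Stand-ins for the two import styles, so the sketch runs without cmapPy.
def fake_parse(path):
    return "parsed " + path


fake_module = types.SimpleNamespace(parse=fake_parse)

print(resolve_parse(fake_parse)("example.gctx"))   # function style
print(resolve_parse(fake_module)("example.gctx"))  # module style
```

This only hides the naming difference; if the installed version and the tutorial differ in other ways, those differences remain.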
Hi @tstoeger, sorry you had difficulties in using the tutorial. If you have suggestions as to how to make installation instructions more clear, feel free to let us know; the README currently links out to ReadTheDocs in order to help us keep documentation in a centralized place and (hopefully) up to date.
Regarding the tutorial, I'll update the inconsistencies regarding use of parse methods. With regard to scope, we definitely hope to add more tutorials in the future, but for the time being only have one with GEO data because we guessed that would be the most common use case for the package. Just for the record: should you want to investigate error messages/bugs without dealing with external datasets in the future, we already have a variety of test files that help disambiguate code vs. file issues; these are located in cmapPy/cmapPy/pandasGEXpress/tests/functional_tests.
Hi @oena, let me first thank you, both for your inquiry and for the existing documentation of cmapPy, which has already been very useful. Indeed, the tutorial is a very nice extra.
My troubles arose from running into slightly different problems and noticing that at least three distinct aspects seemed to have changed (the version of the dataset used, something related to external Python code, something related to cmapPy). As I'd take the tutorial as the reference, this would hint at me overlooking something, but I also did not know for sure which aspect I should trust or follow.
Possibly, the tutorial could:
Those points all seem very reasonable to me, thanks! I'll see what we can do to address them better than we do currently.
Hi, although I'm using Python 2.7, I get the same error that you received above: "Exception: parse_gctx check_id_validity ... the metadata for the file being parsed - mismatch_ids: ...". The file I'm trying to parse is GSE92742. I would appreciate it if you could tell me how you solved the above problem.
I made a Python 3 compatible version of cmapPy; credits for identifying the critical section go to @heltena.
In my usage scenario, a single-line addition was sufficient:
curr_dset.read_direct(temp_array)
temp_array = np.core.defchararray.decode(temp_array, 'utf8') # <- introduced for Python3 compatibility
header_values[str(k)] = temp_array
My usage scenario was restricted to gctx files, which simplifies the problem of Python 3 compatibility. I didn't check the definition of gctx regarding future compatibility of encoding. I have only constructed tests with GSE92742 level 5, and I additionally bypassed GCToo instances, as I have only ever used the data frame contained within them as output (hence, I did not check their creation for compatibility with Python 3). The above covers my usage of cmapPy.
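To illustrate why a decode step like the one above matters under Python 3, here is a small pure-Python sketch. It assumes (not verified against cmapPy's source) that header values read from the file arrive as bytes; decode_ids is a hypothetical helper, and the sample IDs are borrowed from the tutorial output quoted in this thread.

```python
def decode_ids(values, encoding="utf8"):
    """Decode bytes entries to str and leave str entries untouched."""
    return [v.decode(encoding) if isinstance(v, bytes) else v for v in values]


# Assumed shape of IDs as read under Python 3 without the decode step.
raw_ids = [b"LJP007_A375_24H:A03", b"LJP007_A549_24H:A03"]
wanted = "LJP007_A375_24H:A03"

# In Python 3 a str never compares equal to bytes, so the first lookup misses.
print(wanted in raw_ids)
print(wanted in decode_ids(raw_ids))
```

The one-line patch applies the same idea to a whole NumPy array at once rather than element by element.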
Hi @benanbardak, it would be helpful if you could mention which file from GEO you are using to read in the metadata. There are five files given here
Firstly, thank you for your response. I am using "GSE92742_Broad_LINCS_Level3_INF_mlr12k_n1319138x12328.gctx.gz", but I get the error "Exception: parse_gctx check_id_validity some of the ids being used to subset the data are not present in the metadata for the file being parsed - mismatch_ids:.."
That is a 48 GB file, so it will take me some time to download it. I tried another file from the same series, "GSE92742_Broad_LINCS_Level2_GEX_delta_n49216x978.gctx.gz", and metadata parsing works in Python 2. If you try this file and it fails, the issue might be with your version of cmapPy. If it does not fail with this smaller file, it might be that the 48 GB file has something different going on that the package is not able to handle.
And please check your email, @tstoeger.
@saksham219 I tried to run the tutorial with this data, "GSE92742_Broad_LINCS_Level2_GEX_delta_n49216x978.gctx.gz", but I get the same error again. How can I solve this problem? What do you mean by "the issue might be with your version of cmapPy"? How can I fix my version of cmapPy? Thank you so much.
@benanbardak What I mean is that you might not be using the latest version on the master branch of this repo. You can try running this from the terminal
$ git clone https://github.com/cmap/cmapPy
$ pip install cmapPy/
and then trying to read the file again in a new python environment.
If the problem still persists, it would be helpful if you could list the versions of the packages in your Python environment by running
$ pip freeze
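If you would rather report only the relevant versions than the full pip freeze output, a minimal sketch along these lines may help. It relies on importlib.metadata, which needs Python 3.8 or newer (so it will not run in the Python 2.7 setup discussed above), and the package list is an assumption about what is worth reporting here.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed list of packages of interest; adjust to your environment.
for name, version in installed_versions(["cmapPy", "numpy", "pandas", "h5py"]).items():
    print(name, version or "not installed")
```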
Following the tutorial cmapPy_pandasGEXpress_tutorial.ipynb currently (2018-March-03) yields an error.
Since it uses an external data set, GEO GSE70138 (rather than a test contained within cmapPy), it isn't clear whether this error reflects an update or problem within cmapPy, the tutorial, or GSE70138. (Besides making it impossible to follow the tutorial, this error makes it difficult for new users to become familiar with gctx files / cmapPy.)
works: upper part of tutorial
number of samples treated with vorinostat: 210
---- show first ones for debugging ---- LJP007_A375_24H:A03 LJP007_A549_24H:A03 LJP007_ASC.C_24H:A03 LJP007_ASC_24H:A03 LJP007_CD34_24H:A03
creates error: loading of records