cmap / cmapPy

Assorted tools for interacting with .gct, .gctx files and other Connectivity Map (Broad Institute) data/tools
https://clue.io/cmapPy/index.html
BSD 3-Clause "New" or "Revised" License
124 stars 74 forks source link

parse.py to provide exactly the same arguments to parse_gct as to parse_gctx #31

Closed levlitichev closed 6 years ago

levlitichev commented 6 years ago

ORIGINAL

UPDATE (April 2, 2018)

oena commented 6 years ago

@levlitichev mostly looks good! Minor bug fix + some comments:

Bugs:

Naming:

Purpose:

Let me know what you think!

levlitichev commented 6 years ago

I can't comment on it in "files changed", but slice_gct imports parse_gct twice (see: line 18); line 18 should be write_gctx, right? Related: not exactly sure about the best way to test this behavior (which might be good to add? #lahr) beyond trying to parse it back in with parse_gct/x specifically (it's a problem also relevant to outstanding issues re: adding testing for gct2gctx & gctx2gct), suggestions welcome.

Good catch, and I agree it's annoying to catch. At least for this test, I'll make it write and read a GCTx specifically. It's a start!

I might change the "--use_gctx" flag to "--outputgctx"; I believe we've generally used use* for inputs, not outputs.

I think I got --use_gctx from sigtools (e.g. introspect), but I like your suggestion. Dave used --out_type in concat_gctoo. What do you think about using that?

parser.add_argument("--out_type", "-ot", default="gctx", choices=["gct", "gctx"], help="whether to save output as a gct or gctx")

I don't currently think that slice_gct should accept gctx files. I think the name implies that it's a tool for GCTs, and while I think it's useful for GCTs GCTx files have hyperslab selection already--which really only requires parsing. The one case I can think of where it might be useful in a GCTX context is as a command-line tool, but if that's sufficiently valuable to keep GCTX as inputs then I think we should think a bit more about the current naming ambiguity.

This is a good point. Let me think a little more about this, and maybe we can also catch up in-person.

Thanks for the comments!

oena commented 6 years ago

Sounds good! I like --out_type b/c it's extensible in the future too; I think adding that with a pre-defined list of acceptable output options is a good idea.

levlitichev commented 6 years ago

I think that addition was for some init.py things that we had been working on: https://github.com/cmap/cmapPy/commit/5c933fc46ec6ba262e88da1187dc9fbbb058b8b5#diff-f60cfdea092de29be61d0460472b2961. I think safe to axe.

oena commented 6 years ago

I think you meant ^ comment on a different PR? Seems reasonable regardless, I remember now that we considered making write_* into a generic but decided against it

levlitichev commented 6 years ago

Oops. Yes, good call!

levlitichev commented 6 years ago

@dllahr: The second to last commit is a big one, so it'd be good to get your thoughts.

We're proposing to rename concat_gctoo.py to concat.py, slice_gct.py to subset.py, and slice_gctoo.py to subset_gctoo.py.

The reason is to stay consistent with parse.py: if a script works on either GCT or GCTx, the file name shouldn't have GCT or GCTx in the name. We had to change slice to subset because slice is a built-in Python thing. If a script works on a GCToo object, then it has gctoo in the name (i.e. subset_gctoo). However, concat.py (formerly, concat_gctoo.py) works on GCT and GCTx files: thus the name change.

I think I've fixed references to these 3 scripts everywhere, but I'm sure I might have missed something. We've also added deprecation warnings to the old files so that it's clear what the new syntax is in.

Let us know what you think.

dllahr commented 6 years ago

@levlitichev that naming logic seems solid to me! Also I like main calling another main (e.g. slice_main) for testing. I'll start checking the code.

dllahr commented 6 years ago

All good by me!