Closed levlitichev closed 6 years ago
@levlitichev mostly looks good! Minor bug fix + some comments:
Bugs:
Naming:
Purpose:
Let me know what you think!
I can't comment on it in "files changed", but slice_gct imports parse_gct twice (see: line 18); line 18 should be write_gctx, right? Related: not exactly sure about the best way to test this behavior (which might be good to add? #lahr) beyond trying to parse it back in with parse_gct/x specifically (it's a problem also relevant to outstanding issues re: adding testing for gct2gctx & gctx2gct), suggestions welcome.
Good catch, and I agree it's annoying to catch. At least for this test, I'll make it write and read a GCTx specifically. It's a start!
I might change the "--use_gctx" flag to "--outputgctx"; I believe we've generally used use* for inputs, not outputs.
I think I got --use_gctx from sigtools (e.g. introspect), but I like your suggestion. Dave used --out_type in concat_gctoo. What do you think about using that?
parser.add_argument("--out_type", "-ot", default="gctx", choices=["gct", "gctx"], help="whether to save output as a gct or gctx")
I don't currently think that slice_gct should accept gctx files. I think the name implies that it's a tool for GCTs, and while I think it's useful for GCTs GCTx files have hyperslab selection already--which really only requires parsing. The one case I can think of where it might be useful in a GCTX context is as a command-line tool, but if that's sufficiently valuable to keep GCTX as inputs then I think we should think a bit more about the current naming ambiguity.
This is a good point. Let me think a little more about this, and maybe we can also catch up in-person.
Thanks for the comments!
Sounds good! I like --out_type b/c it's extensible in the future too; I think adding that with a pre-defined list of acceptable output options is a good idea.
I think that addition was for some init.py things that we had been working on: https://github.com/cmap/cmapPy/commit/5c933fc46ec6ba262e88da1187dc9fbbb058b8b5#diff-f60cfdea092de29be61d0460472b2961. I think safe to axe.
I think you meant ^ comment on a different PR? Seems reasonable regardless, I remember now that we considered making write_* into a generic but decided against it
Oops. Yes, good call!
@dllahr: The second to last commit is a big one, so it'd be good to get your thoughts.
We're proposing to rename concat_gctoo.py
to concat.py
, slice_gct.py
to subset.py
, and slice_gctoo.py
to subset_gctoo.py
.
The reason is to stay consistent with parse.py
: if a script works on either GCT or GCTx, the file name shouldn't have GCT or GCTx in the name. We had to change slice to subset because slice is a built-in Python thing. If a script works on a GCToo object, then it has gctoo in the name (i.e. subset_gctoo
). However, concat.py
(formerly, concat_gctoo.py
) works on GCT and GCTx files: thus the name change.
I think I've fixed references to these 3 scripts everywhere, but I'm sure I might have missed something. We've also added deprecation warnings to the old files so that it's clear what the new syntax is in.
Let us know what you think.
@levlitichev that naming logic seems solid to me! Also I like main calling another main (e.g. slice_main) for testing. I'll start checking the code.
All good by me!
ORIGINAL
UPDATE (April 2, 2018)
concat_gctoo.py
toconcat.py
,slice_gct.py
tosubset.py
, andslice_gctoo.py
tosubset_gctoo.py
for consistent file namingslice_main
orconcat_main
)