common-workflow-library / legacy

Deprecated
https://github.com/common-workflow-library/bio-cwl-tools
Apache License 2.0

[WIP] CNVkit tool definitions #93

Open anton-khodak opened 8 years ago

anton-khodak commented 8 years ago

Standing PR to add tool descriptions (created by argparse2cwl) and tests for CNVkit tools.

Issues I encountered in the first steps:

$ cnvkit.py batch --processes 1 \
--normal test-files/s5DE199B-D6AF-C6EC-678A-DEC1179D1B97.fastq \
--fasta test-files/cnvkit-batch/ERCC92.fa \
--targets test-files/InfiniumPsychArray-24v1-1_A1.bed \
--annotate test-files/cnvkit-batch/refFlat.txt \
--split --access test-files/InfiniumPsychArray-24v1-1_A1.bed \
--output-dir . --scatter --diagram
Detected file format: BED
Applying annotations as target names
Splitting large targets
Traceback (most recent call last):
  File "/usr/local/bin/cnvkit.py", line 11, in <module>
    args.func(args)
  File "/usr/local/lib/python3.4/dist-packages/cnvlib/commands.py", line 96, in _cmd_batch
    args.processes, args.count_reads)
  File "/usr/local/lib/python3.4/dist-packages/cnvlib/commands.py", line 138, in batch_make_reference
    else {}))
  File "/usr/local/lib/python3.4/dist-packages/cnvlib/commands.py", line 327, in do_targets
    ['chromosome', 'start', 'end', 'name'])
  File "/usr/local/lib/python3.4/dist-packages/cnvlib/gary.py", line 66, in from_rows
    table = pd.DataFrame.from_records(rows, columns=columns)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py", line 939, in from_records
    first_row = next(data)
  File "/usr/local/lib/python3.4/dist-packages/cnvlib/target.py", line 287, in split_targets
    for chrom, start, end, name in region_rows:
  File "/usr/local/lib/python3.4/dist-packages/cnvlib/target.py", line 21, in assign_names
    ref_genes = read_refflat_genes(refflat_fname)
  File "/usr/local/lib/python3.4/dist-packages/cnvlib/target.py", line 80, in read_refflat_genes
    name, _rx, chrom, strand, start, end, _ex = parse_refflat_line(line)
  File "/usr/local/lib/python3.4/dist-packages/cnvlib/target.py", line 133, in parse_refflat_line
    assert len(exons) == int(exon_count), (
TypeError: object of type 'zip' has no len()

I think this error might be caused by unsuitable input data.
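Independently of the input data, the last frame of the traceback is consistent with a Python 2/3 difference: on Python 3, zip() returns a lazy iterator, and calling len() on it raises exactly this TypeError. The sketch below reproduces the message and shows one possible way around it. The helper parse_exons is hypothetical and is not CNVkit's parse_refflat_line; it only assumes the refFlat convention of comma-separated exon coordinates with a trailing comma.

```python
def parse_exons(exon_starts, exon_ends, exon_count):
    """Pair refFlat exon starts and ends and check them against exonCount.

    Hypothetical helper for illustration; not CNVkit's parse_refflat_line.
    """
    # refFlat stores exon coordinates as comma-separated strings
    # that end with a trailing comma, e.g. "100,300,".
    starts = [int(s) for s in exon_starts.rstrip(",").split(",")]
    ends = [int(e) for e in exon_ends.rstrip(",").split(",")]
    # On Python 3, zip() is lazy, so materialize it before calling len().
    exons = list(zip(starts, ends))
    assert len(exons) == int(exon_count), "exon count mismatch"
    return exons


# Reproduce the message from the traceback (Python 3 only).
try:
    len(zip([1], [2]))
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
print(parse_exons("100,300,", "200,400,", "2"))
```

If this reading is right, the same command would fail with any refFlat file on Python 3, not just with this test data.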

Also, I couldn't find any sample copy number reference profile files (.cnn) at all. If somebody who uses CNVkit frequently could give me a hint about where to find suitable data, it would make testing much easier.

mr-c commented 8 years ago

@anton-khodak Did you look at https://travis-ci.org/common-workflow-language/workflows/builds/134234968 ?

mr-c commented 8 years ago

I think it is fine to just check in the generated descriptions; don't worry about writing a specific test. As long as the generated output parses, that is good enough for now.

brainstorm commented 8 years ago

I'm totally with @mr-c: we should focus first on CWL, not on specific tools, since the amount of work can be quite substantial. If you want to see whether one of the CNVkit subtools works, it's fine to dedicate some focused effort, but by no means aim to cover the whole suite of tools.

Hope that makes sense ;)

brainstorm commented 8 years ago

OTOH, for a good example of how to test different tools (in my case SV callers), MetaSV has it quite well wrapped up:

https://github.com/bioinform/metasv

But this is just an example, don't spend too much time looking through it.

anton-khodak commented 8 years ago

@brainstorm, that's great! I misinterpreted the goal of the PR: it was not to pass Travis checks but merely to validate those tools. In that case, I'll fix the job file (@mr-c indirectly pointed to that issue) and push all the other tools.

UPD. I should have looked more closely at test/cwltest.py... Travis CI only checks that the tool descriptions are valid, not whether they run with or without errors.
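A validity-only check of this kind can be sketched as follows. This is not the repository's test/cwltest.py; it is a minimal illustration that assumes cwltool is installed and on PATH, and the function name validate_tools and the file names in the usage note are hypothetical.

```python
import subprocess


def validate_tools(paths, run=subprocess.run):
    """Return the tool descriptions that fail `cwltool --validate`.

    Only the document is checked; the wrapped tool is never executed.
    `run` is injectable so the logic can be exercised without cwltool.
    """
    failed = []
    for path in paths:
        # A non-zero exit status means the document did not validate.
        result = run(["cwltool", "--validate", path])
        if result.returncode != 0:
            failed.append(path)
    return failed
```

Calling validate_tools(["cnvkit-batch.cwl", "cnvkit-target.cwl"]) would then return the subset of files that did not validate, without needing any CNVkit test data.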

etal commented 8 years ago

Hi guys, I'm happy to help with testing CNVkit and/or tweaking the test suite to play better with argparse2cwl. You can skip wrapping anything marked "deprecated" (e.g. loh, genome2access); those parts will be removed in the next release. Just let me know if you need anything else.

brainstorm commented 8 years ago

@etal, very happy to have you help Anton with that. I was looking at the outputs generated by argparse2cwl yesterday but since I never used CNVkit before, I'm missing a few bits of domain expertise there, so help is super welcome, thanks!

etal commented 7 years ago

I've released a new minor version of CNVkit that drops the deprecated parts and introduces a few new options. I think the current CWL wrappers in Anton's repo should still work, but batch has a new --method option that's worth exposing. Let me know if there's anything else I can do to help complete and maintain these wrappers.
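For anyone updating the wrappers, here is a rough sketch of what exposing --method as an optional CWL input could look like, built as a plain dictionary and printed as JSON (CWL documents may be written in JSON as well as YAML). The input id "method" and the helper method_input are hypothetical, and the choice list is an assumption that should be checked against the output of cnvkit.py batch --help; this is not the shape argparse2cwl necessarily generates.

```python
import json


def method_input(choices=("hybrid", "amplicon", "wgs")):
    """Build a CWL input entry for a hypothetical `method` parameter.

    The default choices are an assumption; verify them against
    `cnvkit.py batch --help` before relying on them.
    """
    return {
        "method": {
            # "null" in the type list makes the input optional.
            "type": ["null", {"type": "enum", "symbols": list(choices)}],
            # The prefix is placed before the chosen value on the command line.
            "inputBinding": {"prefix": "--method"},
        }
    }


print(json.dumps(method_input(), indent=2))
```

Using an enum rather than a free-form string lets the CWL runner reject unsupported values before CNVkit is ever started.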