fhcrc / seqmagick

An imagemagick-like frontend to Biopython SeqIO
http://seqmagick.readthedocs.org
GNU General Public License v3.0

--sample N causes error if number of seqs < N #35

Closed nhoffman closed 11 years ago

nhoffman commented 11 years ago
% cat > seqs.fasta
>one
AGAGAGAGAGAGAGGA
% seqmagick convert --sample 2 seqs.fasta sampled.fasta
Traceback (most recent call last):
  File "/usr/local/bin/seqmagick", line 9, in <module>
    load_entry_point('seqmagick==0.5.0', 'console_scripts', 'seqmagick')()
  File "/usr/local/lib/python2.7/site-packages/seqmagick/scripts/cli.py", line 29, in main
    return action(arguments)
  File "/usr/local/lib/python2.7/site-packages/seqmagick/subcommands/convert.py", line 341, in action
    transform_file(src, dest, arguments)
  File "/usr/local/lib/python2.7/site-packages/seqmagick/subcommands/convert.py", line 273, in transform_file
    records = function(records)
  File "/usr/local/lib/python2.7/site-packages/seqmagick/transform.py", line 471, in sample
    return random.sample(list(records), sample)
  File "/usr/local/lib/python2.7/random.py", line 320, in sample
    raise ValueError("sample larger than population")
ValueError: sample larger than population

This is a problem when, for example, iterating over multiple sequence files: it's nice to be able to set a fixed value for --sample without worrying about whether every file has at least that many sequences.
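The behavior requested here can be sketched in a few lines of Python. This is only an illustration, not the seqmagick code or the eventual fix: `sample_at_most` is a hypothetical helper name, and the assumption is that when fewer than N records exist, returning all of them is the desired result.

```python
import random


def sample_at_most(records, k, rng=random):
    """Return up to k records chosen at random without replacement.

    Hypothetical helper (not part of seqmagick): if fewer than k records
    are available, all of them are returned instead of raising ValueError.
    """
    # Materialize the iterator first, as the traceback shows seqmagick doing.
    population = list(records)
    # Cap the sample size at the population size so random.sample cannot fail.
    return rng.sample(population, min(k, len(population)))


# One record, sample size 2: plain random.sample would raise ValueError here.
print(sample_at_most(iter(["one"]), 2))
```

Capping the size with `min` keeps a fixed --sample value usable across files of different sizes, at the cost of silently returning fewer records than requested.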

cmccoy commented 11 years ago

This should have been fixed by 398b949b4dfd75464654bae9917c8863f7534410. Would you mind testing against master?

On Thu, Nov 7, 2013 at 2:11 PM, Noah Hoffman notifications@github.com wrote:

% seqmagick convert --sample 2 seqs.fasta sampled.fasta
[...]
ValueError: sample larger than population

Connor McCoy Fred Hutchinson Cancer Research Center 1100 Fairview Ave N. Seattle, WA 98109-1924 cmccoy@fhcrc.org

nhoffman commented 11 years ago

Yup, it's fixed in master. Should have specified that I was using v0.5.0

bunnyhutch commented 11 years ago

Noah, no offense, but the score is

@cmccoy: 15 everyone else: 0

On Thu, Nov 7, 2013 at 3:40 PM, Connor McCoy notifications@github.com wrote:

Closed #35 https://github.com/fhcrc/seqmagick/issues/35.


Frederick "Erick" Matsen, Assistant Member Fred Hutchinson Cancer Research Center http://matsen.fhcrc.org/

cmccoy commented 11 years ago

Ha - you might add: https://github.com/fhcrc/deenurp/pull/5 https://github.com/fhcrc/seqmagick/pull/32 etc. to your calculations.
