christophertbrown / iRep

scripts for estimating bacteria replication rates based on population genome copy number variation
MIT License

Error thrown by parse_genomes_fa #26

Open defleury opened 5 years ago

defleury commented 5 years ago

I am running iRep on a large dataset, using the piping approach suggested by @nigiord in #1, like so:

samtools view [sample].sam [contig_1 contig_2 etc.] | iRep -f *.fna -s - -o [sample] -t 4

I installed the latest version from bioconda, which I assume is 1.1.7 (I'm not sure how to get the actual version number from the tool, though).
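
One way to find the installed version without relying on the tool itself is to ask Python's packaging metadata. The sketch below assumes the distribution is registered under the name "iRep" in the active environment (an assumption; the conda package may register a different name). The helper `installed_version` is a hypothetical name, not part of iRep. It falls back to `pkg_resources` on interpreters older than Python 3.8, such as the 3.6 environment shown in the paths here.

```python
def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if not found."""
    try:
        # Available in the standard library from Python 3.8 onwards.
        from importlib.metadata import version, PackageNotFoundError
    except ImportError:
        # Older interpreters (e.g. 3.6): fall back to setuptools' pkg_resources.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "iRep" is the assumed distribution name; prints None if it is not installed
# under that name in the current environment.
print(installed_version("iRep"))
```

Run this with the interpreter from the same environment that provides the `iRep` executable, otherwise it reports on a different set of packages.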

When running, I first get a warning (no clue if that's relevant):

miniconda3/envs/irep-env/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)

The program then crashes with this error:

Traceback (most recent call last):
  File "miniconda3/envs/irep-env/bin/iRep", line 73, in <module>
    thresholds, args['no_gc_correction'], args['t'])
  File "miniconda3/envs/irep-env/lib/python3.6/site-packages/iRep/iRep.py", line 976, in iRep
    genomes, id2g = parse_genomes_fa(fastas, mappings)
  File "miniconda3/envs/irep-env/lib/python3.6/site-packages/iRep/iRep.py", line 917, in parse_genomes_fa
    ID = seq[0].split('>', 1)[1].split()[0]
AttributeError: 'list' object has no attribute 'split'

I am processing ~300 genomes in total, and tried running iRep on ~950 samples (the test run hadn't crashed after some time, so I confidently submitted all jobs...). All of them crash, and none produce any output. Is something wrong with my input genomes? I indexed the SAM files, so I assume the problem is not a weird stream sent by samtools.

Thanks & best,

Sebastian

defleury commented 5 years ago

I consulted our local Python expert on this, who thinks that the issue actually lies with a dependency:

https://github.com/christophertbrown/bioscripts/blob/master/ctbBio/fasta.py#L24
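
To illustrate why a dependency could produce this traceback: the failing expression expects `seq[0]` to be a FASTA header string, so the error appears whenever the parser hands back a list in that position. The sketch below is only a minimal reproduction under that assumption. It does not reflect what the linked line of fasta.py actually does, and `header_id`, `record`, and `nested` are hypothetical names used for illustration.

```python
def header_id(seq):
    """Extract the sequence ID using the same expression as in the traceback."""
    return seq[0].split('>', 1)[1].split()[0]


# Expected shape: a record whose first element is the header string.
record = ['>contig_1 some description', 'ACGT']
print(header_id(record))  # contig_1

# Hypothetical mismatch: the parser yields a list of records (or otherwise
# wraps the record in a list), so seq[0] is itself a list, not a string.
nested = [record]
try:
    header_id(nested)
except AttributeError as err:
    print(err)  # 'list' object has no attribute 'split'
```

If this is indeed the cause, the input genomes would not be at fault; the mismatch would lie between the iRep version and the version of the parsing library installed alongside it.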