christophertbrown / iRep

scripts for estimating bacteria replication rates based on population genome copy number variation
MIT License

sam files with spaces in sequence IDs #2

Closed ianpgm closed 7 years ago

ianpgm commented 8 years ago

I was getting the following error when trying to run this program:

# parsing mapping files
Traceback (most recent call last):
  File "./iRep.py", line 966, in <module>
    thresholds, args['t'])
  File "./iRep.py", line 786, in iRep
    genomes = calc_coverage(genomes, mappings, id2g)
  File "./iRep.py", line 157, in calc_coverage
    for read in reads:
  File "./iRep.py", line 879, in filter_mapping
    'both', sort_sam, False, False, sbuffer):
  File "/iRep/bin/mapped.py", line 180, in get_reads
    for read in reads_from_mapping(mapping, contigs, mismatches, mm_option, req_map, region):
  File "/iRep/bin/mapped.py", line 125, in reads_from_mapping
    if int(line[1]) <= 20: # is this from a single read?
ValueError: invalid literal for int() with base 10: '1:N:0:TAAGGCGATAGATCGC'

It looks like the cause was that the ID field in my SAM files contains a space, so mapped.py was not parsing those lines correctly. I fixed this by modifying line 113 of mapped.py to split on tabs rather than spaces: line = line.strip().split("\t")

That shouldn't mess anything else up, should it?
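For anyone hitting the same error, here is a minimal sketch of the difference between the two splitting strategies. It is not the code from mapped.py: the SAM record, the `parse_flag` helper, and the assumption that the original line split on arbitrary whitespace are all made up for illustration. Only the read ID suffix is taken from the traceback above.

```python
# Hypothetical SAM record (not from a real file): the read ID contains a
# space, and the fields are separated by tabs.
sam_line = (
    "read_1 1:N:0:TAAGGCGATAGATCGC\t99\tcontig_1\t100\t42\t4M\t=\t300\t204\tACGT\tFFFF\n"
)


def parse_flag(line, sep=None):
    # Hypothetical helper: return the second column (the SAM FLAG) as an int.
    # sep=None splits on any run of whitespace; sep="\t" splits on tabs only.
    fields = line.strip().split(sep)
    return int(fields[1])


# Splitting on any whitespace cuts the read ID in two, so the second
# column becomes the tail of the ID and int() raises ValueError.
try:
    parse_flag(sam_line)
    whitespace_ok = True
except ValueError as err:
    whitespace_ok = False
    print("whitespace split failed:", err)

# Splitting on tabs keeps the read ID whole, so the second column is
# the numeric FLAG.
tab_flag = parse_flag(sam_line, sep="\t")
print("tab split FLAG:", tab_flag)
```

Since SAM columns are tab-delimited, splitting on tabs should leave the other columns at the same indices as before for records without spaces in the ID.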

christophertbrown commented 8 years ago

Hey Ian,

That should be fine!

I will add that change to my next update, which I will hopefully release in the next few days. That update will also include a correction for GC sequencing bias, which is an issue with some sequencing libraries. Just FYI, it is probably worth checking out.

Thanks for the info on mapped.py!

Chris

In reply to Ian Marshall's message of Aug 31, 2016, 1:38 AM.