ekawaler / pyQUILTS

Rebuilding QUILTS in Python.

Multiple occurrences of ValueError: invalid literal for int() with base 10: '' #7

Closed apuhegde closed 6 years ago

apuhegde commented 6 years ago

I get a couple of similar errors, the second one after debugging the first.

Error #1:

Traceback (most recent call last):
  File "/scratch/ahegde/SCCO_tumor_sequencing/cellLines/COV434/scripts/quilts.py", line 2228, in <module>
    get_variants(args.germline+"/merged_pytest/merged.vcf", results_folder+"/log/proteome.bed", "G")
  File "/scratch/ahegde/SCCO_tumor_sequencing/cellLines/COV434/scripts/quilts.py", line 396, in get_variants
    est.add_exon(chr, int(spoffsets[i])+start, int(spoffsets[i])+start+int(splengths[i])-1, total_exon_length, name)
ValueError: invalid literal for int() with base 10: ''

I worked around this by adding the following lines after line 379, in the first try statement within `get_variants`:

`lengths = lengths.rstrip(",")`

`offsets = offsets.rstrip(",")`
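For anyone else hitting this, here is a minimal sketch of why the trailing comma triggers the error and why stripping it helps. The field values are made up for illustration, and the list comprehensions are only an assumption about how `splengths` and `spoffsets` get built; the actual quilts.py code may differ.

```python
# Hypothetical BED-style block fields ending in a trailing comma.
lengths = "103,58,212,"
offsets = "0,500,1200,"

# Splitting as-is leaves an empty string as the last element...
print(lengths.split(","))

# ...and int('') is what raises the ValueError seen in the traceback.
try:
    [int(x) for x in lengths.split(",")]
except ValueError as err:
    print(err)

# Stripping the trailing comma first makes every element a valid integer.
splengths = [int(x) for x in lengths.rstrip(",").split(",")]
spoffsets = [int(x) for x in offsets.rstrip(",").split(",")]
print(splengths, spoffsets)
```

Note that `rstrip(",")` only handles commas at the very end of the string; an empty field in the middle would still fail.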

When I run the debugged code, I get error #2:

Traceback (most recent call last):
  File "/scratch/ahegde/SCCO_tumor_sequencing/cellLines/COV434/scripts/quilts.py", line 2255, in <module>
    translate(results_folder+"/log/", "proteome.aa.var.bed.dna", logfile, 'aa')
  File "/scratch/ahegde/SCCO_tumor_sequencing/cellLines/COV434/scripts/quilts.py", line 980, in translate
    out_fasta.write(format_header(second_header, seq_type, abbr[gene], desc[gene]))
  File "/scratch/ahegde/SCCO_tumor_sequencing/cellLines/COV434/scripts/quilts.py", line 924, in format_header
    return format_aa_header(orig_header,abbr,desc)
  File "/scratch/ahegde/SCCO_tumor_sequencing/cellLines/COV434/scripts/quilts.py", line 859, in format_aa_header
    SNP = calculate_chr_pos(orig_header.split(':')[2])
  File "/scratch/ahegde/SCCO_tumor_sequencing/cellLines/COV434/scripts/quilts.py", line 838, in calculate_chr_pos
    lengths = map(int,map_section.split()[1].split(','))
ValueError: invalid literal for int() with base 10: ''

I haven't been able to debug this one. I don't know whether this is the cause, but I'm using the UniProt version of the "prepare_proteome" section, which has the same problem that every UCSC-downloaded file seems to have: trailing commas.
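The failing line in `calculate_chr_pos` follows the same pattern as the first error: a comma-separated field is split and every piece is passed to `int()`. If a trailing comma is reaching that point too, a tolerant parser along these lines might sidestep it. This is only a sketch: `parse_int_list` is a hypothetical helper, not something in quilts.py, and the `map_section` value is invented because the real header format isn't shown here.

```python
def parse_int_list(field):
    """Parse a comma-separated list of ints, skipping empty items.

    Hypothetical helper; tolerates trailing or doubled commas.
    """
    return [int(item) for item in field.split(",") if item.strip()]


# Invented example: second whitespace-separated token has a trailing comma.
map_section = "1000 120,85,"

# Equivalent of map(int, map_section.split()[1].split(',')), minus the crash.
lengths = parse_int_list(map_section.split()[1])
print(lengths)
```

Silently dropping empty items could hide a genuinely malformed record, so it may be worth logging when the filtered list is shorter than the raw split.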

ekawaler commented 6 years ago

Fixed