I'm getting two similar errors. The second one appears after I fixed the first.
Error #1:

    Traceback (most recent call last):
      File "/scratch/ahegde/SCCO_tumor_sequencing/cellLines/COV434/scripts/quilts.py", line 2228, in <module>
        get_variants(args.germline+"/merged_pytest/merged.vcf", results_folder+"/log/proteome.bed", "G")
      File "/scratch/ahegde/SCCO_tumor_sequencing/cellLines/COV434/scripts/quilts.py", line 396, in get_variants
        est.add_exon(chr, int(spoffsets[i])+start, int(spoffsets[i])+start+int(splengths[i])-1, total_exon_length, name)
    ValueError: invalid literal for int() with base 10: ''

I fixed this by adding the following two lines after line 379, inside the first `try` statement in `get_variants`:

    lengths = lengths.rstrip(",")
    offsets = offsets.rstrip(",")

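For context, a minimal illustration of why the unpatched code fails: a UCSC-style comma list such as `"100,200,"` ends with a comma, so `split(',')` yields a trailing empty string, and `int('')` raises exactly the `ValueError` above. The helper `parse_comma_ints` is hypothetical (it is not part of quilts.py); it just shows a parse that tolerates the trailing comma.

```python
def parse_comma_ints(field):
    """Parse a UCSC/BED-style comma-separated integer list such as '100,200,'.

    Hypothetical helper (not in quilts.py): empty items produced by a
    trailing comma are skipped instead of being passed to int().
    """
    return [int(item) for item in field.split(",") if item.strip() != ""]


raw = "100,200,"
print(raw.split(","))         # ['100', '200', ''] -- the '' is what breaks int()
print(parse_comma_ints(raw))  # [100, 200]
```

Filtering empty items is slightly more forgiving than `rstrip(",")`, since it would also skip an accidental `",,"` in the middle of a list.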
When I run the patched code, I get error #2:

    Traceback (most recent call last):
      File "/scratch/ahegde/SCCO_tumor_sequencing/cellLines/COV434/scripts/quilts.py", line 2255, in <module>
        translate(results_folder+"/log/", "proteome.aa.var.bed.dna", logfile, 'aa')
      File "/scratch/ahegde/SCCO_tumor_sequencing/cellLines/COV434/scripts/quilts.py", line 980, in translate
        out_fasta.write(format_header(second_header, seq_type, abbr[gene], desc[gene]))
      File "/scratch/ahegde/SCCO_tumor_sequencing/cellLines/COV434/scripts/quilts.py", line 924, in format_header
        return format_aa_header(orig_header,abbr,desc)
      File "/scratch/ahegde/SCCO_tumor_sequencing/cellLines/COV434/scripts/quilts.py", line 859, in format_aa_header
        SNP = calculate_chr_pos(orig_header.split(':')[2])
      File "/scratch/ahegde/SCCO_tumor_sequencing/cellLines/COV434/scripts/quilts.py", line 838, in calculate_chr_pos
        lengths = map(int,map_section.split()[1].split(','))
    ValueError: invalid literal for int() with base 10: ''

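The failing line in `calculate_chr_pos` looks like the same pattern as error #1: it takes the second whitespace-separated token of `map_section` and splits it on commas. If the headers written upstream still carry a trailing comma, the same `rstrip(",")` change might apply there. The sketch below wraps line 838 in a hypothetical function `parse_map_lengths`; the sample input is made up and is not the real header format.

```python
def parse_map_lengths(map_section):
    """Sketch of quilts.py line 838 with the trailing comma stripped first.

    Hypothetical wrapper; the original line is
        lengths = map(int,map_section.split()[1].split(','))
    list() makes the result a list under Python 3 too, where map()
    returns a lazy iterator instead of a list.
    """
    token = map_section.split()[1]
    return list(map(int, token.rstrip(",").split(",")))


# Made-up input: only the second whitespace-separated token matters here.
print(parse_map_lengths("0 120,85, 0,400,"))  # [120, 85]
```

If this still fails, the token may be empty or contain `",,"`, which `rstrip` cannot fix; printing `repr(map_section)` just before line 838 would show what the header actually contains.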
I haven't been able to fix this one.
I'm not sure whether it's related, but I'm using the UniProt version of the `prepare_proteome` section, which has the same problem that every file downloaded from UCSC seems to have: trailing commas.
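Instead of patching each parser, one possible workaround is to strip the trailing commas from the input once. A minimal sketch, assuming the proteome file is tab-separated BED12, where blockSizes and blockStarts are columns 11 and 12; the function names and file names are hypothetical, and whether this also removes the empty value behind error #2 is untested.

```python
def strip_block_commas(line):
    """Remove trailing commas from BED12 blockSizes/blockStarts (columns 11 and 12).

    Assumes a tab-separated BED12 line; lines with fewer than 12 columns
    are returned unchanged apart from a normalized trailing newline.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) >= 12:
        fields[10] = fields[10].rstrip(",")
        fields[11] = fields[11].rstrip(",")
    return "\t".join(fields) + "\n"


def clean_bed(in_path, out_path):
    """Write a copy of in_path with the block-list trailing commas removed."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(strip_block_commas(line))
```

For example, `clean_bed("proteome.bed", "proteome.clean.bed")` writes a cleaned copy; the output path must differ from the input path, because the input is read while the output is being written.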