Xinglab / espresso


error with create_corrected_sam.py #72

Closed junior-2016 closed 3 months ago

junior-2016 commented 3 months ago

Hi Eric. When I run the create_corrected_sam.py script, an error occurs while converting a string to an integer:

Traceback (most recent call last):
  File "/home/junior/espresso/snakemake/scripts/create_corrected_sam.py", line 1870, in <module>
    main()
  File "/home/junior/espresso/snakemake/scripts/create_corrected_sam.py", line 1856, in main
    orig_read_final_paths, abs_out_dir, args.sort_memory_buffer_size)
  File "/home/junior/espresso/snakemake/scripts/create_corrected_sam.py", line 563, in parse_and_sort_read_final_paths
    out_handle)
  File "/home/junior/espresso/snakemake/scripts/create_corrected_sam.py", line 522, in write_read_final_entries_for_sorting
    next_line)
  File "/home/junior/espresso/snakemake/scripts/create_corrected_sam.py", line 133, in parse_read_final_read_id
    details_added = process_read_final_line(read_details, line, in_path)
  File "/home/junior/espresso/snakemake/scripts/create_corrected_sam.py", line 168, in process_read_final_line
    process_read_final_sj_feature(read_details, columns, in_path)
  File "/home/junior/espresso/snakemake/scripts/create_corrected_sam.py", line 228, in process_read_final_sj_feature
    columns[8])
  File "/home/junior/espresso/snakemake/scripts/create_corrected_sam.py", line 85, in parse_na_int_str
    return int(string)
ValueError: invalid literal for int() with base 10: '639.5'

I reviewed the corresponding records in the file, and they are as follows:

0f318797-64b5-4330-916e-f51c1afe079b group_ID 19443 19443_41
0f318797-64b5-4330-916e-f51c1afe079b strand_isoform 1
0f318797-64b5-4330-916e-f51c1afe079b strand_read 1
0f318797-64b5-4330-916e-f51c1afe079b chr chr9
0f318797-64b5-4330-916e-f51c1afe079b mapq 49
0f318797-64b5-4330-916e-f51c1afe079b flag 16
0f318797-64b5-4330-916e-f51c1afe079b SJ 1 chr9:14940:15080:1 422 932 pass chr9:14940:15080:1 422 932 0.0904396557560598 no no
0f318797-64b5-4330-916e-f51c1afe079b SJ 2 chr9:15149:15908:1 491 863 pass chr9:15149:15908:1 491 863 0.193947767516279 no no
0f318797-64b5-4330-916e-f51c1afe079b SJ 3 chr9:16056:16717:1 638 716 corrected chr9:16061:16717:1 639.5 714.5 7.21455013503734e-06 no no
0f318797-64b5-4330-916e-f51c1afe079b SJ 4 chr9:16876:16968:1 793 561 pass chr9:16876:16968:1 793 561 0.0200405138437298 no no
0f318797-64b5-4330-916e-f51c1afe079b SJ 5 chr9:17166:17343:1 980 374 pass chr9:17166:17343:1 980 374 0.193947767516279 no no
0f318797-64b5-4330-916e-f51c1afe079b start NA 14524 5 1349 pass 14524 5 1349 NA NA NA
0f318797-64b5-4330-916e-f51c1afe079b end NA 17729 1353 1 pass 17729 1353 1 NA NA NA
0f318797-64b5-4330-916e-f51c1afe079b mapped_length_read 1348

For the "SJ 3" line, columns = ['0f318797-64b5-4330-916e-f51c1afe079b', 'SJ', '3', 'chr9:16056:16717:1', '638', '716', 'corrected', 'chr9:16061:16717:1', '639.5', '714.5', '7.21455013503734e-06', 'no', 'no']. The script passes columns[8] to int(), but that field holds '639.5', which is not an integer. Is this expected behavior?
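The failure can be reproduced in isolation. The sketch below is a guess at what `parse_na_int_str` looks like: only the `return int(string)` line is confirmed by the traceback, and the 'NA' handling is an assumption based on the function name. `try_parse` is a hypothetical helper added here just to show the outcome without crashing.

```python
def parse_na_int_str(string):
    """Return None for 'NA', otherwise parse the string as a base-10 int.

    Hypothetical reconstruction: only `return int(string)` appears in the
    traceback; the 'NA' branch is assumed from the function name.
    """
    if string == 'NA':
        return None
    return int(string)


def try_parse(string):
    """Return the parsed value, or the error message if parsing fails."""
    try:
        return parse_na_int_str(string)
    except ValueError as error:
        return str(error)


if __name__ == '__main__':
    # Values taken from columns[4] and columns[8] of the "SJ 3" record.
    for value in ('638', '639.5'):
        print(repr(value), '->', repr(try_parse(value)))
```

`int()` accepts only integer literals, so a string such as '639.5' raises ValueError rather than being truncated or rounded.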

EricKutschera commented 3 months ago

Was the ESPRESSO output that you ran create_corrected_sam.py on created with ESPRESSO v1.5.0? In v1.4.0, a value like 639.5 was expected in that part of a read_final file. In v1.5.0 it should always be an integer. In v1.4.0 the value wasn't really being used for anything, but in v1.5.0 the way it is calculated was updated so that create_corrected_sam.py could use it.

This is where the code creates that line of the output: https://github.com/Xinglab/espresso/blob/v1.5.0/src/ESPRESSO_C.pl#L1332

The 639.5 would be from $read_pos_matched_to_sj, which should always be an integer because it's based on $query_sj_pos, which is an alignment position: https://github.com/Xinglab/espresso/blob/v1.5.0/src/ESPRESSO_C.pl#L1874

In v1.4.0 that value came from $est_pos_read, which was a midpoint of two coordinates: https://github.com/Xinglab/espresso/blob/v1.4.0/src/ESPRESSO_C.pl#L1608
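One way to tell whether a read_final file has the v1.4.0-style values before running create_corrected_sam.py is to scan its SJ lines for non-integer positions. The following is a minimal sketch, not part of ESPRESSO: the function names are hypothetical, the column indices (8 and 9) are inferred from the single record quoted in this issue, and whitespace-separated fields are assumed.

```python
def is_int_or_na(value):
    """Return True if value is 'NA' or parses as a base-10 integer."""
    if value == 'NA':
        return True
    try:
        int(value)
    except ValueError:
        return False
    return True


def find_non_integer_sj_positions(lines):
    """Return (line_number, column_index, value) for suspicious SJ fields.

    Assumption: in SJ lines, columns[8] and columns[9] hold read positions,
    as in the record quoted in this issue.
    """
    problems = []
    for line_number, line in enumerate(lines, start=1):
        columns = line.split()
        if len(columns) < 10 or columns[1] != 'SJ':
            continue
        for index in (8, 9):
            if not is_int_or_na(columns[index]):
                problems.append((line_number, index, columns[index]))
    return problems
```

Passing an open file handle as `lines` works because file objects iterate line by line. Any hit suggests the file came from v1.4.0 and should be regenerated with v1.5.0.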

junior-2016 commented 3 months ago

Thank you. The output was generated with ESPRESSO v1.4. I'll try again with the new version.