GenomeRIK / tama

Transcriptome Annotation by Modular Algorithms (for long read RNA sequencing data)
GNU General Public License v3.0

tama_merge IndexError: list index out of range #94

Closed olechnwin closed 1 year ago

olechnwin commented 1 year ago

Hi,

I am running tama_merge as part of nfcore_isoseq, but I thought it would be appropriate to ask for help here. These are the commands that were executed and the resulting error:

  tama_merge.py \
      -f ASP14_T1.tsv \
      -d merge_dup \
      -p ASP14_T1 \
      -a 100 -m 10 -z 100

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_ISOSEQ:ISOSEQ:GSTAMA_MERGE":
      gstama: $( tama_merge.py -version | head -n1 )
  END_VERSIONS

Command exit status:
  1

Command output:
  0.0.1
  Default collapse exon ends flag will be used: common_ends
  Default source ID merge flag: no_source_id
  Default CDS merge flag: no_cds
  opening file list
  opening bed list

Command error:
  WARNING: DEPRECATED USAGE: Forwarding SINGULARITYENV_TMPDIR as environment variable will not be supported in the future, use APPTAINERENV_TMPDIR instead
  WARNING: DEPRECATED USAGE: Forwarding SINGULARITYENV_NXF_DEBUG as environment variable will not be supported in the future, use APPTAINERENV_NXF_DEBUG instead
  Traceback (most recent call last):
    File "/usr/local/bin/tama_merge.py", line 3587, in <module>
      trans_start = int(line_split[1])
  IndexError: list index out of range

Do you have any suggestions on how to fix this?

Thank you in advance for your help!

GenomeRIK commented 1 year ago

Hello,

Thank you for using TAMA!

Can you check your input BED12 files to make sure every line has 12 tab-separated fields?
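
A minimal sketch of that check, assuming plain-text BED files. The function name `check_bed12` is hypothetical and not part of TAMA; it only counts tab-separated fields per line and reports whether the file is empty.

```python
def check_bed12(path, expected_fields=12):
    """Return (is_empty, bad_lines) for a BED file.

    bad_lines lists (line_number, field_count) for every line that does
    not have exactly `expected_fields` tab-separated fields.
    Hypothetical helper for illustration; not part of TAMA.
    """
    bad_lines = []
    n_lines = 0
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            n_lines += 1
            fields = line.rstrip("\n").split("\t")
            if len(fields) != expected_fields:
                bad_lines.append((line_number, len(fields)))
    return n_lines == 0, bad_lines
```

Calling `check_bed12("ASP14_T1.chunk33_collapsed.bed")` on each file named in the file list would show both malformed lines and files with no lines at all.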

Thank you, Richard

olechnwin commented 1 year ago

Hi Richard,

Thank you for your quick reply. I always appreciate this.

I assume you mean the ASP14_T1.chunk*_collapsed.bed files listed in ASP14_T1.tsv. Some of those files are empty:

8.0K    ./c8/b9b04c9b763e21e36042cf521cc4a3/ASP14_T1.chunk1_collapsed.bed
8.0K    ./c8/b9b04c9b763e21e36042cf521cc4a3/ASP14_T1.chunk5_collapsed.bed
0   ./aa/732a502d82dae76241094efee01aff/ASP14_T1.chunk27_collapsed.bed
0   ./4c/b52123d7a88ce27e6a11c8bed36e82/ASP14_T1.chunk33_collapsed.bed
640K    ./1f/d229f8d29b04988511dfb02a730113/ASP14_T1.chunk39_collapsed.bed
608K    ./e2/882cbbaeef63f7be3e77ad14307624/ASP14_T1.chunk21_collapsed.bed
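
The same listing can be produced straight from the file list. This is a rough sketch that assumes the first tab-separated column of the .tsv is the BED path and that the paths resolve from the current directory; `empty_beds_in_filelist` is a hypothetical name, not a TAMA function.

```python
import os


def empty_beds_in_filelist(filelist_path):
    """Return the BED paths in a file list that exist but are zero bytes.

    Assumption: column 1 of each non-blank line is the BED file path.
    """
    empty = []
    with open(filelist_path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines in the file list
            bed_path = line.split("\t")[0]
            if os.path.exists(bed_path) and os.path.getsize(bed_path) == 0:
                empty.append(bed_path)
    return empty
```

This only reports which inputs are empty. It does not say why they are empty, which is what the tama_collapse logs are for.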

So I went back to look at the tama_collapse outputs. These are the commands that were run and that generated an empty BED file:

#!/bin/bash -euo pipefail
tama_collapse.py \
    -s ASP14_T1.chunk33.bam \
    -f a673_hap1_0.fasta \
    -p ASP14_T1.chunk33 \
    -x no_cap -b BAM -a 100 -m 10 -z 100

cat <<-END_VERSIONS > versions.yml
"NFCORE_ISOSEQ:ISOSEQ:GSTAMA_COLLAPSE":
    gstama: $( tama_collapse.py -version | grep 'tc_version_date_'|sed 's/tc_version_date_//g' )
END_VERSIONS

And these are the outputs in the tama_collapse folder:

-rw-r--r--  1 cx050 gdlab    0 Jan 24 14:55 ASP14_T1.chunk33_collapsed.bed
-rw-r--r--  1 cx050 gdlab 4.2M Jan 24 14:58 ASP14_T1.chunk33_local_density_error.txt
-rw-r--r--  1 cx050 gdlab   54 Jan 24 14:58 ASP14_T1.chunk33_polya.txt
-rw-r--r--  1 cx050 gdlab 1.8M Jan 24 14:58 ASP14_T1.chunk33_read.txt
-rw-r--r--  1 cx050 gdlab   43 Jan 24 14:58 ASP14_T1.chunk33_strand_check.txt
-rw-r--r--  1 cx050 gdlab    0 Jan 24 14:55 ASP14_T1.chunk33_trans_read.bed
-rw-r--r--  1 cx050 gdlab  190 Jan 24 14:58 ASP14_T1.chunk33_trans_report.txt
-rw-r--r--  1 cx050 gdlab   27 Jan 24 14:58 ASP14_T1.chunk33_varcov.txt
-rw-r--r--  1 cx050 gdlab   74 Jan 24 14:58 ASP14_T1.chunk33_variants.txt
-rw-r--r--  1 cx050 gdlab    0 Jan 24 14:55 .command.begin
-rw-r--r--  1 cx050 gdlab  312 Jan 24 14:55 .command.err
-rw-r--r--  1 cx050 gdlab 3.3K Jan 24 14:58 .command.log
-rw-r--r--  1 cx050 gdlab 2.0K Jan 24 14:58 .command.out
-rw-r--r--  1 cx050 gdlab  11K Jan 24 14:45 .command.run
-rw-r--r--  1 cx050 gdlab  355 Jan 24 14:45 .command.sh
-rw-r--r--  1 cx050 gdlab  232 Jan 24 14:58 .command.trace
-rw-r--r--  1 cx050 gdlab    1 Jan 24 14:58 .exitcode
-rw-r--r--  1 cx050 gdlab   63 Jan 24 14:58 versions.yml

Thank you! Cen

olechnwin commented 1 year ago

Hi Richard,

I just noticed that in the folders where the BED files are empty, the logs contain "Genome seq is not the same length as query seq" and lack the "TAMA Collapse has successfully finished running!" message that appears in the runs with non-empty BED files.

Here is one of the logs from a tama_collapse run that generated an empty BED file:

tc_version_date_2021_11_03
Default collapse exon ends flag will be used: common_ends
Default coverage: 99
Default identity: 85
Default identity calculation method: ident_cov
Default duplicate merge flag: merge_dup
Default splice junction priority: no_priority
Default splice junction error threshold: 10
Default splice junction local density error threshold: 1000
Default simple error symbol for matches is the underscore "_" .
Using BAM format for reading in.
Default log output on
Default run mode original
Default 5 read threshold
time taken since last check:    0:0:0
time taken since beginning:     0:0:0
going through fasta
time taken since last check:    0:0:48
time taken since beginning:     0:0:48
going through sam file
4784
Genome seq is not the same length as query seq
[]
['A', 'A', 'T', 'C', 'T']
329102
0
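
To find every affected chunk without opening each log by hand, something like the sketch below could be run over the per-task log files. The two message strings are the ones quoted above; the function name and the returned dictionary keys are hypothetical.

```python
ERROR_MSG = "Genome seq is not the same length as query seq"
SUCCESS_MSG = "TAMA Collapse has successfully finished running!"


def collapse_log_status(log_path):
    """Report whether a tama_collapse log shows the error or the success line."""
    saw_error = False
    saw_success = False
    # errors="replace" avoids crashing on stray non-UTF-8 bytes in a log.
    with open(log_path, errors="replace") as handle:
        for line in handle:
            if ERROR_MSG in line:
                saw_error = True
            if SUCCESS_MSG in line:
                saw_success = True
    return {"length_mismatch": saw_error, "finished": saw_success}
```

A chunk whose log gives `length_mismatch` as true and `finished` as false matches the pattern described here.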

Thanks! Cen

GenomeRIK commented 1 year ago

Hi Cen,

What alignment tool did you use? This error is typically caused by bugs in the alignment tool that result in reads mapping off the genome.
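
One way to look for such alignments is sketched below, assuming the BAM has first been converted to SAM text with its header (for example with `samtools view -h`). It flags records whose reference span, computed from POS and the CIGAR string, runs past the length given in the `@SQ` header. The function names are hypothetical, and this may not be the exact failure behind this particular error.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_end(pos, cigar):
    """1-based inclusive end coordinate of an alignment on the reference."""
    span = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos + span - 1


def off_reference_reads(sam_lines):
    """Return (qname, rname, pos, end) for alignments outside their reference."""
    ref_lengths = {}
    offenders = []
    for line in sam_lines:
        if line.startswith("@"):
            if line.startswith("@SQ"):
                tags = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
                ref_lengths[tags["SN"]] = int(tags["LN"])
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 4 or cigar == "*" or rname not in ref_lengths:
            continue  # unmapped or no usable coordinates
        end = reference_end(pos, cigar)
        if pos < 1 or end > ref_lengths[rname]:
            offenders.append((qname, rname, pos, end))
    return offenders
```

The function accepts any iterable of lines, so an open file handle on the SAM text works directly.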

Thank you, Richard

olechnwin commented 1 year ago

I was using uLTRA.

GenomeRIK commented 1 year ago

OK, would you mind taking a look at this thread to see if it helps?

https://github.com/GenomeRIK/tama/issues/80

olechnwin commented 1 year ago

Thank you so much! I'll try and see if that helps.

ksahlin commented 1 year ago

Yes, please check whether you are using uLTRA v0.0.4.2, in which the bug in #80 was fixed.

olechnwin commented 1 year ago

@ksahlin, thank you! That's very helpful. I can see that I'm using uLTRA v0.0.4.1. I'll switch to v0.0.4.2.

GenomeRIK commented 1 year ago

Thanks for chiming in, Kristoffer!

Cen, I am going to close this thread now but please feel free to re-open if you are still having issues.

olechnwin commented 1 year ago

Thank you again. Appreciate your help.