JiekaiLab / scTE

MIT License

Using scTE for T2T (list index out of range error) #97

Closed: olivertam closed this issue 2 months ago

olivertam commented 2 months ago

Hi,

Thank you for developing this software. I have been trying to build a T2T index for scTE using the scTE_build command, but have encountered the following error:

INFO    : Building the scTE genome annotation index... 2024-06-29 20:54:36
Traceback (most recent call last):
  File "/gpfs/miniconda3/envs/TEsingle_benchmarking/bin/scTE_build", line 4, in <module>
    __import__('pkg_resources').run_script('scTE==1.0', 'scTE_build')
  File "/gpfs/miniconda3/lib/python3.11/site-packages/pkg_resources/__init__.py", line 720, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/gpfs/miniconda3/lib/python3.11/site-packages/pkg_resources/__init__.py", line 1559, in run_script
    exec(code, namespace, namespace)
  File "/gpfs/miniconda3/envs/TEsingle_benchmarking/lib/python3.12/site-packages/scTE-1.0-py3.12.egg/EGG-INFO/scripts/scTE_build", line 469, in <module>
    main()
  File "/gpfs/miniconda3/envs/TEsingle_benchmarking/lib/python3.12/site-packages/scTE-1.0-py3.12.egg/EGG-INFO/scripts/scTE_build", line 462, in main
    genomeIndex(args.genome,args.mode,tefile,genefile, args.out,'No path','No path')
  File "/gpfs/miniconda3/envs/TEsingle_benchmarking/lib/python3.12/site-packages/scTE-1.0-py3.12.egg/EGG-INFO/scripts/scTE_build", line 128, in genomeIndex
    gls.load_list(clean)
  File "/gpfs/miniconda3/envs/TEsingle_benchmarking/lib/python3.12/site-packages/scTE-1.0-py3.12.egg/scTE/miniglbase/genelist.py", line 1472, in load_list
    list_to_load[0]
    ~~~~~~~~~~~~^^^
IndexError: list index out of range
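The final frame fails on `list_to_load[0]`. Indexing `[0]` only raises `IndexError` when the list is empty, so `gls.load_list(clean)` received no records at all. The sketch below reproduces that failure mode. The function names are hypothetical stand-ins and are not scTE's actual code.

```python
def load_records(list_to_load):
    """Hypothetical stand-in for a loader that, like the failing frame,
    inspects the first element to learn the record layout."""
    first = list_to_load[0]  # raises IndexError when the list is empty
    return list(first.keys())


def load_records_checked(list_to_load):
    """Same idea, but fail with a message that points at the likely cause."""
    if not list_to_load:
        raise ValueError("no gene records survived filtering; check the GTF attributes")
    return load_records(list_to_load)


# An upstream filter that matched nothing hands an empty list to the loader.
clean = []
try:
    load_records(clean)
except IndexError as err:
    print("IndexError:", err)  # prints "IndexError: list index out of range"
```

A guard like `load_records_checked` turns the bare `IndexError` into a message about the input, which is where the real problem lies.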

This is the STDOUT message:

Namespace(tefile=['T2T_CHM13_v2_rmsk_TE.bed'], genefile=['T2T_CHM13_v2_refseq_liftoffv5.1.gtf'], mode='exclusive', out='T2T', genome='other', info=<function info at 0x155553d822a0>)

The scTE version was pulled from Github on April 24, 2024.

As requested in #79, here is my command line and the first 10 lines of my GTF and BED files:

$ scTE_build -te T2T_CHM13_v2_rmsk_TE.bed -gene T2T_CHM13_v2_refseq_liftoffv5.1.gtf --out T2T -m exclusive

$ head -n 10 T2T_CHM13_v2_refseq_liftoffv5.1.gtf 
chr1    Liftoff transcript      6047    13941   .       -       .       transcript_id "XR_002958507.2"; gene_id "LOC124900618"; gene_name "LOC124900618"
chr1    Liftoff exon    6047    6420    .       -       .       transcript_id "XR_002958507.2"; gene_id "LOC124900618"; gene_name "LOC124900618";
chr1    Liftoff exon    12078   12982   .       -       .       transcript_id "XR_002958507.2"; gene_id "LOC124900618"; gene_name "LOC124900618";
chr1    Liftoff exon    13445   13579   .       -       .       transcript_id "XR_002958507.2"; gene_id "LOC124900618"; gene_name "LOC124900618";
chr1    Liftoff exon    13680   13941   .       -       .       transcript_id "XR_002958507.2"; gene_id "LOC124900618"; gene_name "LOC124900618";
chr1    Liftoff transcript      15080   21429   .       +       .       transcript_id "XR_007068557.1_1"; gene_id "LOC124905335_1"; gene_name "LOC124905335"
chr1    Liftoff exon    15080   15564   .       +       .       transcript_id "XR_007068557.1_1"; gene_id "LOC124905335_1"; gene_name "LOC124905335";
chr1    Liftoff exon    20566   21429   .       +       .       transcript_id "XR_007068557.1_1"; gene_id "LOC124905335_1"; gene_name "LOC124905335";
chr1    Liftoff transcript      20529   37628   .       -       .       transcript_id "XM_047436352.1"; gene_id "LOC112268260"; gene_name "LOC112268260"
chr1    Liftoff exon    20529   21087   .       -       .       transcript_id "XM_047436352.1"; gene_id "LOC112268260"; gene_name "LOC112268260";

$ head -n 10 T2T_CHM13_v2_rmsk_TE.bed 
chr1    2709    4402    TAR1_dup8:TAR1:subtelo:Satellite        0       -
chr1    4082    4533    LTR60B_dup1:LTR60B:ERV1:LTR     0       -
chr1    4533    4660    LTR60B_dup2:LTR60B:ERV1:LTR     0       -
chr1    4663    5263    L1MC3_dup17:L1MC3:L1:LINE       0       +
chr1    5274    5528    MER34C_v_dup1:MER34C_v:ERV1:LTR 0       +
chr1    5528    5686    L1MC3_dup18:L1MC3:L1:LINE       0       +
chr1    5686    6131    MSTA1:MSTA1:ERVL-MaLR:LTR       0       +
chr1    6131    7132    L1MC3_dup19:L1MC3:L1:LINE       0       +
chr1    7141    7533    L1MC3_dup20:L1MC3:L1:LINE       0       +
chr1    8398    8840    MER4A1_dup1:MER4A1:ERV1:LTR     0       +
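The BED rows above appear to be well-formed six-column records. The sketch below parses one row and checks that its name field splits into four parts. It assumes the name field is laid out as copy:subfamily:family:class, which is inferred from these sample rows only. The `TERecord` type and `parse_te_bed_line` helper are hypothetical.

```python
from typing import NamedTuple


class TERecord(NamedTuple):
    chrom: str
    start: int  # BED start, 0-based
    end: int    # BED end, exclusive
    copy_id: str
    subfamily: str
    family: str
    te_class: str
    strand: str


def parse_te_bed_line(line: str) -> TERecord:
    """Parse a 6-column TE BED line. Assumption: the name field is
    copy:subfamily:family:class, as in the sample rows."""
    chrom, start, end, name, _score, strand = line.split()  # tabs or spaces
    copy_id, subfamily, family, te_class = name.split(":")   # ValueError if malformed
    return TERecord(chrom, int(start), int(end), copy_id, subfamily, family, te_class, strand)


rec = parse_te_bed_line("chr1\t2709\t4402\tTAR1_dup8:TAR1:subtelo:Satellite\t0\t-")
print(rec.subfamily, rec.family, rec.te_class, rec.end - rec.start)  # TAR1 subtelo Satellite 1693
```

If every row in the file parses this way, the BED input is unlikely to be the cause of the empty list in the traceback.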

Any advice would be highly appreciated.

Thanks.

jphe commented 2 months ago

It seems to be a GTF problem: scTE requires a "protein_coding" or "lincRNA" tag for genes.
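This diagnosis can be checked against the sample GTF lines above. They carry only `transcript_id`, `gene_id` and `gene_name`, with no biotype tag. The sketch below shows a filter of that kind. It assumes, without confirmation from this thread, that the tag appears as an attribute value in column 9. Which attribute key scTE actually reads (for example `gene_type` or `gene_biotype`) is left open, and the helper names are hypothetical.

```python
import re

# Assumption: a record is kept only if one of its attribute values is one
# of these biotype tags; the exact key scTE inspects is not stated here.
REQUIRED_BIOTYPES = {"protein_coding", "lincRNA"}
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')  # key "value" pairs in column 9


def parse_attributes(line: str) -> dict:
    """Return the key/value pairs from column 9 of a GTF line."""
    fields = line.split(None, 8)  # columns 1-8 contain no internal whitespace
    return dict(ATTR_RE.findall(fields[8]))


def keep_tagged(lines):
    """Yield only the lines carrying one of the required biotype tags."""
    for line in lines:
        if REQUIRED_BIOTYPES & set(parse_attributes(line).values()):
            yield line


sample = [
    'chr1\tLiftoff\texon\t6047\t6420\t.\t-\t.\ttranscript_id "XR_002958507.2"; gene_id "LOC124900618"; gene_name "LOC124900618";',
    'chr1\tLiftoff\texon\t20529\t21087\t.\t-\t.\ttranscript_id "XM_047436352.1"; gene_id "LOC112268260"; gene_name "LOC112268260";',
]
print(sorted(parse_attributes(sample[0])))  # ['gene_id', 'gene_name', 'transcript_id']
print(len(list(keep_tagged(sample))))        # 0: no biotype tag, so nothing survives
```

When every record is dropped in this way, the list passed on to the loader is empty. That matches the `IndexError` raised at `list_to_load[0]`.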

Can you remove line 83-83 of scTE/bin/scTE_build and try again?


olivertam commented 2 months ago

Hi,

Thank you for your response. That appears to have done the trick. This might also resolve the issues raised in #79 and #95.

Thanks.