JiekaiLab / scTE

MIT License

Difference between two versions of hg38.exclusive.idx #64

Open XiaoyuZhan520 opened 12 months ago

XiaoyuZhan520 commented 12 months ago

Hello!

Thanks for your effort in scTE.

I am building the index for hg38 in the two ways recommended by the tutorial. However, the outputs are not identical, and I am not sure what the differences are.

#Method one:
scTE_build -g hg38

This produced a 692M index named hg38.exclusive.idx

#Method two:
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_30/gencode.v30.annotation.gtf.gz
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/rmsk.txt.gz

gunzip -c gencode.v30.annotation.gtf.gz > gencode.v30.annotation.gtf
gunzip -c rmsk.txt.gz > rmsk.txt

scTE_build -te rmsk.txt -gene gencode.v30.annotation.gtf -o hg38

This produced a 686M index named hg38.exclusive.idx. In addition, the following warning was printed:

/home/bin/lib/python3.10/site-packages/scTE-1.0-py3.10.egg/EGG-INFO/scripts/scTE_build:162: DeprecationWarning: 'U' mode is deprecated
  o = open(tefilename,'rU')
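
For context on the warning: in Python 3 the 'U' flag has no effect, because text mode already uses universal newlines by default, so on Python 3.10 it should not change what is read from the file (from Python 3.11 on, 'rU' is rejected with a ValueError). A minimal sketch with a throwaway temporary file, not scTE code, showing that plain 'r' already normalizes line endings:

```python
import os
import tempfile

# Write a tiny tab-separated file with Windows-style (\r\n) line endings.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as handle:
    handle.write(b"chr1\t100\t200\r\nchr2\t300\t400\r\n")

# Plain 'r' translates \r\n to \n; no 'U' flag is needed in Python 3.
with open(path, "r") as handle:
    lines = handle.readlines()
os.remove(path)

print(lines)
```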

I understand that the -te parameter should accept a BED file as input, but the code of scTE_build suggests that rmsk.txt also works, since it automatically takes the information in columns 6-8 and 11 when the TE file is not a BED file.

I would be grateful if you could provide any idea about the differences. Many thanks in advance!

jphe commented 11 months ago

The -te option requires a BED file, which must have 6 columns.

rmsk.txt cannot be used directly with the -te parameter because of its format.
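
One way to get a 6-column BED out of rmsk.txt is sketched below. It relies on the column layout of the UCSC rmsk table (genoName, genoStart, genoEnd, strand and repName are the 6th, 7th, 8th, 10th and 11th tab-separated fields). The function names, the output file name, and the choice of repName for the name column and "0" for the score column are assumptions made for illustration; the thread does not specify what scTE_build expects in those columns.

```python
# 0-based positions of the fields we need in a UCSC rmsk.txt row.
GENO_NAME, GENO_START, GENO_END, STRAND, REP_NAME = 5, 6, 7, 9, 10


def rmsk_to_bed6(rmsk_lines):
    """Yield BED6 rows: chrom, start, end, name, score, strand."""
    for line in rmsk_lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) <= REP_NAME:
            continue  # skip blank or truncated lines
        yield [
            fields[GENO_NAME],
            fields[GENO_START],
            fields[GENO_END],
            fields[REP_NAME],  # assumption: TE name goes in the BED name column
            "0",               # assumption: placeholder score
            fields[STRAND],
        ]


def convert(in_path, out_path):
    """Read rmsk.txt from in_path and write a tab-separated BED6 file."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for row in rmsk_to_bed6(src):
            dst.write("\t".join(row) + "\n")
```

Calling convert("rmsk.txt", "hg38.te.bed") (a hypothetical output name) would write the BED file, which could then be passed as scTE_build -te hg38.te.bed -gene gencode.v30.annotation.gtf -o hg38. Whether the resulting index matches the prebuilt one has not been checked here.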