Open BenayounLaboratory opened 3 years ago
Can you check the 2015_Genome_scTE.bed
file is 6 columns bed format?
Hi,
yes. This is what the beginning of the file looks like:
NC_029649.1 0 910 NotFur1_Unknown_553 0 -
NC_029649.1 908 1184 NotFur1_rnd-1_family-367 0 +
NC_029649.1 1175 1836 NotFur1_rnd-1_family-330 0 +
NC_029649.1 1836 1878 NotFur1_Unknown_695 0 -
NC_029649.1 1925 2016 NotFur1_Unknown_695 0 -
NC_029649.1 2017 2142 NotFur1_Unknown_176 0 -
NC_029649.1 2137 2341 NotFur1_GapFilledScaffold_7542_21782 0 -
NC_029649.1 2396 2784 NotFur1_GapFilledScaffold_11762_4388 0 -
NC_029649.1 3549 3722 NotFur1_rnd-1_family-25 0 -
NC_029649.1 3650 3731 NotFur1_LTR_154 0 -
NC_029649.1 3808 3858 NotFur1_GapFilledScaffold_5446_39464 0 +
NC_029649.1 4004 4101 NotFur1_rnd-1_family-22 0 +
NC_029649.1 4018 4124 NotFur1_GapFilledScaffold_5294_191 0 +
I encountered the same error today, after checking the code in scTE_build, I found the cause of the error:
The readGtf() function in scTE_build script only expects regular chromosome name (1, 2, 3, etc) from .gtf files
chr_list = [ str(k) for k in list(range(1,50))] + ['X','Y', 'M'] #line14
readGtf() expects 'gene_biotype' information offered in the last column of .gtf files, which it can extract exons from protein_coding and lincRNA genes (useful in exclusive mode)
hope my experience be helpful
Thank you huangly2018! We have the biotype there, so that's probably not creating the issue. Developers: would there be a way to modify the code so as to be compatible with model organisms that have non-standard chromosomes names?
Yes, just as huangly2018 pointed out, scTE limited the chromosome names to avoid any unwanted bugs.
you can replace the line14 of scTE_buld chr_list = [ str(k) for k in list(range(1,50))] + ['X','Y', 'M']
with the list of your non-standard chromosome names chr_list =['NC_029649.1',...]
, then it should be OK.
Hi Jphe thanks for answering! Unfortunately, this is a draft genome with > 10,000 scaffolds, so the manual listing is just not practical. Would there be a way for the program to read the list of scaffolds from a file, or maybe deduce it from the gtf gene annotation file? If adding 1-2 lines of code can do that, could you please assist? I am not proficient with python. Thank you for your assistance - again, many people using nonstandard model organisms would be able to use the program if this is possible to do... Thank you!
Can you send the TE bed file and gene gtf file to me, then I can try and fix it in my PC
The 2 files are available there: https://www.dropbox.com/sh/50iccy9qp3a1hyp/AAC-TLmQ7buWeJitJRwHb2dCa?dl=0
I think this would help all people working on non-perfect genome assemblies - thank you so much!
Hey - sorry to bother you. Any updates on using a genome without the standard chromosome names? Thank you so much!
Can you try to replace the scTE/bin/scTE_build script with the attached file, it works in my PC.
You need to change the line16 with the absolute path of your bed/gtf file.
Thank you!
Sorry to bug you again - when I replace the file in my install as you suggested, I get this error:
Traceback (most recent call last):
File "/usr/local/bin/scTE_build", line 12, in <module>
from scTE.miniglbase import genelist, glload, location
Any idea why?
Thank you!
It's wired, may I make sure that you replace the old scTE_build.py file with new one, and then reinstall scTE and then get that error.
And I can not see the the full error information that you get.
Hi @jphe ,
Thanks for the excellent work. When I use the scTE_build
to build the index for Axolotl genome, the same error occurred with BenayounLaboratory who mentioned before. I followed the motified scTE_build.zip you uploaded and verified the gtf has the same chromsome names within the TE bed file with 6 coloumns.
The command and error are just as follow.
python3 scTE_build -te Axolotl.scTE.10000.bed -gene AmexT_v47-AmexG_v6.0-DD.gtf -o custom
INFO : genelist(): loaded 'Axolotl.scTE.bed' found 28728752 items INFO : genelist(): loaded 'AmexT_v47-AmexG_v6.0-DD.gtf' found 1741461 items Namespace(tefile=['Axolotl.scTE.bed'], genefile=['AmexT_v47-AmexG_v6.0-DD.gtf'], mode='exclusive', out='custom', genome='other', info=<function info at 0x2ad49cdc83a0>) INFO : Building the scTE genome annotation index... 2022-08-16 22:22:04 /public1/home/sc60481/software/scTE/scTE_build:114: DeprecationWarning: 'U' mode is deprecated o = open(tefile,'rU') Traceback (most recent call last): File "/public1/home/sc60481/software/scTE/scTE_build", line 472, in <module> main() File "/public1/home/sc60481/software/scTE/scTE_build", line 465, in main genomeIndex(args.genome,args.mode,tefile,genefile, args.out,'No path','No path') File "/public1/home/sc60481/software/scTE/scTE_build", line 131, in genomeIndex gls.load_list(clean) File "/public1/home/sc60481/software/scTE/scTE/miniglbase/genelist.py", line 1472, in load_list list_to_load[0] IndexError: list index out of range
I feel confused about how to solve this problem. Could you help me with this issue?
For your convenient, I upload the two files for you to test. The TE file is huge in axolotl, then I intercept 10000 lines for test.
https://cowtransfer.com/s/5f6927ec36fa46
The password is 0paaqv
.
Thanks for your time and work.
Hi - thank you for creating such a unique/useful tool! After testing the pipeline on one of the "out of the box" supported genomes (mm10) successfully, we are trying to use it to measure TE expression in single cells for the African turquoise killifish. We are using the NCBI genome annotation files (gtf) and ran repeatmasker to get the TE bed file. Everything looks ok based on the sample files provided here on Github. We have the most recent version of the package, and we use this command to build the index:
scTE_build -te 2015_Genome_scTE.bed -gene GCF_001465895.1_Nfu_20140520_genomic_CLEAN_MT_exon.filtered.gtf -o Nfu_20140520 -g other
Although the run starts ok, we get a warning almost immediately that does not stop the run:
However, after a few minutes, the program aborts with the following error:
Would you be able to assist in figuring out what the problem is? We are very excited to use the package on the African turquoise killifish data but won't be able to until we can have an index... Thank you so much for your help in advance!