Closed r-mashoodh closed 2 years ago
Dear Rahia Mashoodh, first of all thank you very much for your interest in TransposonUltimate.
This is the page of transposon_annotation_tools. It only provides the conda packages for the transposon annotation tools; no parsing is part of this package. The aim of this page is just to facilitate running the different annotation tools, not interpreting their many different output files.
As you might have noticed, transposon_annotation_tools is part of TransposonUltimate. TransposonUltimate contains additional software packages, for example reasonaTE.
The reasonaTE package can call the different annotation programs and also parse their output into the standardized annotation format GFF3. GFF3 is today's standard file format for storing annotation information.
Besides, reasonaTE offers other functionality, as you will find reading the page about it. For your purpose, using it to call different annotation software and to parse it should be sufficient. To do so, after installation, follow steps 1 - 3. You will find the GFF3 files in the folder "parsedAnnotations" of your reasonaTE project.
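As a quick check that parsing produced something, the GFF3 files can be counted record by record. This is only a minimal sketch: the function name `summarize_gff3` is hypothetical, and it assumes the parsed files end in `.gff3` and sit in `<projectFolder>/<projectName>/parsedAnnotations`, which may differ on your installation. It relies only on the GFF3 convention of nine tab-separated columns and `#` for directive lines.

```shell
#!/usr/bin/env bash
# Count annotation records per GFF3 file in a directory.
# Assumption: files end in .gff3; adjust the pattern if yours are named differently.
summarize_gff3() {
    local dir="$1"
    local f
    for f in "$dir"/*.gff3; do
        [ -e "$f" ] || continue   # the glob matched nothing
        # GFF3 records have nine tab-separated columns; lines starting with '#'
        # are directives or comments and are skipped.
        awk -F'\t' -v name="$(basename "$f")" \
            '!/^#/ && NF == 9 { n++ } END { printf "%s\t%d\n", name, n }' "$f"
    done
}

# Example call (project folder and name as used later in this thread):
# summarize_gff3 workspace/nVes/parsedAnnotations
```

A file that reports zero records is a hint that the corresponding annotation tool did not actually produce output.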
I would be very happy to hear back from you if you have any further questions. Also, please let me know whether you were successful in using the software and whether you have any further suggestions.
Best regards, Kevin Riehl
Dear Kevin,
Thanks so much for this explanation. It took me a while to realise there is a 2nd environment! I think that was the confusing part -- knowing exactly where to start in terms of the pipeline, and in what order. It might be useful to explain this on the main TransposonUltimate page?
The annotation step is currently running on the Cambridge HPC himem nodes. Fingers crossed!!!
Best, rahia
Dear Kevin,
I am running a bunch of the annotate tools in parallel.
reasonaTE -mode annotate -projectFolder workspace -projectName nVes -tool helitronScanner &
reasonaTE -mode annotate -projectFolder workspace -projectName nVes -tool ltrHarvest &
reasonaTE -mode annotate -projectFolder workspace -projectName nVes -tool must &
reasonaTE -mode annotate -projectFolder workspace -projectName nVes -tool repeatmodel &
reasonaTE -mode annotate -projectFolder workspace -projectName nVes -tool repMasker &
reasonaTE -mode annotate -projectFolder workspace -projectName nVes -tool sinefind &
reasonaTE -mode annotate -projectFolder workspace -projectName nVes -tool sinescan &
reasonaTE -mode annotate -projectFolder workspace -projectName nVes -tool tirvish &
reasonaTE -mode annotate -projectFolder workspace -projectName nVes -tool transposonPSI &
reasonaTE -mode annotate -projectFolder workspace -projectName nVes -tool NCBICDD1000
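The same launch pattern can be written as a loop that keeps one log file per tool and waits for every job. This is a sketch rather than part of reasonaTE: the function name `run_annotators` and the `REASONATE` and `LOG_DIR` variables are hypothetical helpers, and only the `-mode`, `-projectFolder`, `-projectName` and `-tool` flags are taken from the commands above. Whether reasonaTE returns a non-zero exit status on failure is not known here, so the logs should still be read.

```shell
#!/usr/bin/env bash
# Launch one annotate job per tool in the background, then wait for all of them.
# REASONATE and LOG_DIR are hypothetical overrides, not reasonaTE settings.
run_annotators() {
    local project_folder="$1" project_name="$2"
    shift 2
    local log_dir="${LOG_DIR:-.}"
    local tool i failed=0
    local -a pids=() names=()
    for tool in "$@"; do
        "${REASONATE:-reasonaTE}" -mode annotate -projectFolder "$project_folder" \
            -projectName "$project_name" -tool "$tool" \
            > "$log_dir/annotate_${tool}.log" 2>&1 &
        pids+=("$!")
        names+=("$tool")
    done
    # Collect the exit status of every background job.
    for i in "${!pids[@]}"; do
        if ! wait "${pids[$i]}"; then
            echo "annotate job for ${names[$i]} exited with a non-zero status" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Example call with a few of the tool names used above:
# run_annotators workspace nVes helitronScanner ltrHarvest tirvish
```

Separate log files make it easier to tell which tool printed which error when several jobs write at the same time.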
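I ran into some issues:
RepeatModeler/RepeatMasker cannot be found, but I think I can just run this separately with the dfam-tetools container and then move the files over to the workspace?
sh: RepeatMasker: command not found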
sh: BuildDatabase: command not found
sh: RepeatModeler: command not found
Annotation by software repeatmodel finished successfully...
Annotation by software repMasker finished successfully...
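Because the log reports success even though the programs were not found, it can help to check for them before starting a run. A minimal sketch, assuming the programs are expected on `PATH`; the function name `missing_programs` is hypothetical, and the program names are the ones from the log above.

```shell
#!/usr/bin/env bash
# Print every named program that is not on PATH; return non-zero if any is missing.
missing_programs() {
    local prog status=0
    for prog in "$@"; do
        if ! command -v "$prog" > /dev/null 2>&1; then
            echo "$prog"
            status=1
        fi
    done
    return "$status"
}

# Example: the three programs reported as "command not found" above.
# missing_programs RepeatMasker BuildDatabase RepeatModeler
```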
I would run this using the docker container. Are there any other params you recommend?
singularity exec docker://dfam/tetools:latest BuildDatabase -name sequence_index -engine ncbi sequence.fasta
singularity exec docker://dfam/tetools:latest RepeatModeler -database sequence_index -pa 32 -LTRStruct > run.out
singularity exec docker://dfam/tetools:latest RepeatMasker -pa 32 -a -s -gff -no_is -lib metazoa sequence.fasta
Some tools contain a perl error:
perl /rds/user/rm786/hpc-work/miniconda3/envs/transposon_annotation_tools_env/bin/SINE_Scan-v1.1.1/SINE_Scan_process.pl -s 123 -g /rds/user/rm786/hpc-work/workspace/nVes/sequence.fasta -o /rds/user/rm786/hpc-work/workspace/nVes/sinescan/output -d /rds/user/rm786/hpc-work/workspace/nVes/sinescan/result -z /rds/user/rm786/hpc-work/workspace/nVes/sinescan/final
perl: symbol lookup error: /home/rm786/perl5/lib/perl5/x86_64-linux-thread-multi/auto/List/Util/Util.so: undefined symbol: Perl_xs_apiversion_bootcheck
Annotation by software sinescan finished successfully…
and
processing seq1.
blast against /rds/user/rm786/hpc-work/workspace/nVes/transposonPSI/temp/transposonPSIcli/transposon_PSI_LIB/cacta.refSeq
CMD: blastall -i /rds/user/rm786/hpc-work/workspace/nVes/transposonPSI/temp/transposonPSIcli/transposon_PSI_LIB/cacta.refSeq -d transposonPSI.107765.cpu-p-491.tmp/seq1/seq1.seq -p psitblastn -R /rds/user/rm786/hpc-work/workspace/nVes/transposonPSI/temp/transposonPSIcli/transposon_PSI_LIB/cacta.chk -F F -M BLOSUM62 -t -1 -e 1e-5 -v 10000 -b 10000 > transposonPSI.107765.cpu-p-491.tmp/seq1/seq1.cacta.refSeq.psitblastn
perl: symbol lookup error: /home/rm786/perl5/lib/perl5/x86_64-linux-thread-multi/auto/List/Util/Util.so: undefined symbol: Perl_xs_apiversion_bootcheck
Error /rds/user/rm786/hpc-work/workspace/nVes/transposonPSI/temp/transposonPSIcli/scripts/BPbtab < transposonPSI.107765.cpu-p-491.tmp/seq1/seq1.cacta.refSeq.psitblastn > transposonPSI.107765.cpu-p-491.tmp/seq1/seq1.cacta.refSeq.psitblastn.btab 32512 at ./transposonPSI.pl line 147, <$filehandle> line 1.
/rds/user/rm786/hpc-work/miniconda3/envs/transposon_annotation_tools_env/bin
/rds/user/rm786/hpc-work/miniconda3/envs/transposon_annotation_tools_env
/rds/user/rm786/hpc-work/miniconda3/envs/transposon_annotation_tools_env/share/transposonPSIcli
finished completely...
Annotation by software transposonPSI finished successfully...
I thought of adding List::Util to both the annotation_tools and reasonaTE? But I know it's easy to mess up conda envs, so I wanted to check first ... do you have any advice?
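The failing Util.so in the errors above lives under /home/rm786/perl5, not under the conda environment. One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that a `PERL5LIB` setting from a home-directory Perl setup makes the environment's perl load modules built for a different perl. The sketch below only lists `PERL5LIB` entries outside a given prefix; the function name `perl5lib_outside_prefix` is hypothetical.

```shell
#!/usr/bin/env bash
# List PERL5LIB entries that lie outside a given environment prefix.
# Such entries can make that environment's perl pick up foreign modules.
perl5lib_outside_prefix() {
    local prefix="$1" entry
    local -a entries
    IFS=':' read -r -a entries <<< "${PERL5LIB:-}"
    for entry in "${entries[@]}"; do
        [ -n "$entry" ] || continue
        case "$entry" in
            "$prefix"/*) ;;        # inside the environment
            *) echo "$entry" ;;    # outside, worth a closer look
        esac
    done
}

# Example, assuming the conda environment is active so CONDA_PREFIX is set:
# perl5lib_outside_prefix "$CONDA_PREFIX"
#
# To try a single run without the home-directory Perl settings:
# env -u PERL5LIB -u PERL_LOCAL_LIB_ROOT reasonaTE -mode annotate \
#     -projectFolder workspace -projectName nVes -tool sinescan
```

Using `env -u` changes the variables for that one command only, so neither conda environment is modified while testing the idea.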
I would appreciate any help.
Thanks so much in advance.
Dear Rahia Mashoodh, thanks for your answer and report, and excuse my delayed reply.
"I am running a bunch of the annotate tools in parallel." That looks good :-)
"RepeatModeler/RepeatMasker but I think I can just run this separately with the dfam-tetools container and then move the files over to the workspace? [...] I would run this using the docker container, are there any other params you recommend?" I am not sure about that. RepeatModeler and RepeatMasker are massive tools, and unfortunately their conda packages are reported not to work properly on every system. All I can do is refer you to the pages of those tools, or suggest some further online research on this one.
"Some tools contain a perl error [...] I thought of adding List::Util to both the annotation_tools and reasonaTE? But I know it's easy to mess up conda envs so wanted to check first ... do you have any advice?" Oh, that error doesn't look too good. Unfortunately, as you might know, we are not the authors of these tools; we just migrated and packaged them into Conda. The idea of Conda is that packages can be installed platform-independently; however, even Conda seems to cause trouble on some systems. Many colleagues and other users have reported to me that the current conda package worked fine after installation, so I guess adding your suggestion to the repo is not a good idea. However, if you find a way to solve this issue, please report it here, so that future users with similar issues can benefit from your experience.
Sorry that I could not help further, still I hope this was somehow helpful to you. Best regards, Kevin Riehl
parse
@mhemberg
Hi, did you get results by running these commands?
"singularity exec docker://dfam/tetools:latest BuildDatabase -name sequence_index -engine ncbi sequence.fasta
singularity exec docker://dfam/tetools:latest RepeatModeler -database sequence_index -pa 32 -LTRStruct > run.out
singularity exec docker://dfam/tetools:latest RepeatMasker -pa 32 -a -s -gff -no_is -lib metazoa sequence.fasta"
I am also using the same environment. Please send me your suggestions.
With regards,
Ramky
Dear Ramky, could you please open a new issue for your problem instead of appending to other people's issues? Thank you.
By the way, mhemberg mentioned that he doesn't know you. Is there a specific reason you are asking him this question?
Best, Kevin
Hello,
Thanks for creating this tool! I'd really like to use it. However, I'm a bit confused about how to parse all the output files from the many repeat identification tools.
I would appreciate any advice!
Best wishes.