DerKevinRiehl / transposon_annotation_tools

A set of bioconda packages for transposon annotation. During my master's thesis I downloaded lots of these tools, and I want to make it easier for people to install and run this software.
GNU General Public License v3.0

how to parse all the outputs? #3

Closed r-mashoodh closed 2 years ago

r-mashoodh commented 2 years ago

Hello,

Thanks for creating this tool! I'd really like to use it. However, I'm a bit confused about how to parse all the output files from the many repeat identification tools.

I would appreciate any advice!

Best wishes.

DerKevinRiehl commented 2 years ago

Dear Rahia Mashoodh, first of all thank you very much for your interest in TransposonUltimate.

This is the page of transposon_annotation_tools, which only provides the conda packages for the transposon annotation tools; parsing is not part of this package. Its aim is to make it easier to run the different annotation programs, not to interpret their many different output files.

As you might have noticed, transposon_annotation_tools is part of TransposonUltimate. TransposonUltimate contains additional software packages, for example reasonaTE.

This reasonaTE package can call the different annotation programs and also parse their output into the standardized annotation format GFF3. GFF3 is today's standard file format for storing annotation information.

Besides, reasonaTE offers other functionality, as you will find reading the page about it. For your purpose, using it to call different annotation software and to parse it should be sufficient. To do so, after installation, follow steps 1 - 3. You will find the GFF3 files in the folder "parsedAnnotations" of your reasonaTE project.
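
As a rough illustration of working with the parsed results, the sketch below counts features per type (column 3 of a GFF3 file) across all files in a "parsedAnnotations" folder. The function name `count_gff3_types` and the `.gff3` file extension are assumptions made for this example; check the actual file names in your project.

```shell
# Count GFF3 features per type (column 3) across all files in a folder.
# Assumption: the parsed files end in .gff3; adjust the glob if they do not.
count_gff3_types() {
    # GFF3 records have 9 tab-separated columns; lines starting with '#'
    # are comments or directives and are skipped.
    awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ }
                END { for (t in n) print t "\t" n[t] }' "$1"/*.gff3 | LC_ALL=C sort
}
```

It would be called as `count_gff3_types path/to/project/parsedAnnotations`, where the path is a placeholder for your own project folder, and prints one line per feature type with its count.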

I would be very happy to hear back from you if you have any further questions. Also, please let me know whether you were successful in using the software and whether you have any further suggestions.

Best regards, Kevin Riehl

r-mashoodh commented 2 years ago

Dear Kevin,

Thanks so much for this explanation. It took me a while to realise there is a second environment! I think that was the confusing part: knowing exactly where to start in the pipeline, and in what order. It might be useful to explain this on the main TransposonUltimate page?

The annotation step is currently running on the Cambridge HPC himem nodes. Fingers crossed!!!

Best, rahia

r-mashoodh commented 2 years ago

Dear Kevin,

I am running a bunch of the annotate tools in parallel.

reasonaTE -mode annotate -projectFolder workspace -projectName nVes -tool helitronScanner &
reasonaTE -mode annotate -projectFolder workspace -projectName nVes -tool ltrHarvest &
reasonaTE -mode annotate -projectFolder workspace -projectName nVes -tool must &
reasonaTE -mode annotate -projectFolder workspace -projectName nVes -tool repeatmodel &
reasonaTE -mode annotate -projectFolder workspace -projectName nVes -tool repMasker &
reasonaTE -mode annotate -projectFolder workspace -projectName nVes -tool sinefind &
reasonaTE -mode annotate -projectFolder workspace -projectName nVes -tool sinescan &
reasonaTE -mode annotate -projectFolder workspace -projectName nVes -tool tirvish &
reasonaTE -mode annotate -projectFolder workspace -projectName nVes -tool transposonPSI &
reasonaTE -mode annotate -projectFolder workspace -projectName nVes -tool NCBICDD1000
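
When jobs are started in the background like this, the shell does not report which of them failed. A minimal sketch of one way to track that: start each tool in the background, remember its process ID, and `wait` on each one to collect its exit status. The function name `run_tools`, the `REASONATE` override variable, and the `logs/` directory are hypothetical; the reasonaTE arguments are the ones used in the commands above.

```shell
# Launch one annotate job per tool in the background and report exit codes.
# REASONATE is a hypothetical override so the command can be swapped out.
REASONATE="${REASONATE:-reasonaTE}"

run_tools() {
    # Plain (global) variables are used to stay portable across sh variants.
    project_folder="$1"; project_name="$2"; shift 2
    job_list=""
    failed=0
    mkdir -p logs
    for tool in "$@"; do
        # Each tool writes to its own log file instead of a shared terminal.
        "$REASONATE" -mode annotate -projectFolder "$project_folder" \
            -projectName "$project_name" -tool "$tool" > "logs/$tool.log" 2>&1 &
        job_list="$job_list $tool:$!"
    done
    for job in $job_list; do
        tool="${job%%:*}"; pid="${job##*:}"
        # wait returns the exit status of the given background job.
        if wait "$pid"; then
            echo "$tool: exit 0"
        else
            echo "$tool: FAILED"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, `run_tools workspace nVes helitronScanner ltrHarvest must` would start three jobs. An exit status of 0 only means the wrapper returned normally, so the log files are still worth reading.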

I ran into some issues:

  1. RepeatModeler/RepeatMasker are not found, but I think I can just run them separately with the dfam-tetools container and then move the files over to the workspace?
sh: RepeatMasker: command not found
sh: BuildDatabase: command not found
sh: RepeatModeler: command not found
Annotation by software  repeatmodel  finished successfully...
Annotation by software  repMasker  finished successfully...

I would run this using the Docker container. Are there any other parameters you recommend?

singularity exec docker://dfam/tetools:latest BuildDatabase -name sequence_index -engine ncbi sequence.fasta
singularity exec docker://dfam/tetools:latest RepeatModeler -database sequence_index -pa 32 -LTRStruct > run.out
singularity exec docker://dfam/tetools:latest RepeatMasker -pa 32 -a -s -gff -no_is -lib metazoa sequence.fasta
  2. Some tools fail with a Perl error:
perl /rds/user/rm786/hpc-work/miniconda3/envs/transposon_annotation_tools_env/bin/SINE_Scan-v1.1.1/SINE_Scan_process.pl  -s 123 -g /rds/user/rm786/hpc-work/workspace/nVes/sequence.fasta -o /rds/user/rm786/hpc-work/workspace/nVes/sinescan/output -d /rds/user/rm786/hpc-work/workspace/nVes/sinescan/result -z /rds/user/rm786/hpc-work/workspace/nVes/sinescan/final
perl: symbol lookup error: /home/rm786/perl5/lib/perl5/x86_64-linux-thread-multi/auto/List/Util/Util.so: undefined symbol: Perl_xs_apiversion_bootcheck
Annotation by software  sinescan  finished successfully…

and

processing seq1.
        blast against /rds/user/rm786/hpc-work/workspace/nVes/transposonPSI/temp/transposonPSIcli/transposon_PSI_LIB/cacta.refSeq
CMD: blastall -i /rds/user/rm786/hpc-work/workspace/nVes/transposonPSI/temp/transposonPSIcli/transposon_PSI_LIB/cacta.refSeq -d transposonPSI.107765.cpu-p-491.tmp/seq1/seq1.seq -p psitblastn -R /rds/user/rm786/hpc-work/workspace/nVes/transposonPSI/temp/transposonPSIcli/transposon_PSI_LIB/cacta.chk -F F -M BLOSUM62 -t -1 -e 1e-5 -v 10000 -b 10000 > transposonPSI.107765.cpu-p-491.tmp/seq1/seq1.cacta.refSeq.psitblastn
perl: symbol lookup error: /home/rm786/perl5/lib/perl5/x86_64-linux-thread-multi/auto/List/Util/Util.so: undefined symbol: Perl_xs_apiversion_bootcheck
Error /rds/user/rm786/hpc-work/workspace/nVes/transposonPSI/temp/transposonPSIcli/scripts/BPbtab < transposonPSI.107765.cpu-p-491.tmp/seq1/seq1.cacta.refSeq.psitblastn > transposonPSI.107765.cpu-p-491.tmp/seq1/seq1.cacta.refSeq.psitblastn.btab 32512 at ./transposonPSI.pl line 147, <$filehandle> line 1.
/rds/user/rm786/hpc-work/miniconda3/envs/transposon_annotation_tools_env/bin
/rds/user/rm786/hpc-work/miniconda3/envs/transposon_annotation_tools_env
/rds/user/rm786/hpc-work/miniconda3/envs/transposon_annotation_tools_env/share/transposonPSIcli
finished completely...
Annotation by software  transposonPSI  finished successfully...
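
Note that in the outputs above the wrapper prints "finished successfully" even though the underlying command failed. A minimal sketch for catching this, assuming each tool's output was redirected to its own log file (the function name `find_failed_logs` and the log file layout are hypothetical): search the logs for the error strings seen in this thread.

```shell
# Print the names of log files that contain any of the error strings
# observed above, regardless of the final "finished successfully" line.
find_failed_logs() {
    # -l: list matching files only; -E: extended regular expressions.
    grep -l -E 'command not found|symbol lookup error|^Error ' "$@" 2>/dev/null
}
```

It would be called as `find_failed_logs logs/*.log`. The pattern list only covers the messages shown here and is not a complete failure check.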

I thought of adding List::Util to both the annotation_tools and reasonaTE environments? But I know it's easy to mess up conda envs, so I wanted to check first ... do you have any advice?

I would appreciate any help.

thanks so much in advance.

DerKevinRiehl commented 2 years ago

Dear Rahia Mashoodh, thanks for your answer and report, and please excuse my delayed reply.

"I am running a bunch of the annotate tools in parallel." That looks good :-)

"RepeatModeler/RepeatMasker but I think I can just run this separately with the dfam-tetools container and then move the files over to the workspace? [...] I would run this using the docker container, are there any other params you recommend?" I am not sure about that. RepeatModeler and RepeatMasker are massive tools, and unfortunately their conda packages are reported not to work properly everywhere. The only thing I can do is refer you to the software's own page, or suggest some further online research on this one.

"Some tools contain a perl error [...] I thought of adding List::Util to both the annotation_tools and reasonaTE? But I know its easy to mess up conda envs so wanted to check first ... do you have any advice?" Oh, that error does not look too good. Unfortunately, as you might know, we are not the authors of these tools; we only migrated and packaged them for conda. The idea of conda is that packages can be installed platform-independently; however, even conda seems to cause trouble on some systems. Many colleagues and other users have reported to me that the current conda package worked fine after installation, so I think adding your suggestion to the repo is not a good idea. However, if you find a way to solve this issue, please report it here so that future users with similar issues can benefit from your experience.
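
One possible cause, not confirmed in this thread: the failing `Util.so` is loaded from `/home/rm786/perl5`, which lies outside the conda environment, and an `undefined symbol` error of this kind typically appears when a compiled Perl module built for one Perl version is loaded by a different one. A minimal sketch to check for that, assuming the external path comes in through the `PERL5LIB` environment variable (the function name `perl5lib_outside` is hypothetical):

```shell
# List PERL5LIB entries that are not located under the given prefix
# (for example the conda environment directory).
perl5lib_outside() {
    prefix="$1"
    # PERL5LIB is a colon-separated list of directories.
    printf '%s\n' "${PERL5LIB:-}" | tr ':' '\n' | while IFS= read -r entry; do
        [ -z "$entry" ] && continue
        case "$entry" in
            "$prefix"|"$prefix"/*) ;;      # inside the prefix: nothing to report
            *) printf '%s\n' "$entry" ;;   # outside: candidate for a conflict
        esac
    done
}
```

If this prints a directory such as a personal `perl5` folder, running one tool with the variable unset, for example `( unset PERL5LIB; reasonaTE -mode annotate ... )`, is a low-risk experiment because it does not modify either conda environment.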

Sorry that I could not help further, still I hope this was somehow helpful to you. Best regards, Kevin Riehl

Ramkyeri commented 1 year ago

parse

@mhemberg

Hi, did you get results by running these commands?

singularity exec docker://dfam/tetools:latest BuildDatabase -name sequence_index -engine ncbi sequence.fasta
singularity exec docker://dfam/tetools:latest RepeatModeler -database sequence_index -pa 32 -LTRStruct > run.out
singularity exec docker://dfam/tetools:latest RepeatMasker -pa 32 -a -s -gff -no_is -lib metazoa sequence.fasta

I am also using the same environment. Please send me your suggestions.

with regards

Ramky

DerKevinRiehl commented 1 year ago

Dear Ramky, could you please open a new issue for your problem rather than appending to someone else's issue? Thank you.

By the way, mhemberg mentioned that he doesn't know you. Is there a specific reason you are asking him this question?

Best, Kevin