clemgoub / dnaPipeTE

dnaPipeTE (for de-novo assembly & annotation Pipeline for Transposable Elements) is a pipeline designed to find, annotate and quantify Transposable Elements in small samples of NGS datasets. It is very useful for quantifying the proportion of TEs in newly sequenced genomes, since it does not require genome assembly and works on small datasets (< 1X).

Error: Issue with producing output graphs #63

Closed DR-genomics closed 2 years ago

DR-genomics commented 2 years ago

Hello @clemgoub,

I ran dnaPipeTE for my samples. At the end of each run, I have fasta sequences in Trinity.fasta, along with a summary of repeat families listed in ".tbl" output files. However, none of the runs has output graphs. At the end of each run, I receive errors as below:

/bin/sh: /gpfs20/mypath/dnaPipeTE/bin/parallel: No such file or directory
/bin/sh: /gpfs20/mypath/dnaPipeTE/bin/parallel: No such file or directory
/bin/sh: /gpfs20/mypath/dnaPipeTE/bin/parallel: No such file or directory
Error in read.table(paste(folder, file1, sep = "/")) : no lines available in input
Execution halted

I don't have a parallel directory listed within the bin folder. Is that something that comes along with the software? Any help will be much appreciated!

clemgoub commented 2 years ago

Dear @DR-genomics,

I am currently in the process of updating the dnaPipeTE repository, in particular with instructions to use the docker/singularity version of dnaPipeTE 1.3 which will solve the dependency problem you describe.

I suggest that you follow the instructions here: https://hub.docker.com/r/clemgoub/dnapipete to run the containerized version of the program. You can do so either with Docker (root privileges) or Singularity (non-root user). I will be happy to assist you if you have trouble installing it.

Regarding the graphs, this should also resolve the issue. In addition, I just created a toolkit with several scripts to process dnaPipeTE outputs and re-create the original graphs in a more customizable fashion. The repo was published only yesterday and the documentation is there; however, please let me know if you encounter any issues.

Best,

Clément

DR-genomics commented 2 years ago

Thanks for the prompt response, Clément! Good to know about the software update! I would like to try the Singularity version of the software, as Docker is not available on the cluster I am using. Can you provide a link for it?

Also, along with the output graphs, the reads_per_component_and_annotation file is missing as well. Are the graphs and the reads_per_component_and_annotation file linked to each other?

Thanks

clemgoub commented 2 years ago

Hi!

The instructions to use dnaPipeTE with Singularity are as follows:

1- First create a Singularity image from the Docker container

mkdir ~/dnaPipeTE
cd ~/dnaPipeTE
singularity pull --name dnapipete.img docker://clemgoub/dnapipete:latest

This step requires approximately 20 minutes to complete. However, it is only required once for installation.

2- Assuming you have a project folder with your data in ~/data, create a file that will contain the commands for the run. For example:

cd ~/data
touch dnaPipeTE_cmd.sh

With the text editor of your choice, edit dnaPipeTE_cmd.sh with the commands for dnaPipeTE. For example:

cd /opt/dnaPipeTE
python3 dnaPipeTE.py -input /mnt/SRR14470610.mt.clean.R1.fastq -output /mnt/dnaPipeTE_0.15_1_t20 -genome_size 180000000 -genome_coverage 0.15 -sample_number 2 -RM_lib ../RepeatMasker/Libraries/RepeatMasker.lib -RM_t 0.2 -cpu 8
  • The first line is required to execute the scripts in the right directory of the container
  • The second line is a standard dnaPipeTE command
  • /mnt is the default directory in the Singularity container where a user directory can be mounted to access and write data outside the container. In this example, /mnt in the container will point to ~/data on your machine. This is specified in the next command, which actually starts the container, mounts the user data and runs the program.
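
As an alternative to touch plus a text editor, the command file can be written in one go with a here-document. This is only a sketch: the DATA_DIR variable is a name introduced here for illustration (defaulting to the ~/data folder of the example), and the two lines written are exactly the ones shown above, so the sample file name and parameters should be adapted to your own run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# DATA_DIR is a hypothetical variable for this sketch; it defaults to the
# ~/data project folder used in the example above.
DATA_DIR="${DATA_DIR:-$HOME/data}"
mkdir -p "$DATA_DIR"

# The quoted 'EOF' delimiter keeps the text literal: nothing is expanded on
# the host, so the paths are interpreted later, inside the container.
cat > "$DATA_DIR/dnaPipeTE_cmd.sh" <<'EOF'
cd /opt/dnaPipeTE
python3 dnaPipeTE.py -input /mnt/SRR14470610.mt.clean.R1.fastq -output /mnt/dnaPipeTE_0.15_1_t20 -genome_size 180000000 -genome_coverage 0.15 -sample_number 2 -RM_lib ../RepeatMasker/Libraries/RepeatMasker.lib -RM_t 0.2 -cpu 8
EOF

# Print the file back to confirm what was written.
cat "$DATA_DIR/dnaPipeTE_cmd.sh"
```

Every path inside the file starts with /mnt or is relative to /opt/dnaPipeTE, because the file is read inside the container, not on the host.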

3- Start a run

singularity exec --bind ~/data:/mnt ~/dnaPipeTE/dnapipete.img bash /mnt/dnaPipeTE_cmd.sh

Note that --bind is the option that indicates where the data are located outside the container (in this example, ~/data). This directory is also where the output folder dnaPipeTE_0.15_1_t20 will be created.
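
If useful, the launch command from step 3 can be wrapped in a small script that first checks that the image, the data folder and the command file are where the command expects them. This is a minimal sketch, not part of dnaPipeTE: the function names check_inputs and run_dnapipete are made up for this example, and the singularity call is the same one shown in step 3.

```shell
#!/usr/bin/env bash

# check_inputs IMAGE DATA_DIR   (hypothetical helper, not part of dnaPipeTE)
# Returns 0 only if the image, the data folder and the command file all exist.
check_inputs() {
    local image="$1" data_dir="$2" status=0
    [ -f "$image" ] || { echo "missing image: $image" >&2; status=1; }
    [ -d "$data_dir" ] || { echo "missing data folder: $data_dir" >&2; status=1; }
    [ -f "$data_dir/dnaPipeTE_cmd.sh" ] || {
        echo "missing command file: $data_dir/dnaPipeTE_cmd.sh" >&2
        status=1
    }
    return "$status"
}

# run_dnapipete IMAGE DATA_DIR   (hypothetical helper)
# Runs the step 3 command, but only when the checks above pass.
run_dnapipete() {
    local image="$1" data_dir="$2"
    check_inputs "$image" "$data_dir" || return 1
    command -v singularity >/dev/null 2>&1 || {
        echo "singularity not found in PATH" >&2
        return 1
    }
    singularity exec --bind "$data_dir":/mnt "$image" bash /mnt/dnaPipeTE_cmd.sh
}

# Example call with the paths used in this thread:
# run_dnapipete ~/dnaPipeTE/dnapipete.img ~/data
```

The checks run on the host before the container starts, so a typo in a path shows up immediately rather than as an error from inside the container.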

DR-genomics commented 2 years ago

Thank you for the detailed instructions! I could run dnaPipeTE via Singularity without issues, except that it didn't produce landscapes.pdf on its own; however, I used one of your utility scripts (dnaPT_landscapes.sh) to get it. I am currently running dnaPipeTE with the species-specific repeat database.

Thank you!
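
For anyone hitting the same symptoms, a quick way to see which of the output files mentioned in this thread are present after a run is sketched below. The function name report_outputs is made up for this example, and it assumes that Trinity.fasta, reads_per_component_and_annotation and landscapes.pdf sit directly in the run's output folder; adjust the list if your version lays them out differently.

```shell
#!/usr/bin/env bash

# report_outputs OUT_DIR   (hypothetical helper, not part of dnaPipeTE)
# Prints one line per expected file and returns 1 if any is missing or empty.
# Assumption: the three files named in this thread are located directly in
# the output folder.
report_outputs() {
    local out_dir="$1" missing=0 f
    for f in Trinity.fasta reads_per_component_and_annotation landscapes.pdf; do
        if [ -s "$out_dir/$f" ]; then
            echo "ok       $f"
        else
            echo "missing  $f"
            missing=1
        fi
    done
    return "$missing"
}

# Example call with the output folder from the run above:
# report_outputs ~/data/dnaPipeTE_0.15_1_t20
```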

clemgoub commented 2 years ago

Excellent! Thank you for letting me know, and please don't hesitate to ask if you need further help!

Cheers,

Clément