Open AnneJRomero opened 1 year ago
Good afternoon,
You can find an example folder with toy files to check if everything is properly installed.
Thank you!
Hi,
Thank you for your reply.
I get the following output. Could you please help me work out what is wrong?
"my" variable @ref masks earlier declaration in same scope at /scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/tlex-open-v3.0.pl line 2088. "my" variable $data masks earlier declaration in same scope at /scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/tlex-open-v3.0.pl line 2089. "my" variable $wd masks earlier declaration in same scope at /scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/tlex-open-v3.0.pl line 2102.
* T-lex release 3
Report the presence/absence of given sequence(s) in strain(s) *
and return their frequency
* Wed Sep 14 20:08:07 2022 *
Output directory: /mainfs/scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/RepeatMasker/tlex_exampledata/
******************** Prepare the input data ********************
Simplify the fasta file of the reference sequences ...
******************** Tjunction analysis ********************
Identification of TE insertions nested or flanked by repeats.... RepeatMasker version 4.1.3 Search Engine: NCBI/RMBLAST [ 2.11.0+ ]
Using Master RepeatMasker Database: /mainfs/scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/RepeatMasker/Libraries/RepeatMaskerLib.h5 Title : Dfam Version : 3.6 Date : 2022-04-12 Families : 19,025
Species "drosophila" is not known to RepeatMasker. There may not be any TE families defined in the libraries for this species/clade or there may be an error in the spelling. Please check your entry against the NCBI Taxonomy database and/or try using a broader clade or related species instead. The full list of species/clades defined in the library may be obtained using the famdb.py script.
mv: cannot stat 'tlex_exampledata/Tflank_checking_125.fasta.out': No such file or directory
mv: cannot stat 'tlex_exampledata/Tflank_checking_125.fasta.masked': No such file or directory
Identification of TE insertions misannotated because of a longer Poly A/T tail....
Identification of TE insertions part of segmental duplications....
/mainfs/scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/RepeatMasker
blat tlex_exampledata/Tflank_checking_125.fasta tlex_exampledata/Tgenome.fasta tlex_exampledata/Tflank_checking_125.fasta.blast9 -out=blast9
Loaded 1260 letters in 10 sequences
Query sequence 2L has size 23513712, it might take a while.
Query sequence 2R has size 25286936, it might take a while.
Query sequence 3L has size 28110227, it might take a while.
Query sequence 3R has size 32079331, it might take a while.
Query sequence X has size 23542271, it might take a while.
Searched 137547960 bases in 7 sequences
Hello Anne,
Could you take a screenshot of the Tanalysis folder so we can see what's inside? I understand the program stopped and you got no results, is that right?
Regards.
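A plain-text listing is often easier to share in a thread than a screenshot. The following is only a minimal sketch: `list_tree` is a hypothetical helper (not part of T-lex), and the folder name in the usage note is taken from the log above.

```python
import os


def list_tree(root):
    """Return sorted relative paths of every file and folder under root."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            # Mark directories with a trailing slash so they stand out.
            entries.append(rel + "/" if os.path.isdir(full) else rel)
    return sorted(entries)
```

Calling `print("\n".join(list_tree("tlex_exampledata")))` from the directory that holds the output folder would print one entry per line, ready to paste into a reply.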
Hi,
Here's the results folder:
[ar14g12@cyan52 tlex_exampledata]$ ls
Tanalysis Tflank_checking_125.fasta Tflank_checking_125.map Tgenome.fasta Tparam Tpoly_125.fasta Tpoly_125.map
Here's the Tanalysis folder:
[ar14g12@cyan52 Tanalysis]$ ls
Tflank_checking_125.fasta.blast9 Tflank_checking_125.fasta.blast9_sd Tpoly_125.fasta Tpoly_125.fasta.polyAT Tpoly_125.map
Thank you, Anne
Hello Anne,
The project name (-O) cannot contain the character "_". Could you try calling it tlexexampledata, for example, instead of tlex_exampledata?
Thank you, and sorry about the inconvenience.
Hi,
Thanks for the reply, but I don't think that's the problem.
This is my script:
/local/software/perl/5.26.1/bin/perl $tlex3/tlex-open-v3.0.pl -O exampledata -T $data/TElist_example.txt -M $data/TEannotation_example.txt -G $data/genome_example.fa -R $data/fastq_files/example/example_1.fastq $data/fastq_files/example/example_2.fastq
I use -O exampledata but the output folder comes out as tlex_exampledata.
Hello,
The architecture of this software is quite sensitive to small details (sorry about this), which makes it a bit difficult to work out what is going wrong. Could you use "example" as the project name (-O)? I am pretty sure the problem is related to file names; the rest looks fine.
Let me know, and sorry about the inconvenience!
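The naming constraints discussed so far can be checked before launching a run. This is a rough sketch, not T-lex code: `check_project_name` is a hypothetical helper, the underscore rule is taken from the reply above without being verified against the script, and the `tlex_` prefix is an assumption based on the output directories shown in the logs.

```python
import os


def check_project_name(name, parent="."):
    """Return a list of warnings for a project name passed to -O."""
    warnings = []
    # Rule quoted from the maintainer's reply; not verified against the script.
    if "_" in name:
        warnings.append(f"project name {name!r} contains '_'")
    # Assumption from the logs in this thread: output goes to 'tlex_<name>'.
    out_dir = os.path.join(parent, "tlex_" + name)
    if os.path.exists(out_dir):
        warnings.append(f"{out_dir} already exists, possibly from an earlier run")
    return warnings
```

An empty list means neither condition was detected. A leftover output folder may be worth moving aside before re-running, so that old and new files are not mixed.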
Hi,
Thank you for your response.
I've tried the project name (-O) "example" but it is still giving me the same output.
"my" variable @ref masks earlier declaration in same scope at /scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/tlex-open-v3.0.pl line 2088. "my" variable $data masks earlier declaration in same scope at /scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/tlex-open-v3.0.pl line 2089. "my" variable $wd masks earlier declaration in same scope at /scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/tlex-open-v3.0.pl line 2102. mkdir: cannot create directory 'tlex_example': File exists
* T-lex release 3
Report the presence/absence of given sequence(s) in strain(s) *
and return their frequency
* Wed Sep 21 20:28:22 2022 *
Output directory: /mainfs/scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/RepeatMasker/tlex_example/
******************** Prepare the input data ********************
Simplify the fasta file of the reference sequences ...
******************** Tjunction analysis ********************
mkdir: cannot create directory 'tlex_example/Tanalysis': File exists Identification of TE insertions nested or flanked by repeats.... RepeatMasker version 4.1.3 Search Engine: NCBI/RMBLAST [ 2.11.0+ ]
Using Master RepeatMasker Database: /mainfs/scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/RepeatMasker/Libraries/RepeatMaskerLib.h5 Title : Dfam Version : 3.6 Date : 2022-04-12 Families : 19,025
Species "drosophila" is not known to RepeatMasker. There may not be any TE families defined in the libraries for this species/clade or there may be an error in the spelling. Please check your entry against the NCBI Taxonomy database and/or try using a broader clade or related species instead. The full list of species/clades defined in the library may be obtained using the famdb.py script.
mv: cannot stat 'tlex_example/Tflank_checking_125.fasta.out': No such file or directory
mv: cannot stat 'tlex_example/Tflank_checking_125.fasta.masked': No such file or directory
Identification of TE insertions misannotated because of a longer Poly A/T tail....
Identification of TE insertions part of segmental duplications....
/mainfs/scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/RepeatMasker
blat tlex_example/Tflank_checking_125.fasta tlex_example/Tgenome.fasta tlex_example/Tflank_checking_125.fasta.blast9 -out=blast9
Loaded 1260 letters in 10 sequences
Query sequence 2L has size 23513712, it might take a while.
Query sequence 2R has size 25286936, it might take a while.
Query sequence 3L has size 28110227, it might take a while.
Query sequence 3R has size 32079331, it might take a while.
Query sequence X has size 23542271, it might take a while.
Searched 137547960 bases in 7 sequences
Hello,
I am afraid there is some kind of problem with RepeatMasker here: Species "drosophila" is not known to RepeatMasker.
Are you using a different library for RepeatMasker, or the default one it was installed with? RepeatMasker should accept "drosophila" as a species. I am afraid it crashes when it tries to mask the genome, and therefore the files it should produce do not exist afterwards.
Let me know if sorting out this RepeatMasker issue solves the problem.
Regards!
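Because the run carries on after RepeatMasker fails, the first error is easy to miss among the later messages. The sketch below scans a saved log for the two messages quoted in this thread; `scan_log` and the pattern labels are hypothetical names, and the patterns cover only what appears here, not every possible T-lex or RepeatMasker error.

```python
import re

# Patterns copied from the messages quoted in this thread.
PATTERNS = {
    "unknown_species": re.compile(r'Species "([^"]+)" is not known to RepeatMasker'),
    "missing_file": re.compile(r"mv: cannot stat '([^']+)'"),
}


def scan_log(text):
    """Return {label: [captured values]} for each pattern found in text."""
    hits = {}
    for label, pattern in PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[label] = found
    return hits
```

If `unknown_species` shows up together with `missing_file`, the missing `.out` and `.masked` files are plausibly a consequence of the species lookup failing rather than a separate problem.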
Hi,
I installed RepeatMasker-4.1.3 and RepBase27.02. I didn't have any errors downloading these so RepeatMasker should work.
Output from the RepeatMasker installation:
Building FASTA version of RepeatMasker.lib ..............................
Building RMBlast frozen libraries..
The program is installed with a the following repeat libraries:
File: /mainfs/scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/RepeatMasker/Libraries/Dfam.h5
Database: Dfam
Version: 3.6
Date: 2022-04-12
Dfam - A database of transposable element (TE) sequence alignments and HMMs.
Total consensus sequences: 19025 Total HMMs: 18987
Thanks
Hello,
Sorry for my late response.
I think it could be caused by a change in the newest version of RepeatMasker: https://github.com/rmhubley/RepeatMasker/issues/123
Apparently you now need to specify not only the taxon but also the species. This should also be stated in the manual.
Hi,
Thank you for the response.
I'm not really sure how to fix this if I'm using RepeatMasker through the T-lex3 pipeline. Do I need to re-install RepeatMasker?
Thanks
Hello,
Sorry for the late response. Given this new RepeatMasker issue, I think you should pass "drosophila_flies_genus" to the -s argument.
Based on your command line, that would be:
/local/software/perl/5.26.1/bin/perl $tlex3/tlex-open-v3.0.pl -O exampledata -s 'drosophila_flies_genus' -T $data/TElist_example.txt -M $data/TEannotation_example.txt -G $data/genome_example.fa -R $data/fastq_files/example/example_1.fastq $data/fastq_files/example/example_2.fastq
Hope it helps!
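To see which names the installed library recognises before choosing a value for -s, the famdb.py script mentioned in the RepeatMasker message can be queried. The sketch below is hedged: the command shape `famdb.py -i <library> names <term>` is an assumption that should be confirmed with `famdb.py --help` for the installed version, and `famdb_names_command` and `lookup_species` are hypothetical helper names.

```python
import subprocess
import sys


def famdb_names_command(famdb_script, library, term):
    """Build the argv for looking up a species or clade name with famdb.py."""
    # Assumed CLI shape: famdb.py -i <library.h5> names <term>
    return [sys.executable, famdb_script, "-i", library, "names", term]


def lookup_species(famdb_script, library, term):
    """Run the lookup and return whatever famdb.py printed."""
    result = subprocess.run(
        famdb_names_command(famdb_script, library, term),
        capture_output=True,
        text=True,
        check=False,  # Return the output even when famdb.py reports an error.
    )
    return result.stdout + result.stderr
```

For example, `lookup_species("RepeatMasker/famdb.py", "RepeatMasker/Libraries/RepeatMaskerLib.h5", "drosophila")` would show the matching entries, assuming those relative paths point at the installation used in this thread.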
Hi,
Thank you for your help!
This seems to work. I got the following output, but the Tresults file is blank:
"my" variable @ref masks earlier declaration in same scope at /scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/tlex-open-v3.0.pl line 2088.
"my" variable $data masks earlier declaration in same scope at /scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/tlex-open-v3.0.pl line 2089.
"my" variable $wd masks earlier declaration in same scope at /scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/tlex-open-v3.0.pl line 2102.
* T-lex release 3
Report the presence/absence of given sequence(s) in strain(s) *
and return their frequency
* Mon Oct 10 17:50:54 2022 *
Output directory: /mainfs/scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/RepeatMasker/tlex_example/
******************** Prepare the input data ********************
Simplify the fasta file of the reference sequences ...
******************** Tjunction analysis ********************
Identification of TE insertions nested or flanked by repeats.... RepeatMasker version 4.1.3 Search Engine: NCBI/RMBLAST [ 2.11.0+ ]
Using Master RepeatMasker Database: /mainfs/scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/RepeatMasker/Libraries/RepeatMaskerLib.h5 Title : Dfam Version : 3.6 Date : 2022-04-12 Families : 19,025
Species/Taxa Search:
Drosophila <flies,genus> [NCBI Taxonomy ID: 7215]
Lineage: root;cellular organisms;Eukaryota;Opisthokonta;Metazoa;
Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;
Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;
Dicondylia;Pterygota
Building species libraries in: /mainfs/scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/RepeatMasker/Libraries/CONS-Dfam_3.6/drosophila_flies_genus
Traceback (most recent call last):
File "/mainfs/scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/RepeatMasker/famdb.py", line 1841, in
analyzing file tlex_example/Tflank_checking_125.fasta
Checking for E. coli insertion elements identifying Simple Repeats in batch 1 of 1 identifying matches to drosophila_flies_genus sequences in batch 1 of 1 identifying Simple Repeats in batch 1 of 1 processing output: cycle 1 cycle 2 cycle 3 cycle 4 cycle 5 cycle 6 cycle 7 cycle 8 cycle 9 cycle 10 Generating output... masking done
Identification of TE insertions misannotated because of a longer Poly A/T tail....
Identification of TE insertions part of segmental duplications....
/mainfs/scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/RepeatMasker
blat tlex_example/Tflank_checking_125.fasta tlex_example/Tgenome.fasta tlex_example/Tflank_checking_125.fasta.blast9 -out=blast9
Loaded 1260 letters in 10 sequences
Query sequence 2L has size 23513712, it might take a while.
Query sequence 2R has size 25286936, it might take a while.
Query sequence 3L has size 28110227, it might take a while.
Query sequence 3R has size 32079331, it might take a while.
Query sequence X has size 23542271, it might take a while.
Searched 137547960 bases in 7 sequences
*******************FILTER TEs starts at Mon Oct 10 17:51:58 2022*********************
The new TE list is stored in the file : /scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/example/TElist_example.txt_filtered
******************** Launch Presence detection start at Mon Oct 10 17:51:58 2022 ********************
Parameters for the detection of the PRESENCE of the given sequence(s):
Strain data: /scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/example/fastq_files/example/example_1.fastq
readdir() attempted on invalid dirhandle DIR4 at /scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/tlex-open-v3.0.pl line 1783.
cat: tlex_example/Tpresence/*/detection/results: No such file or directory
closedir() attempted on invalid dirhandle DIR4 at /scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/tlex-open-v3.0.pl line 1810.
******************** Presence detection end at Mon Oct 10 17:51:59 2022 ************
******************** Launch Absence detection start at Mon Oct 10 17:51:59 2022********************
Parameters for the detection of the ABSENCE of the given sequence(s):
Length of the internal region = 0
Convert the TE coordinates ...
Extract the TE flanked regions ....
readdir() attempted on invalid dirhandle DIR at /scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/tlex-open-v3.0.pl line 2585.
cat: tlex_example/Tabsence/*/detection/results: No such file or directory
******************** Absence detection end at Mon Oct 10 17:51:59 2022 ************
TE list cleaned
mkdir Talign
PRESENCE ALIGNMENT
presence_detection directory does not exist !
/mainfs/scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/RepeatMasker/tlex_example
readdir() attempted on invalid dirhandle DIR at /scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/tlex-open-v3.0.pl line 685,
mkdir Talign
mkdir: cannot create directory 'Talign': File exists
ABSENCE ALIGNMENT
Talign/ directory exists !
absence_detection directory does not exist !
readdir() attempted on invalid dirhandle DIR at /scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/tlex-open-v3.0.pl line 736,
Multiple Alignment end at Mon Oct 10 17:51:59 2022
readline() on closed filehandle IN at /scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/tlex-open-v3.0.pl line 3223.
tlex_exampleINIT
readline() on closed filehandle IN at /scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/tlex-open-v3.0.pl line 3231.
Use of uninitialized value $pooleddata in string at /scratch/ar14g12/PhD/tomato/TE/Tlex3/T-lex3/tlex-open-v3.0.pl line 571.
strain frequency estimates using single strains... cleaning......
******************** T-lex finished successfully at Mon Oct 10 17:51:59 2022 ************
************************ Have a nice day! ************************
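The run reports success even though the presence and absence steps produced nothing, so a check after the run can help narrow things down. This is a sketch under stated assumptions: `check_run` is a hypothetical helper, the `Tpresence/*/detection/results` and `Tabsence/*/detection/results` patterns are copied from the `cat` errors in the log, and the idea that the -R path is opened as a directory is only one possible reading of the readdir() error that directly follows the "Strain data" line; the thread does not confirm it.

```python
import glob
import os


def check_run(out_dir, reads_path):
    """Return a list of problems found after a T-lex run."""
    problems = []
    # Unconfirmed reading of the log: readdir() fails right after the
    # 'Strain data' line, so the -R path may be expected to be a directory.
    if not os.path.isdir(reads_path):
        problems.append(f"-R path is not a directory: {reads_path}")
    # Result files that the log reports as missing.
    for module in ("Tpresence", "Tabsence"):
        pattern = os.path.join(out_dir, module, "*", "detection", "results")
        if not glob.glob(pattern):
            problems.append(f"no files match {pattern}")
    return problems
```

Running `check_run("tlex_example", reads_path)` with the path that was given to -R returns an empty list when none of these conditions is detected. It does not prove that the results are correct.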
Hello,
Sorry, I was out of the office for a few days. Could you take a screenshot of the results folder (i.e. example)? It may be better to contact me by email: maria.bogaerts-marquez@inrae.fr. I am afraid it is something to do with a folder name or similar (this software is too sensitive to these things).
Is there a quick way to check that all the prerequisites were installed correctly and that tlex3 will run properly?
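One quick, partial check is whether the external programs can be found on PATH. The sketch below is an illustration, not an official T-lex check: the tool list contains only the programs named in this thread (perl, RepeatMasker, blat), T-lex3 may depend on others listed in its manual, and `find_tools` and `missing_tools` are hypothetical helper names.

```python
import shutil

# Only the tools that appear in this thread; T-lex3 may need others.
TOOLS = ["perl", "RepeatMasker", "blat"]


def find_tools(names=TOOLS):
    """Map each tool name to its path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names=TOOLS):
    """Return the tool names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]
```

A tool that is installed but not on PATH is reported as missing, and a tool that is found may still be misconfigured, so running the bundled example data remains the more complete test.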