CristianRiccio closed this issue 6 years ago
Hi,
The LTRpred() function has an argument named Dfam.db, which can be set to Dfam.db = "download" in combination with annotate = "Dfam". This way the Dfam database will be downloaded automatically.
This is all specified in the documentation (?LTRpred).
Please use the GitHub issues for reporting actual program bugs and not for consultation on how to use the tool. Please either consult the documentation or, if it is not specified there, write me a personal message or email.
I will extend the documentation in the coming months and am also about to write up the tool as a publication.
I hope this helps!
Cheers, Hajk
Hi, OK about the issue vs. documentation. I tried what you said but I got a problem:
LTRpred('c_elegans.PRJNA13758.WS263.genomic.fa',
+ output.path = 'annotation/', Dfam.db = 'download', annotate = 'Dfam')
vsearch v2.7.0_macos_x86_64, 16.0GB RAM, 8 cores
https://github.com/torognes/vsearch
No hmm files were specified, thus the internal HMM library will be used! See '/Users/user/Library/R/3.5/library/LTRpred/HMMs/hmm_*' for details.
No tRNA files were specified, thus the internal tRNA library will be used! See '/Users/user/Library/R/3.5/library/LTRpred/tRNAs/tRNA_library.fa' for details.
Folder 'annotation/' exists already and will be used...
Starting LTRpred analysis...
Step 1:
Run LTRharvest...
LTRharvest: Generating index file c_elegans_ltrharvest/c_elegans_index.fsa with gt suffixerator...
Running LTRharvest and writing results to c_elegans_ltrharvest...
LTRharvest analysis finished!
Step 2:
Generating index file c_elegans_ltrdigest/c_elegans_index_ltrdigest.fsa with suffixerator...
LTRdigest: Sort index file...
Running LTRdigest and writing results to c_elegans_ltrdigest...
LTRdigest analysis finished!
Step 3:
Import LTRdigest Predictions...
Input: c_elegans_ltrdigest/c_elegans_LTRdigestPrediction.gff -> Row Number: 2660
Remove 'NA' -> New Row Number: 2660
(1/8) Filtering for repeat regions has been finished.
(2/8) Filtering for LTR retrotransposons has been finished.
(3/8) Filtering for inverted repeats has been finished.
(4/8) Filtering for LTRs has been finished.
(5/8) Filtering for target site duplication has been finished.
(6/8) Filtering for primer binding site has been finished.
(7/8) Filtering for protein match has been finished.
(8/8) Filtering for RR tract has been finished.
Step 4:
Perform ORF Prediction...
usearch v10.0.240_i86osx32, 4.0Gb RAM (17.2Gb total), 8 cores
(C) Copyright 2013-17 Robert C. Edgar, all rights reserved.
http://drive5.com/usearch
License: my_email_address
00:00 8.8Mb 100.0% Working
WARNING: Input has lower-case masked sequences
Join ORF Prediction table: nrow(df) = 380 candidates.
unique(ID) = 380 candidates.
unique(orf.id) = 380 candidates.
Perform Dfam search....
Download Dfam database from http://dfam.org/web_download/Current_Release/Dfam.hmm.gz ...
trying URL 'http://dfam.org/web_download/Current_Release/Dfam.hmm.gz'
Content type 'application/octet-stream' length 239726414 bytes (228.6 MB)
==================================================
downloaded 228.6 MB
Download completed!
Prepare the Dfam.hmm database...
Error: File existence/permissions problem in trying to open HMM file /Users/user/Documents/project/3.
HMM file /Users/user/Documents/project/3 not found (nor an .h3m binary of it)
Error: hmmpress could not format the file /Users/user/Documents/project/4. Is hmmpress installed on your system and did the download process of the Dfam database work properly?
In addition: Warning message:
In system(paste0("hmmpress ", file.path(ws.wrap.path(output.folder), :
running command 'hmmpress /Users/user/Documents/project/3' had status 1
I checked that hmmpress is installed:
hmmpress
Incorrect number of command line arguments.
Usage: hmmpress [-options]
To see more help on available options, do hmmpress -h
Dfam.hmm.gz is in the working directory. What else can I check? I had a look at the decompressed Dfam.hmm file and it looked all right.
Would you prefer that I open a new issue for this?
This is clearly a file permission problem. Do you have file writing rights on the server you are running LTRpred on? Your system doesn't allow you to format the Dfam database. Hence, the error message:
Error: File existence/permissions problem in trying to open HMM file /Users/user/Documents/project/3.
and
Error: hmmpress could not format the file /Users/user/Documents/project/4.
You can also download the Dfam database directly from http://dfam.org/web_download/Current_Release/Dfam.hmm.gz and format it using hmmpress. Then specify the path to the formatted Dfam database in the Dfam.db argument.
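For readers following along, the manual route suggested above could be sketched roughly as below. This is only a sketch under assumptions: the function names dfam_is_pressed and prepare_dfam and the target directory are hypothetical, curl and gunzip are assumed to be installed, and the download URL is the one quoted in this thread (it may have moved since). The four index suffixes are the ones hmmpress writes next to the HMM file.

```shell
#!/bin/sh
# Sketch: download, decompress, and press the Dfam HMM library by hand.
# The URL is the one quoted in this thread; function names are hypothetical.

DFAM_URL="http://dfam.org/web_download/Current_Release/Dfam.hmm.gz"

# Return 0 if all four hmmpress index files exist next to the HMM file "$1".
dfam_is_pressed() {
    for ext in h3m h3i h3f h3p; do
        [ -f "$1.$ext" ] || return 1
    done
    return 0
}

# Download and press Dfam into directory "$1", skipping steps already done.
prepare_dfam() {
    dir="$1"
    mkdir -p "$dir" || return 1
    if [ ! -f "$dir/Dfam.hmm" ]; then
        # hmmpress needs the decompressed file, so fetch and gunzip first.
        curl -L -o "$dir/Dfam.hmm.gz" "$DFAM_URL" || return 1
        gunzip "$dir/Dfam.hmm.gz" || return 1
    fi
    if ! dfam_is_pressed "$dir/Dfam.hmm"; then
        # -f overwrites any stale or partial index files.
        hmmpress -f "$dir/Dfam.hmm" || return 1
    fi
    return 0
}
```

The functions are only defined here; nothing is downloaded until prepare_dfam is called with a directory, for example prepare_dfam dfam_db.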
I am working on my laptop and I am able to write files in that directory. I have downloaded the Dfam database, uncompressed it (hmmpress does not like the compressed version) and hmmpressed it:
hmmpress -f Dfam.hmm
Working... done.
Pressed and indexed 4150 HMMs (4150 names and 4150 accessions).
Models pressed into binary file: Dfam.hmm.h3m
SSI index for binary model file: Dfam.hmm.h3i
Profiles (MSV part) pressed into: Dfam.hmm.h3f
Profiles (remainder) pressed into: Dfam.hmm.h3p
LTRpred(genome.file = 'c_elegans.PRJNA13758.WS263.genomic.fa',
output.path = 'annotation/', Dfam.db = 'Dfam.hmm', annotate = 'Dfam')
Is the LTRpred command correct? What do you mean by the formatted Dfam database? hmmpress produces 4 different files.
Hi,
Perfect. Yes, now using Dfam.db = 'Dfam.hmm', annotate = 'Dfam' should work.
Let me know how it goes.
Cheers, Hajk
My bad for not reading the help of LTRpred carefully. Dfam.db needs to be the folder in which the database is, not the path including the filename. That explains my latest error: 'Dfam.hmm/Dfam.hmm not found'. I will try again with the folder name without the filename.
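The folder-versus-file distinction described above can be guarded against with a small helper. A minimal sketch, assuming (as the 'Dfam.hmm/Dfam.hmm not found' error suggests) that LTRpred appends the file name Dfam.hmm to whatever is passed as Dfam.db; the function name dfam_db_folder is hypothetical.

```shell
#!/bin/sh
# Print the folder to pass as Dfam.db.
# If the argument is a directory, print it unchanged; otherwise treat it as
# a path to the HMM file and print the directory that contains it.
dfam_db_folder() {
    if [ -d "$1" ]; then
        printf '%s\n' "$1"
    else
        dirname "$1"
    fi
}

# Example call (not executed here):
#   dfam_db_folder /path/to/db/Dfam.hmm
```

The printed value is what would go into Dfam.db in the LTRpred() call, while the file name itself stays out of the argument.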
New error:
LTRpred(genome.file = 'c_elegans.PRJNA13758.WS263.genomic.fa',
+ output.path = 'annotation/', Dfam.db = '.', annotate = 'Dfam')
vsearch v2.7.0_macos_x86_64, 16.0GB RAM, 8 cores
https://github.com/torognes/vsearch
No hmm files were specified, thus the internal HMM library will be used! See '/Users/user/Library/R/3.5/library/LTRpred/HMMs/hmm_*' for details.
No tRNA files were specified, thus the internal tRNA library will be used! See '/Users/user/Library/R/3.5/library/LTRpred/tRNAs/tRNA_library.fa' for details.
Folder 'annotation/' exists already and will be used...
Starting LTRpred analysis...
Step 1:
Run LTRharvest...
LTRharvest: Generating index file c_elegans_ltrharvest/c_elegans_index.fsa with gt suffixerator...
Running LTRharvest and writing results to c_elegans_ltrharvest...
LTRharvest analysis finished!
Step 2:
Generating index file c_elegans_ltrdigest/c_elegans_index_ltrdigest.fsa with suffixerator...
LTRdigest: Sort index file...
Running LTRdigest and writing results to c_elegans_ltrdigest...
LTRdigest analysis finished!
Step 3:
Import LTRdigest Predictions...
Input: c_elegans_ltrdigest/c_elegans_LTRdigestPrediction.gff -> Row Number: 2660
Remove 'NA' -> New Row Number: 2660
(1/8) Filtering for repeat regions has been finished.
(2/8) Filtering for LTR retrotransposons has been finished.
(3/8) Filtering for inverted repeats has been finished.
(4/8) Filtering for LTRs has been finished.
(5/8) Filtering for target site duplication has been finished.
(6/8) Filtering for primer binding site has been finished.
(7/8) Filtering for protein match has been finished.
(8/8) Filtering for RR tract has been finished.
Step 4:
Perform ORF Prediction...
usearch v10.0.240_i86osx32, 4.0Gb RAM (17.2Gb total), 8 cores
(C) Copyright 2013-17 Robert C. Edgar, all rights reserved.
http://drive5.com/usearch
License: my_email_address
00:01 6.4Mb 29
00:01 8.9Mb 100
WARNING: Input has lower-case masked sequences
Join ORF Prediction table: nrow(df) = 380 candidates.
unique(ID) = 380 candidates.
unique(orf.id) = 380 candidates.
Perform Dfam search....
Prepare the Dfam.hmm database...
Error: Looks like ./Dfam.hmm is already pressed (.h3i file present, anyway):
Delete old hmmpress indices first
Run Dfam scan...
Can't locate Dfamscan.pm in @INC (you may need to install the Dfamscan module) (@INC contains: /usr/local/lib/perl5/site_perl /Users/user/anaconda/envs/python3env/lib/perl5/site_perl/5.22.0/darwin-thread-multi-2level /Users/user/anaconda/envs/python3env/lib/perl5/site_perl/5.22.0 /Users/user/anaconda/envs/python3env/lib/perl5/5.22.0/darwin-thread-multi-2level /Users/user/anaconda/envs/python3env/lib/perl5/5.22.0 .) at /usr/local/bin/dfamscan.pl line 7.
BEGIN failed--compilation aborted at /usr/local/bin/dfamscan.pl line 7.
Finished Dfam scan!
A dfam query file has been generated and stored at/Users/user/Documents/project/c_elegans-ltrdigest_complete.fas_DfamAnnotation.out.
Error: The file '/Users/user/Documents/project/c_elegans-ltrdigest_complete.fas_DfamAnnotation.out' does not exist! Please check the correct path to the dfam.file.
I have downloaded dfamscan.pl as described here: https://hajkd.github.io/LTRpred/articles/Introduction.html
However, when I run this in the terminal:
perl /usr/local/bin/dfamscan.pl -help
I get the following error:
perl /usr/local/bin/dfamscan.pl -help
Can't locate Dfamscan.pm in @INC (you may need to install the Dfamscan module) (@INC contains: /usr/local/lib/perl5/site_perl /Users/user/anaconda/envs/python3env/lib/perl5/site_perl/5.22.0/darwin-thread-multi-2level /Users/user/anaconda/envs/python3env/lib/perl5/site_perl/5.22.0 /Users/user/anaconda/envs/python3env/lib/perl5/5.22.0/darwin-thread-multi-2level /Users/user/anaconda/envs/python3env/lib/perl5/5.22.0 .) at /usr/local/bin/dfamscan.pl line 7.
BEGIN failed--compilation aborted at /usr/local/bin/dfamscan.pl line 7
Did you install HMMer as described in the Introduction? dfamscan.pl uses a specific HMMer version that might include the missing Perl module. You can also find more details here: http://www.dfam.org/web_download/Tools/README.txt
Since this is a Dfam issue and clearly some dependency module is missing and wasn't installed on your machine, I will need to do some research as well to find out what the issue could be. It does work seamlessly on my side.
I installed hmmer using conda. Let me try the Dfam version.
I uninstalled my conda hmmer. I then followed the instructions to install hmmer from the Dfam website. But I still get this error:
Can't locate Dfamscan.pm in @INC (you may need to install the Dfamscan module) (@INC contains: /usr/local/lib/perl5/site_perl /Users/user/anaconda/envs/python3env/lib/perl5/site_perl/5.22.0/darwin-thread-multi-2level /Users/user/anaconda/envs/python3env/lib/perl5/site_perl/5.22.0 /Users/user/anaconda/envs/python3env/lib/perl5/5.22.0/darwin-thread-multi-2level /Users/user/anaconda/envs/python3env/lib/perl5/5.22.0 .) at /usr/local/bin/dfamscan.pl line 7.
BEGIN failed--compilation aborted at /usr/local/bin/dfamscan.pl line 7.
Finished Dfam scan!
Also, my hmmpress is now in /usr/local/bin/. See:
hmmalign
hmmbuild
hmmc2
hmmconvert
hmmemit
hmmerfm-exactmatch
hmmfetch
hmmlogo
hmmpgmd
hmmpress
hmmscan
hmmsearch
hmmsim
hmmstat
jackhmmer
makehmmerdb
nhmmer
nhmmscan
phmmer
Hi @CristianRiccio
It seems that removing line 7 of the file dfamscan.pl should resolve the problem:
use Dfamscan;
I don't know why Dfam doesn't provide a file with the Dfamscan class/file.
In any case, I am now considering including a modified version of this script in LTRpred to avoid future issues.
Thank you so much for pointing all these things out to me. I will also make sure to extend the documentation to make it easier to use LTRpred :)
Cheers, Hajk
I did what you said. I have got this error now:
Undefined subroutine &Dfamscan::filter_covered_hits called at /usr/local/bin/dfamscan.pl line 49.
I am trying to understand a bit of Perl. Is Dfamscan a package (like in R)/module (like in Python) of Perl?
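On the question above: in Perl, the statement use Dfamscan; loads a module, which is a file named Dfamscan.pm that Perl looks for in the directories listed in @INC (the list printed in the error message). Deleting that line only skips the loading step, so calls such as Dfamscan::filter_covered_hits then fail because the module's code was never read. A rough way to check whether the file is present in any of those directories is sketched below; the function name find_module_file is hypothetical, and the usage line assumes directory names without spaces.

```shell
#!/bin/sh
# Print the first directory (from the arguments after the first) that
# contains the module file named by the first argument.
# Returns 1 if no directory contains it.
find_module_file() {
    module="$1"
    shift
    for dir in "$@"; do
        if [ -f "$dir/$module" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Usage with Perl's actual search path (needs perl on PATH):
#   find_module_file Dfamscan.pm $(perl -e 'print join(" ", @INC)')
```

If nothing is printed, the usual remedies would be to place Dfamscan.pm in one of the @INC directories or to run the script as perl -I /path/to/dir dfamscan.pl; where to obtain Dfamscan.pm is not stated in this thread.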
I have the same error:
$perl dfamscan.pl -help
Can't locate Dfamscan.pm in @INC (you may need to install the Dfamscan module)
Hi @bioinfo-Kacst,
Many thanks for letting me know.
Have you tried installing all dfamscan.pl tool dependencies as specified here: http://www.dfam.org/web_download/Tools/README.txt ?
Since this seems to be a larger issue, I am now planning to build a Docker container around LTRpred to make it easier to use.
I will keep you posted.
Cheers, Hajk
I also just found that you can use the Bioconda package management system to install Dfam, so you might want to install Bioconda and run:
conda install dfam
Let me know if this works for you now.
Cheers, Hajk
Hi,
I've downloaded dfamscan.pl to /usr/local/bin/dfamscan.pl. Then I tried to display the help, but I got an error:
perl /usr/local/bin/dfamscan.pl -help
Can't locate Dfamscan.pm in @INC (you may need to install the Dfamscan module) (@INC contains: /usr/local/lib/perl5/site_perl /Users/user/anaconda/envs/python3env/lib/perl5/site_perl/5.22.0/darwin-thread-multi-2level /Users/user/anaconda/envs/python3env/lib/perl5/site_perl/5.22.0 /Users/user/anaconda/envs/python3env/lib/perl5/5.22.0/darwin-thread-multi-2level /Users/user/anaconda/envs/python3env/lib/perl5/5.22.0 .) at /usr/local/bin/dfamscan.pl line 7.
BEGIN failed--compilation aborted at /usr/local/bin/dfamscan.pl line 7.
What is Dfamscan.pm? How do I download the Dfam database and make it available to LTRpred so that I can get better prediction and description of LTR retrotransposons? All my dfam columns are NAs in the results so far. I am working on Mac OS X.