Closed shjenkins94 closed 3 years ago
For MAKER, I suspect python 2.7.15 is a transitive dependency via augustus 3.3.3_pl526h0faeac2_5 -> biopython 1.76_py27h516909a_0, in which case bumping the maker version to get a more-recent augustus build (with a more-recent Python 3.x biopython build) should do the trick.
I guess
Species "anopheles" is not known to RepeatMasker. There may
not be any TE families defined in the libraries for this
species/clade or there may be an error in the spelling.
Please check your entry against the NCBI Taxonomy database
and/or try using a broader clade or related species instead.
The full list of species/clades defined in the library may be
obtained using the famdb.py script.
is related?
I think the dependency that requires python 2 is mir-prefer, but it looks like you figured that out in the pull request.
@bernt-matthias Are you perhaps using RepBase? I was wondering since I'm getting a slightly different issue with DFAM.
I tried using the newer code but kept getting "ERROR: Could not determine if RepBase is installed" so I started poking around in MAKER's code. It looks like MAKER uses a perl module GI.pm that gets installed in $CONDA_PREFIX/lib/GI.pm. In that I found:
#--make sure repbase is installed
if($CTL_OPT{model_org} and !defined($ENV{'LIBDIR'})){
    my $exe = Cwd::abs_path($CTL_OPT{RepeatMasker});
    my ($lib) = $exe =~ /(.*\/)RepeatMasker$/;
    die "ERROR: Could not determine if RepBase is installed\n" if(! $lib);
    $lib .= "../share/RepeatMasker/Libraries/RepeatMaskerLib.embl";
    die "ERROR: Could not determine if RepBase is installed\n" if(! -f $lib);
    open(my $IN, "< $lib");
    my $rb_flag;
    for(my $i = 0; $i < 20; $i++){
        my $line = <$IN>;
        if($line =~ /RELEASE \d+(\-min)?\;/){
            $rb_flag = ($1 && $1 eq '-min') ? 0 : 1;
            last;
        }
    }
    close($IN);
    if(! $rb_flag){
        warn "WARNING: RepBase is not installed for RepeatMasker. This limits\n".
             "RepeatMasker's functionality and makes the model_org option in the\n".
             "control files virtually meaningless. MAKER will now reconfigure\n".
             "for simple repeat masking only.\n";
        $CTL_OPT{model_org} = 'simple';
    }
}
So it seems like MAKER is failing because RepeatMasker doesn't create RepeatMaskerLib.embl anymore and this part kills MAKER if it doesn't exist.
$lib .= "../share/RepeatMasker/Libraries/RepeatMaskerLib.embl";
die "ERROR: Could not determine if RepBase is installed\n" if(! -f $lib);
Thanks for digging into this. This seems to be the case: https://github.com/galaxyproject/tools-iuc/blob/db75a8489a1f61ea30abe9b91f6febac8b34204f/tools/maker/maker.xml#L394
The repeatmasker==4.1.1 bioconda package generates a RepeatMaskerLib.h5 symlink:
$ singularity exec quay.io_biocontainers_maker_2.31.11--pl526h61907ee_0-2020-12-02-c14814e811b3.sif ls -l /usr/local/share/RepeatMasker/Libraries/
total 2166530
-rwxrwxr-x 1 root root 25283 Nov 23 14:26 Artefacts.embl
-rw-rw-r-- 1 root root 2011886880 Nov 23 14:26 Dfam.h5
-rw-rw-r-- 1 root root 214 Nov 23 14:26 README.meta
-rwxrwxr-x 1 root root 22475384 Nov 23 14:26 RepeatAnnotationData.pm
-rw-rw-r-- 1 root root 10955446 Nov 23 14:27 RepeatMasker.lib
lrwxrwxrwx 1 root root 7 Dec 2 17:31 RepeatMaskerLib.h5 -> Dfam.h5
-rw-rw-r-- 1 root root 674815 Nov 23 14:27 RepeatMasker.lib.nhr
-rw-rw-r-- 1 root root 83808 Dec 2 16:52 RepeatMasker.lib.nin
-rw-rw-r-- 1 root root 3095721 Nov 23 14:27 RepeatMasker.lib.nsq
-rw-rw-r-- 1 root root 17979984 Nov 23 14:26 RepeatPeps.lib
-rw-rw-r-- 1 root root 2931407 Nov 23 14:27 RepeatPeps.lib.phr
-rw-rw-r-- 1 root root 144448 Dec 2 16:52 RepeatPeps.lib.pin
-rw-rw-r-- 1 root root 16168295 Nov 23 14:27 RepeatPeps.lib.psq
-rw-rw-r-- 1 root root 5550 Nov 23 14:26 RepeatPeps.readme
-rw-rw-r-- 1 root root 18752245 Nov 23 14:26 RMRBMeta.embl
-rw-rw-r-- 1 root root 113343436 Nov 23 14:26 taxonomy.dat
MAKER is hard-coded to check for RepeatMaskerLib.embl unless the LIBDIR environment variable is set. (This variable was previously REPEATMASKER_LIB_DIR in both the bioconda maker and repeatmasker <= 4.1 recipes; it was changed to LIBDIR in this commit to align with upstream RepeatMasker 4.1.x.) So currently the LIBDIR environment variable has to be specified, although I guess an alternative would be to update recipes/maker/repeatmasker_check.patch to simply remove the code block.
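For anyone who wants to try the LIBDIR route from a wrapper script, here is a minimal Python sketch. It assumes the libraries live under <prefix>/share/RepeatMasker/Libraries, as in the container listing above. The helper name repeatmasker_env is made up for illustration and is not part of MAKER or RepeatMasker.

```python
import os


def repeatmasker_env(prefix, base_env=None):
    """Return a copy of the environment with LIBDIR set.

    Assumption: the RepeatMasker libraries are installed in
    <prefix>/share/RepeatMasker/Libraries (as in the listing above).
    """
    env = dict(os.environ if base_env is None else base_env)
    libdir = os.path.join(prefix, "share", "RepeatMasker", "Libraries")
    if not os.path.isdir(libdir):
        raise FileNotFoundError(f"no RepeatMasker library directory at {libdir}")
    # With LIBDIR defined, the RepBase check quoted earlier is skipped.
    env["LIBDIR"] = libdir
    return env
```

The returned dictionary could then be passed as the env argument of subprocess.run when launching maker, with prefix taken from $CONDA_PREFIX. This only sidesteps the check; it does not change which library RepeatMasker actually uses.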
I tried replacing
$lib .= "../share/RepeatMasker/Libraries/RepeatMaskerLib.embl";
die "ERROR: Could not determine if RepBase is installed\n" if(! -f $lib);
with
$lib .= "Libraries/RepeatMaskerLib.h5";
die "ERROR: Could not determine if RepBase is installed\n" if(! -f $lib);
Which seems to work, but I'm not sure how robust that fix is.
I guess another problem is checking for RepBase in the first place. Since RepBase isn't free anymore, there are probably a lot of people like me who use Dfam instead.
However, then MAKER would attempt to read the RepeatMaskerLib.h5 file as if it were a text file (rather than an HDF5 file) to get some version information:
open(my $IN, "< $lib");
my $rb_flag;
for(my $i = 0; $i < 20; $i++){
    my $line = <$IN>;
    if($line =~ /RELEASE \d+(\-min)?\;/){
        $rb_flag = ($1 && $1 eq '-min') ? 0 : 1;
        last;
    }
}
If it's undesirable to assume that LIBDIR is set, then I suppose the entire code block should be removed or deactivated.
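If the check were kept rather than removed, it would need to tell an HDF5 library apart from a text one before scanning for the RELEASE line. A rough Python sketch of that idea follows. It is not MAKER code: classify_library and its return labels are hypothetical, and it only looks for the HDF5 signature at offset 0.

```python
import re

HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"  # standard 8-byte HDF5 file signature
RELEASE_RE = re.compile(rb"RELEASE \d+(-min)?;")


def classify_library(path, max_lines=20):
    """Classify a RepeatMasker library file without assuming it is text.

    Returns "hdf5", "repbase", "minimal", or "unknown".
    """
    with open(path, "rb") as handle:
        # Binary HDF5 files (e.g. Dfam.h5) start with a fixed signature.
        if handle.read(len(HDF5_MAGIC)) == HDF5_MAGIC:
            return "hdf5"
        handle.seek(0)
        # Same idea as the Perl loop: only inspect the first few lines.
        for _ in range(max_lines):
            line = handle.readline()
            if not line:
                break
            match = RELEASE_RE.search(line)
            if match:
                # A "-min" suffix is what the Perl code treats as "no RepBase".
                return "minimal" if match.group(1) else "repbase"
    return "unknown"
```

Opening the file in binary mode means an .h5 file is recognised from its header instead of being pushed through a text regex.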
Somehow I do not like this symlink. It makes the user assume that RepeatMaskerLib is used, when it's actually DFAM. It's also really bad from the viewpoint of reproducibility.
I guess for the Galaxy tool we should switch to DFAM .. let's just drop non-free components (nobody needs them if there are suitable free alternatives)... what do you think @bgruening?
The Galaxy tool currently uses 2.31.10 and does not work at the moment .. if I get it right, the container broke due to the repeatmasker update. I'm wondering if we could fix this first, e.g. by pinning the repeatmasker requirement. Then we would have a working 2.31.10 container again. Alternatively we could create a folder for 2.31.10. I'm asking because I imagine we won't be quick to update to the most recent maker version.
> I guess for the Galaxy tool we should switch to DFAM .. let's just drop non-free components (nobody needs them if there are suitable free alternatives)... what do you think @bgruening?
Yes I think so as well. :( ping @abretaud
> The Galaxy tool currently uses 2.31.10 and does not work at the moment .. if I get it right, the container broke due to the repeatmasker update. I'm wondering if we could fix this first, e.g. by pinning the repeatmasker requirement. Then we would have a working 2.31.10 container again. Alternatively we could create a folder for 2.31.10. I'm asking because I imagine we won't be quick to update to the most recent maker version.
I'm OK with both ways. Whatever works for you. But I guess we should add a test as well, if possible, so that the container fails immediately.
One possible workaround for the RepeatMasker problem is to construct a repeat library and specify rm_lib instead of model_org. If model_org isn't defined in the control options then MAKER doesn't check if RepBase is installed.
And looking at the code MAKER uses to run RepeatMasker,
my $command = "cd $tmp; $RepeatMasker";
if ($rmlib) {
    $command .= " $q_file -dir $dir -pa $cpus -lib $rmlib";
}
elsif($species eq 'simple'){
    my $lib = "$tmp/simple.lib";
    if(!-f $lib){
        (my $tFH, $t_file) = tempfile(DIR => $tmp);
        print $tFH ">(N)n#Dummy_repeat \@root [S:25]\nnnnnnnnnnnnnnnnnnn\n";
        close($tFH);
        File::Copy::move($t_file, $lib);
    }
    $command .= " $q_file -dir $dir -pa $cpus -lib $lib";
}
else {
    $command .= " $q_file -species $species -dir $dir -pa $cpus";
}
$command .= " -nolow" if defined($no_low);
The order of precedence is: a custom RepeatMasker library if one is given, then a simple repeat library if model_org is "simple", and otherwise the species named by model_org.
It seems like if both model_org and rm_lib are specified then RepeatMasker will run twice for each contig.
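To make the branch order in the Perl snippet easier to follow, here is the same decision written as a small Python function. This is only an illustration of the logic, not how MAKER is implemented: repeatmasker_args and the simple_lib parameter are hypothetical, and the flags are emitted in a slightly different order than in the Perl string.

```python
def repeatmasker_args(query, out_dir, cpus, rmlib=None, species=None,
                      simple_lib=None, no_low=False):
    """Return the arguments that would follow the RepeatMasker executable."""
    args = [query, "-dir", out_dir, "-pa", str(cpus)]
    if rmlib:
        # A custom library takes precedence over any species setting.
        args += ["-lib", rmlib]
    elif species == "simple":
        # MAKER writes a one-entry dummy library for this case; here the
        # caller supplies its path (hypothetical parameter).
        if simple_lib is None:
            raise ValueError("simple_lib is required when species is 'simple'")
        args += ["-lib", simple_lib]
    else:
        if not species:
            raise ValueError("either rmlib or species must be given")
        args += ["-species", species]
    if no_low:
        args.append("-nolow")
    return args
```

For example, repeatmasker_args("q.fa", "out", 4, rmlib="te.fa", species="all") ignores the species and uses -lib te.fa, which matches the first branch of the Perl code.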
Hi, my software versions are Python 3.6.10, MAKER 3.01.03, and RepeatMasker 4.1.2-p1, and I get the same error. Do you know the reason? Thank you!
Species "all" is not known to RepeatMasker. There may not be any TE families defined in the libraries for this species/clade or there may be an error in the spelling. Please check your entry against the NCBI Taxonomy database and/or try using a broader clade or related species instead. The full list of species/clades defined in the library may be obtained using the famdb.py script.
ERROR: RepeatMasker failed
--> rank=NA, hostname=localhost.localdomain
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:tig00000001_pilon
ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:tig00000001_pilon
examining contents of the fasta file and run log
The newest version of RepeatMasker, 4.1.1, uses h5py and requires Python 3, but this is not listed in the dependencies.
Since MAKER requires Python 2 and doesn't specify which version of RepeatMasker to install, it downloads the newest version. The main conflict seems to come from the famdb.py script that was added to RepeatMasker. It has the shebang
#!/usr/bin/env python3
and running it with Python 2 fails because of syntax errors.
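A recipe test could catch this kind of mismatch by reading the shebang of the installed script. Below is a minimal Python sketch of that idea; shebang_python_major is a hypothetical helper, and where famdb.py ends up in a given environment is not assumed here.

```python
import re


def shebang_python_major(script_path):
    """Return 2 or 3 if the shebang names python2/python3, else None.

    None is returned for a plain `python` shebang, a non-Python
    interpreter, or a file without a shebang line.
    """
    with open(script_path, "rb") as handle:
        first = handle.readline().decode("utf-8", errors="replace").strip()
    if not first.startswith("#!"):
        return None
    match = re.search(r"python(\d)?", first)
    if match is None or match.group(1) is None:
        return None
    return int(match.group(1))
```

Comparing the result against sys.version_info.major of the interpreter in the environment would flag a script that asks for Python 3 in a Python 2 only environment.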