Closed MikeSanJose closed 1 month ago
Hi,
Are you able to use the preconfigured containers at all? I just updated the main container stored here: https://hub.docker.com/repository/docker/tobybaril/earlgrey_dfam3.7/general
This doesn't require conda and is completely ready to go.
Hi Toby,
I tried using this updated preconfigured container (tobybaril/earlgrey_dfam3.8), and it mostly works.
However, I did get a few warning/error messages:
) (
( ) )
) ( (
_______)_
.-'---------|
( C|/\/\/\/\/|
'-./\/\/\/\/|
'_________'
'-------'
<<< Resolving Overlapping Repeats >>>
Warning message:
Failed to locate timezone database
) (
( ) )
) ( (
_______)_
.-'---------|
( C|/\/\/\/\/|
'-./\/\/\/\/|
'_________'
'-------'
<<< Generating Summary Plots >>>
Indexing genome
Traceback (most recent call last):
File "/usr/local/share/earlgrey-4.4.5-0/scripts//divergenceCalc/divergence_calc.py", line 195, in <module>
file_check(args.repeat_library, args.in_gff, args.genome, args.out_gff, args.temp_dir)
File "/usr/local/share/earlgrey-4.4.5-0/scripts//divergenceCalc/divergence_calc.py", line 40, in file_check
subprocess.run(["samtools","faidx",genome], stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)
File "/usr/local/lib/python3.9/subprocess.py", line 505, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/local/lib/python3.9/subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/local/lib/python3.9/subprocess.py", line 1837, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'samtools'
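The traceback above shows subprocess.run failing because the samtools executable is not on the PATH inside the container. A minimal sketch of a pre-flight check is below; the function name missing_tools and the tool list are hypothetical and not part of Earl Grey, and samtools is the only tool named in the traceback.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Hypothetical list: only samtools is named in the traceback above.
    absent = missing_tools(["samtools"])
    if absent:
        print("Not found on PATH:", ", ".join(absent))
    else:
        print("All listed tools were found.")
```

Running this inside the container before starting a long job would show the missing dependency up front instead of at the plotting step.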
Thanks for the update! The first error can be safely ignored; you just won't get the timer for an Earl Grey run in the docker container.
For the second one, this is an issue in the bioconda package. I've updated the recipe and am waiting for them to merge the change. After this, I need to rebuild the docker container and release an updated version. I'll get this done ASAP and keep you updated; hopefully I can get it done in the next couple of days.
Cheers!
Hi,
I have rebuilt and committed the new containers for both Dfam 3.7 and Dfam 3.8: https://hub.docker.com/repository/docker/tobybaril/earlgrey_dfam3.8/general
These should now work, but please let me know if you run into any issues.
Of note, the Dfam 3.8 containers are only configured with the root partition of Dfam 3.8, which can negatively impact classification of de novo repeats with RepeatClassifier, depending on what lineages you are annotating. I would recommend checking and sourcing the appropriate partitions and reconfiguring RepeatMasker with these (in the container, libraries are found in /usr/local/share/RepeatMasker/Libraries/famdb/). After new ones are added, you need to run perl ./configure from /usr/local/share/RepeatMasker/ and follow the config instructions (trf is found at /usr/local/bin/trf, choose rmblast for config, and you can also just press enter without resupplying tool paths).
EDIT: You can find out which partitions are of use to you here: https://www.dfam.org/releases/Dfam_3.8/families/FamDB/
FamDB Format 1.0 Partitions:
The FamDB HDF5 database format now supports database partitioning by taxonomic groups.
This allows users to download only the portion(s) of Dfam that they need to conduct
their work while still providing all the features of the famdb.py query tool. At a minimum
the root partition must be downloaded; however, any number of additional partitions
may also be present.
The taxonomic layout of the partitions in Dfam 3.8 is as follows:
Partition 0 [dfam38_full.0.h5]: root
Mammalia, Amoebozoa, Bacteria <bacteria>, Choanoflagellata,
Rhodophyta, Haptista, Metamonada, Fungi, Sar, Placozoa,
Ctenophora <comb jellies>, Filasterea, Spiralia, Discoba,
Cnidaria, Porifera, Viruses
Partition 1 [dfam38_full.1.h5]: Obtectomera
Partition 2 [dfam38_full.2.h5]: Euteleosteomorpha
Partition 3 [dfam38_full.3.h5]: Sarcopterygii
Sauropsida, Coelacanthimorpha, Amphibia, Dipnomorpha
Partition 4 [dfam38_full.4.h5]: Diptera
Partition 5 [dfam38_full.5.h5]: Viridiplantae
Partition 6 [dfam38_full.6.h5]: Deuterostomia
Chondrichthyes, Hemichordata, Cladistia, Holostei, Tunicata,
Cephalochordata, Cyclostomata <vertebrates>, Osteoglossocephala,
Otomorpha, Elopocephalai, Echinodermata, Chondrostei
Partition 7 [dfam38_full.7.h5]: Hymenoptera
Partition 8 [dfam38_full.8.h5]: Ecdysozoa
Nematoda, Gelechioidea, Yponomeutoidea, Incurvarioidea,
Chelicerata, Collembola, Polyneoptera, Tineoidea, Apoditrysia,
Monocondylia, Strepsiptera, Palaeoptera, Neuropterida, Crustacea,
Coleoptera, Siphonaptera, Trichoptera, Paraneoptera, Myriapoda,
Scalidophora
This new format is compatible with FamDB tool v1.0.1, and RepeatMasker 4.1.6.
For more information see the project page at: https://github.com/Dfam-consortium/FamDB
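To make the partition listing easier to query, here is a rough sketch that maps a taxon name to the partition files to download. The table is copied from the listing above; the names PARTITIONS and partition_files are hypothetical and are not part of FamDB or Earl Grey. It only matches names that appear verbatim in the listing, so a lower-level taxon (for example a genus) would first have to be resolved to one of the listed groups by other means.

```python
# Top-level taxa per Dfam 3.8 partition, copied from the listing above.
PARTITIONS = {
    0: ("root", ["Mammalia", "Amoebozoa", "Bacteria", "Choanoflagellata",
                 "Rhodophyta", "Haptista", "Metamonada", "Fungi", "Sar",
                 "Placozoa", "Ctenophora", "Filasterea", "Spiralia",
                 "Discoba", "Cnidaria", "Porifera", "Viruses"]),
    1: ("Obtectomera", []),
    2: ("Euteleosteomorpha", []),
    3: ("Sarcopterygii", ["Sauropsida", "Coelacanthimorpha", "Amphibia",
                          "Dipnomorpha"]),
    4: ("Diptera", []),
    5: ("Viridiplantae", []),
    6: ("Deuterostomia", ["Chondrichthyes", "Hemichordata", "Cladistia",
                          "Holostei", "Tunicata", "Cephalochordata",
                          "Cyclostomata", "Osteoglossocephala", "Otomorpha",
                          "Elopocephalai", "Echinodermata", "Chondrostei"]),
    7: ("Hymenoptera", []),
    8: ("Ecdysozoa", ["Nematoda", "Gelechioidea", "Yponomeutoidea",
                      "Incurvarioidea", "Chelicerata", "Collembola",
                      "Polyneoptera", "Tineoidea", "Apoditrysia",
                      "Monocondylia", "Strepsiptera", "Palaeoptera",
                      "Neuropterida", "Crustacea", "Coleoptera",
                      "Siphonaptera", "Trichoptera", "Paraneoptera",
                      "Myriapoda", "Scalidophora"]),
}


def partition_files(taxon):
    """Return the partition files for a taxon named in the listing.

    The root partition is always included because it is mandatory.
    """
    wanted = taxon.strip().lower()
    files = ["dfam38_full.0.h5"]
    for number, (name, members) in PARTITIONS.items():
        if number == 0:
            continue  # root is already in the list
        listed = {name.lower()} | {member.lower() for member in members}
        if wanted in listed:
            files.append(f"dfam38_full.{number}.h5")
    return files
```

For example, partition_files("Diptera") gives the root file plus dfam38_full.4.h5, while a taxon listed under the root partition, such as Mammalia, gives only the root file.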
Closing due to lack of activity. Feel free to reopen or open a new issue if more help is needed!
Hi, sorry it took a while to get to this, but I tried running my genomes through the EarlGrey singularity container with the Dfam 3.7 libraries.
It got through the LTR step, but after that I am getting errors for missing packages:
) (
( ) )
) ( (
_______)_
.-'---------|
( C|/\/\/\/\/|
'-./\/\/\/\/|
'_________'
'-------'
<<< Generating GFF Files >>>
Can't locate CrossmatchSearchEngine.pm in @INC (you may need to install the CrossmatchSearchEngine module) (@INC contains: /usr/local/bin/../ /usr/local/bin /usr/local/lib/perl5/5.32/site_perl /usr/local/lib/perl5/site_perl /usr/local/lib/perl5/5.32/vendor_perl /usr/local/lib/perl5/vendor_perl /usr/local/lib/perl5/5.32/core_perl /usr/local/lib/perl5/core_perl .) at /usr/local/bin/rmOutToGFF3.pl line 76.
BEGIN failed--compilation aborted at /usr/local/bin/rmOutToGFF3.pl line 76.
) (
( ) )
) ( (
_______)_
.-'---------|
( C|/\/\/\/\/|
'-./\/\/\/\/|
'_________'
'-------'
<<< Running RepeatCraft >>>
Step 1: Reformating GFF...
Parsing LTR_FINDER GFF...
Step 2: Labelling short TEs...
Missing mapfile, use unite size for all TEs except simple repeat, low complexity and satellite.
Step 3: Labelling LTR groups...
Updated LTR.gff with LTRgroup attribute to:ltrfinder_reformat.gffStep 4: Labelling TE groups...(loose mode)
Step 5: Merging GFF records by labels...
Step 6: Writing stat file..Removing tmp files...
Done
Traceback (most recent call last):
File "/usr/local/share/earlgrey-4.5.0-0/scripts//repeatCraft/repeatcraft.py", line 187, in <module>
rcStatm.rcstat(rclabelp=outputnamelabel,rmergep=outputnamemerge,outfile= statfname, ltrgroup = True)
File "/usr/local/share/earlgrey-4.5.0-0/scripts/repeatCraft/helper/rcStatm.py", line 56, in rcstat
next(f)
StopIteration
Every step after this one also fails with an error.
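The StopIteration in the traceback is raised by next(f), which happens when the file iterator has nothing left to read. One plausible cause, and this is an assumption, is that the earlier GFF step failed and left an empty input file. A minimal sketch of a more defensive pattern is below; read_data_lines is a hypothetical helper and not the actual code in rcStatm.py.

```python
def read_data_lines(path):
    """Return the lines after the first (header) line, or [] if the file is empty."""
    with open(path) as handle:
        # Passing a default to next() avoids StopIteration on an empty file.
        header = next(handle, None)
        if header is None:
            return []
        return [line.rstrip("\n") for line in handle]
```

With this pattern an empty input produces an empty result that the caller can report clearly, rather than an unhandled exception.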
Thanks for the update!
I think these are easy fixes on my end - I'll rebuild + test the container and let you know when it is updated.
What are the specs of the HPC that this is being run on (incl. OS)? This might help to narrow down the specifics here, even though the container should be the same... Also, this isn't running in rootless mode on the HPC, is it? Sometimes that can cause issues with PERL5LIB.
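To check whether a Perl module such as CrossmatchSearchEngine.pm is reachable, one option is to search the directories from PERL5LIB and from the @INC list printed in the error. The sketch below is only an illustration; find_perl_module and perl5lib_dirs are hypothetical helpers, and the directories to search must be supplied by the user.

```python
import os


def perl5lib_dirs(environ=os.environ):
    """Split PERL5LIB into its directories, ignoring empty entries."""
    return [d for d in environ.get("PERL5LIB", "").split(os.pathsep) if d]


def find_perl_module(module_file, search_dirs):
    """Return the directories in `search_dirs` that contain `module_file`."""
    return [d for d in search_dirs
            if os.path.isfile(os.path.join(d, module_file))]


if __name__ == "__main__":
    hits = find_perl_module("CrossmatchSearchEngine.pm", perl5lib_dirs())
    print(hits if hits else "Module not found in PERL5LIB")
```

An empty result inside the container would suggest that the directory holding the module is not on the search path in that environment.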
I've pushed new containers for both 3.7 and 3.8 under the tags latest and v4.5.0 respectively. You can find these in Docker under tobybaril/earlgrey_dfam3.7:[tag] and tobybaril/earlgrey_dfam3.8:[tag]. I've pushed, then pulled and tested, and both are error-free on our linux and mac systems. Let me know if you are still experiencing issues and I can try some further troubleshooting.
Hi Toby thanks for all the help.
I am running this on a Cray CS500 Linux Cluster with 2.40GHz Xeon Platinum 8260 nodes running Rocky 9.1 linux distribution. I'll try the new containers and let you know.
Thanks Again
Hi Toby. I tested the singularity container found in tobybaril/earlgrey_dfam3.7:latest and it seems to work!
Thanks for all the help.
Hi!
Thanks for letting me know - glad I could help!
Hi
I am trying to set up the singularity container for EarlGrey, but after building the container, I don't think conda is installed in it.
I built the container using the readme.md commands.
The container built fine, but when I tried the subsequent commands, after the first line
eval "$(/anaconda3/bin/conda shell.bash hook)"
I get
bash: /anaconda3/bin/conda: No such file or directory
Any help would be appreciated.
Thanks