TobyBaril / EarlGrey

Earl Grey: A fully automated TE curation and annotation pipeline

Singularity container initial setup not working #138

Closed MikeSanJose closed 1 month ago

MikeSanJose commented 1 month ago

Hi

I am trying to set up the Singularity container for EarlGrey, but after building the container it doesn't appear to have conda installed.

I built the container using the commands in the README.md.

# Build the image from the Docker image
singularity build earlgrey.sif docker://tobybaril/earlgrey

# Run the sandbox
singularity shell -C -H $(pwd):/work --writable-tmpfs -u earlgrey.sif

The container built fine, but I ran into a problem when I tried the subsequent commands:

eval "$(/anaconda3/bin/conda shell.bash  hook)"
conda env create -f /home/user/EarlGrey/earlGrey.yml
conda activate earlGrey
Rscript /home/user/EarlGrey/scripts/install_r_packages.R

After the first line, eval "$(/anaconda3/bin/conda shell.bash hook)", I get:

bash: /anaconda3/bin/conda: No such file or directory
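
As a sanity check, something like the following should show whether conda is installed under a different prefix in the image (a sketch only; the search depth is arbitrary):

# Look for a conda executable anywhere near the filesystem root
find / -maxdepth 4 -type f -name conda 2>/dev/null

# Check whether conda is already on PATH
command -v conda || echo "conda not on PATH"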

Any help would be appreciated.

Thanks

TobyBaril commented 1 month ago

Hi,

Are you able to use the preconfigured containers at all? I just updated the main container stored here: https://hub.docker.com/repository/docker/tobybaril/earlgrey_dfam3.7/general

This doesn't require conda and is completely ready to go.
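
For reference, pulling the preconfigured image follows the same Singularity workflow as the README commands above (image name taken from the Docker Hub link; this is just a sketch, adjust the tag as needed):

# Build a SIF from the preconfigured Docker image
singularity build earlgrey_dfam3.7.sif docker://tobybaril/earlgrey_dfam3.7

# Enter the container as before
singularity shell -C -H $(pwd):/work --writable-tmpfs -u earlgrey_dfam3.7.sif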

MikeSanJose commented 1 month ago

Hi Toby,

I tried using this updated preconfigured container (tobybaril/earlgrey_dfam3.8), and it mostly works.

However, I did get a few warning/error messages:

              )  (
             (   ) )
             ) ( (
           _______)_
        .-'---------|  
       ( C|/\/\/\/\/|
        '-./\/\/\/\/|
         '_________'
          '-------'
        <<< Resolving Overlapping Repeats >>>
Warning message:
Failed to locate timezone database
              )  (
             (   ) )
             ) ( (
           _______)_
        .-'---------|  
       ( C|/\/\/\/\/|
        '-./\/\/\/\/|
         '_________'
          '-------'
        <<< Generating Summary Plots >>>

Indexing genome
Traceback (most recent call last):
  File "/usr/local/share/earlgrey-4.4.5-0/scripts//divergenceCalc/divergence_calc.py", line 195, in <module>
    file_check(args.repeat_library, args.in_gff, args.genome, args.out_gff, args.temp_dir)
  File "/usr/local/share/earlgrey-4.4.5-0/scripts//divergenceCalc/divergence_calc.py", line 40, in file_check
    subprocess.run(["samtools","faidx",genome], stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)
  File "/usr/local/lib/python3.9/subprocess.py", line 505, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/local/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/local/lib/python3.9/subprocess.py", line 1837, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'samtools'

TobyBaril commented 1 month ago

Thanks for the update! The first error can be safely ignored; you just won't get the timer for an Earl Grey run in the Docker container.

For the second one, this is an issue in the bioconda package; I've updated the recipe and am waiting for them to merge the change. After that, I need to rebuild the Docker container and release an updated version. I'll get this done ASAP and keep you updated; hopefully I can get it done in the next couple of days.
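
Once the rebuilt container is out, a quick way to confirm the fix from inside it (a sketch; this just assumes samtools should be on PATH after the rebuild):

# Verify that samtools is visible on PATH inside the container
command -v samtools && samtools --version | head -n 1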

Cheers!

TobyBaril commented 1 month ago

Hi,

I have rebuilt and committed the new containers for both Dfam 3.7 and Dfam 3.8: https://hub.docker.com/repository/docker/tobybaril/earlgrey_dfam3.8/general

These should now work, but please let me know if you run into any issues.

Of note, the Dfam 3.8 containers are only configured with the root partition of Dfam 3.8, which can negatively impact classification of de novo repeats with RepeatClassifier, depending on which lineages you are annotating. I would recommend checking and sourcing the appropriate partitions and reconfiguring RepeatMasker with these (in the container, libraries are found in /usr/local/share/RepeatMasker/Libraries/famdb/). After new ones are added, you need to run perl ./configure from /usr/local/share/RepeatMasker/ and follow the config instructions (trf is found at /usr/local/bin/trf; choose rmblast for config, and you can just press enter without resupplying tool paths). There is a sketch of these steps after the partition listing below.

EDIT: You can find out which partitions are of use to you here: https://www.dfam.org/releases/Dfam_3.8/families/FamDB/

FamDB Format 1.0 Partitions:

    The FamDB HDF5 database format now supports database partitioning by taxonomic groups.
    This allows users to download only the portion(s) of Dfam that they need to conduct
    their work while still providing all the features of the famdb.py query tool. At a minimum,
    the root partition must be downloaded; however, any number of additional partitions
    may also be present.

    The taxonomic layout of the partitions in Dfam 3.8 is as follows:

          Partition 0 [dfam38_full.0.h5]: root 
                         Mammalia, Amoebozoa, Bacteria <bacteria>, Choanoflagellata, 
                         Rhodophyta, Haptista, Metamonada, Fungi, Sar, Placozoa, 
                         Ctenophora <comb jellies>, Filasterea, Spiralia, Discoba, 
                         Cnidaria, Porifera, Viruses
          Partition 1 [dfam38_full.1.h5]: Obtectomera 
          Partition 2 [dfam38_full.2.h5]: Euteleosteomorpha 
          Partition 3 [dfam38_full.3.h5]: Sarcopterygii 
                         Sauropsida, Coelacanthimorpha, Amphibia, Dipnomorpha
          Partition 4 [dfam38_full.4.h5]: Diptera 
          Partition 5 [dfam38_full.5.h5]: Viridiplantae 
          Partition 6 [dfam38_full.6.h5]: Deuterostomia 
                         Chondrichthyes, Hemichordata, Cladistia, Holostei, Tunicata, 
                         Cephalochordata, Cyclostomata <vertebrates>, Osteoglossocephala, 
                         Otomorpha, Elopocephalai, Echinodermata, Chondrostei
          Partition 7 [dfam38_full.7.h5]: Hymenoptera 
          Partition 8 [dfam38_full.8.h5]: Ecdysozoa 
                         Nematoda, Gelechioidea, Yponomeutoidea, Incurvarioidea, 
                         Chelicerata, Collembola, Polyneoptera, Tineoidea, Apoditrysia, 
                         Monocondylia, Strepsiptera, Palaeoptera, Neuropterida, Crustacea, 
                         Coleoptera, Siphonaptera, Trichoptera, Paraneoptera, Myriapoda, 
                         Scalidophora

   This new format is compatible with FamDB tool v1.0.1, and RepeatMasker 4.1.6.

   For more information see the project page at: https://github.com/Dfam-consortium/FamDB
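
As a rough sketch of the reconfiguration steps described above (run inside a writable container; the partition number is only an example, and the exact file name/compression on the Dfam server is an assumption):

# Fetch an additional partition into the famdb directory (Partition 3 as an example)
cd /usr/local/share/RepeatMasker/Libraries/famdb/
wget https://www.dfam.org/releases/Dfam_3.8/families/FamDB/dfam38_full.3.h5.gz
gunzip dfam38_full.3.h5.gz

# Reconfigure RepeatMasker so it picks up the new partition
cd /usr/local/share/RepeatMasker/
perl ./configure   # trf: /usr/local/bin/trf, search engine: rmblast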

TobyBaril commented 1 month ago

closing due to lack of activity - feel free to reopen or initiate a new issue if more help is needed!

MikeSanJose commented 1 month ago

Hi, sorry it took a while to get to this, but I have now tried running my genomes through the EarlGrey Singularity container with the Dfam 3.7 libraries.

It got through the LTR step, but after that I am getting errors about missing packages:

              )  (
             (   ) )
             ) ( (
           _______)_
        .-'---------|  
       ( C|/\/\/\/\/|
        '-./\/\/\/\/|
         '_________'
          '-------'
        <<< Generating GFF Files >>>
Can't locate CrossmatchSearchEngine.pm in @INC (you may need to install the CrossmatchSearchEngine module) (@INC contains: /usr/local/bin/../ /usr/local/bin /usr/local/lib/perl5/5.32/site_perl /usr/local/lib/perl5/site_perl /usr/local/lib/perl5/5.32/vendor_perl /usr/local/lib/perl5/vendor_perl /usr/local/lib/perl5/5.32/core_perl /usr/local/lib/perl5/core_perl .) at /usr/local/bin/rmOutToGFF3.pl line 76.
BEGIN failed--compilation aborted at /usr/local/bin/rmOutToGFF3.pl line 76.

              )  (
             (   ) )
             ) ( (
           _______)_
        .-'---------|  
       ( C|/\/\/\/\/|
        '-./\/\/\/\/|
         '_________'
          '-------'
        <<< Running RepeatCraft >>>
Step 1: Reformating GFF...
       Parsing LTR_FINDER GFF...
Step 2: Labelling short TEs...
Missing mapfile, use unite size for all TEs except simple repeat, low complexity and satellite.
Step 3: Labelling LTR groups...
Updated LTR.gff with LTRgroup attribute to:ltrfinder_reformat.gffStep 4: Labelling TE groups...(loose mode)
Step 5: Merging GFF records by labels...
Step 6: Writing stat file..Removing tmp files...
Done
Traceback (most recent call last):
  File "/usr/local/share/earlgrey-4.5.0-0/scripts//repeatCraft/repeatcraft.py", line 187, in <module>
    rcStatm.rcstat(rclabelp=outputnamelabel,rmergep=outputnamemerge,outfile= statfname, ltrgroup = True)
  File "/usr/local/share/earlgrey-4.5.0-0/scripts/repeatCraft/helper/rcStatm.py", line 56, in rcstat
    next(f)
StopIteration

Each step after this one also produces an error.

TobyBaril commented 1 month ago

Thanks for the update!

I think these are easy fixes on my end - I'll rebuild + test the container and let you know when it is updated.

What are the specs of the HPC this is being run on (incl. OS)? This might help narrow down the specifics here, even though the container should be the same. Also, this isn't running in rootless mode on the HPC, is it? Sometimes that can cause issues with PERL5LIB.
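
If PERL5LIB is being dropped (which can happen in some rootless setups), one thing worth trying is pointing it at the RepeatMasker install inside the container, since CrossmatchSearchEngine.pm ships with RepeatMasker. This is just a guess based on the container layout mentioned above:

# Make RepeatMasker's bundled Perl modules visible to rmOutToGFF3.pl
export PERL5LIB=/usr/local/share/RepeatMasker:$PERL5LIB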

TobyBaril commented 1 month ago

I've pushed new containers for both 3.7 and 3.8 under the tags latest and v4.5.0 respectively. You can find these on Docker Hub under tobybaril/earlgrey_dfam3.7:[tag] and tobybaril/earlgrey_dfam3.8:[tag]. I've pushed, then pulled and tested them, and both are error-free on our Linux and Mac systems. Let me know if you are still experiencing issues and I can try some further troubleshooting.
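
As an example, pulling and entering the updated Dfam 3.7 image with Singularity mirrors the build command from the README (the tag here is just the one I'd suggest trying first):

singularity build earlgrey_dfam3.7_latest.sif docker://tobybaril/earlgrey_dfam3.7:latest
singularity shell -C -H $(pwd):/work --writable-tmpfs -u earlgrey_dfam3.7_latest.sif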

MikeSanJose commented 1 month ago

Hi Toby thanks for all the help.

I am running this on a Cray CS500 Linux cluster with 2.40 GHz Xeon Platinum 8260 nodes running the Rocky Linux 9.1 distribution. I'll try the new containers and let you know.

Thanks Again

MikeSanJose commented 1 month ago

Hi Toby. I tested the singularity container found in tobybaril/earlgrey_dfam3.7:latest and it seems to work!

Thanks for all the help.

TobyBaril commented 1 month ago

Hi!

Thanks for letting me know - glad I could help!