TobyBaril / EarlGrey

Earl Grey: A fully automated TE curation and annotation pipeline

Error at RepeatMerger stage #13

Closed J-Calvelo closed 2 years ago

J-Calvelo commented 2 years ago

Hello, I'm running into what I think is an installation and/or compatibility issue. The pipeline runs up to the stage that generates the *_mergedRepeats files but fails to generate the GFF files, with the message:

Can't locate CrossmatchSearchEngine.pm in @INC (you may need to install the CrossmatchSearchEngine module) (@INC contains: /home/amanda/miniconda3/envs/earlGrey/bin/../ /home/amanda/miniconda3/envs/earlGrey/bin /home/amanda/miniconda3/envs/earlGrey/lib/site_perl/5.26.2/x86_64-linux-thread-multi /home/amanda/miniconda3/envs/earlGrey/lib/site_perl/5.26.2 /home/amanda/miniconda3/envs/earlGrey/lib/5.26.2/x86_64-linux-thread-multi /home/amanda/miniconda3/envs/earlGrey/lib/5.26.2 .) at /home/amanda/miniconda3/envs/earlGrey/bin/rmOutToGFF3.pl line 76.

Later on, the pipeline continues regardless up to the "Resolving Overlapping Repeats" stage, where it also reports:

Error in library(GenomicRanges) : 
  there is no package called ‘GenomicRanges’
Execution halted

I'm attaching the last 300 lines of the log file: EarlGrey.log

I'm not sure how to solve it. I'm using RepeatMasker v4.1.2-p1 and RepeatModeler v2.0.3, both installed with conda as part of the earlGrey environment, and both were installed before running the configure script. Thanks

TobyBaril commented 2 years ago

Hi,

The first issue, Can't locate CrossmatchSearchEngine.pm in @INC, is due to the conda installation of RepeatMasker, as this file should be in the RepeatMasker folder of the RepeatMasker installation. In the documentation on the Earl Grey README, we recommend against installing RepeatModeler and RepeatMasker using Conda, as there are known dependency issues with the packages. I would highly recommend installing RepeatMasker using the guidance on the RepeatMasker website, or using one of the methods from the installation page here.

If you would like to keep the conda installation of RepeatMasker, check the environment share folder (something like /home/amanda/miniconda3/envs/earlGrey/share/RepeatMasker/) and see if this has CrossmatchSearchEngine.pm present. If it is present, copy this file to /home/amanda/miniconda3/envs/earlGrey/bin (cp CrossmatchSearchEngine.pm /home/amanda/miniconda3/envs/earlGrey/bin) and now the file should be found by perl as this directory is in @INC. There are other ways to adjust @INC, but it is quite hard to mess with this inside the conda environment.
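The copy step above can be sketched in Python as follows. This is a minimal sketch, assuming the conda environment keeps RepeatMasker's Perl modules somewhere under its share/ directory (the exact folder name may differ between RepeatMasker builds); the function name copy_perl_module is hypothetical and not part of Earl Grey.

```python
import shutil
from pathlib import Path
from typing import Optional


def copy_perl_module(env_prefix: str, module: str = "CrossmatchSearchEngine.pm") -> Optional[Path]:
    """Copy a RepeatMasker Perl module from <env>/share into <env>/bin.

    Assumption: the module sits somewhere under <env>/share (for example
    share/RepeatMasker/). Returns the destination path, or None if not found.
    """
    env = Path(env_prefix)
    share = env / "share"
    if not share.is_dir():
        return None
    # Take the first match anywhere under share/ (sorted so the choice is deterministic).
    matches = sorted(share.rglob(module))
    if not matches:
        return None
    # <env>/bin is already on @INC in the error message above, so Perl should find it there.
    dest = env / "bin" / module
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(matches[0], dest)
    return dest
```

For example, copy_perl_module("/home/amanda/miniconda3/envs/earlGrey") would return the new path under bin/, or None if the module is not present in the environment at all (in which case a full RepeatMasker installation is the better route).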

For the second issue, try cloning the EarlGrey repository again and running the ./configure script, as GenomicRanges should be installed as part of configuring the environment. You can check that the earlGrey.yml in your download is up to date, as it should include a line for GenomicRanges: bioconductor-genomicranges=1.46.1=r41h5c21468_0. To reconfigure, you will first need to remove the earlGrey environment before running the ./configure script of Earl Grey. You can do this with: conda env remove -n earlGrey.
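To check the downloaded earlGrey.yml for the GenomicRanges pin without installing a YAML parser, a plain line scan is enough. This is a hypothetical helper (yml_has_package is not part of Earl Grey) and assumes the usual conda dependency syntax of "- name=version=build".

```python
from pathlib import Path


def yml_has_package(yml_path, package="bioconductor-genomicranges"):
    """Return True if a conda environment file lists `package`.

    Assumes dependency lines like '  - bioconductor-genomicranges=1.46.1=r41h5c21468_0'.
    """
    for line in Path(yml_path).read_text().splitlines():
        entry = line.strip()
        if not entry.startswith("-"):
            continue
        # Strip the list dash, then drop any "=version=build" suffix.
        name = entry.lstrip("-").strip().split("=")[0]
        if name.lower() == package.lower():
            return True
    return False
```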

I hope this helps!

J-Calvelo commented 2 years ago

Hello, I tried a new installation of RepeatMasker and RepeatModeler but I keep running into issues. For some reason, not all packages included in earlGrey.yml are being installed. So far I can confirm that GenomicRanges, r-ape and Emboss are missing.

J-Calvelo commented 2 years ago

I've managed to run a test dataset by installing _libgcc_mutex, _openmp_mutex, argon2-cffi, attrs, atk-1.0, bleach, biopython, blosc, bedtools and bwidget one by one, in addition to GenomicRanges, r-ape and Emboss. I'm going to try the full data and see how it goes.

TobyBaril commented 2 years ago

Great!

Could you give me a little more information about your system? I'm going to try and work out why this might be happening, as it has been tested on many other systems without this issue.

Could you also attach a copy of the .yml file that is currently in your local EarlGrey directory? And also a copy of the yml for your EarlGrey environment, which you can make by running conda env export > environment.yml when your environment is active.
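Comparing the two files is a quick way to see which recipes from earlGrey.yml never made it into the environment (as with GenomicRanges, r-ape and Emboss above). The sketch below uses hypothetical helpers that only read package names from the dependencies: list of each file; they ignore versions and builds, and assume both files follow the standard conda env export layout.

```python
import re
from pathlib import Path


def conda_package_names(yml_path):
    """Collect lower-cased package names from the dependencies: list of a conda YAML file."""
    names = set()
    in_deps = False
    for raw in Path(yml_path).read_text().splitlines():
        line = raw.strip()
        if line.startswith("dependencies:"):
            in_deps = True
        elif in_deps and line.startswith("- "):
            spec = line[2:].strip()
            if spec.endswith(":"):  # e.g. a nested "- pip:" section header
                continue
            # Drop version/build pins such as "=1.46.1=r41h5c21468_0" or ">=2".
            names.add(re.split(r"[=<>!\s]", spec, maxsplit=1)[0].lower())
        elif in_deps and line and not line.startswith(("-", "#")):
            in_deps = False  # a new top-level key such as "prefix:" ends the list
    return names


def missing_packages(expected_yml, exported_yml):
    """Packages requested in expected_yml but absent from exported_yml."""
    return sorted(conda_package_names(expected_yml) - conda_package_names(exported_yml))
```

Usage: missing_packages("EarlGrey/earlGrey.yml", "environment.yml") after running conda env export > environment.yml.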

J-Calvelo commented 2 years ago

Here they are, though I also updated conda while trying to fix the installation, so I might have covered up the cause: Current_conda_environment.txt Original_earlGrey.txt

J-Calvelo commented 2 years ago

Well, the pipeline is still running, but I'm getting several of these errors:

Traceback (most recent call last):
  File "/home/amanda/programas/EarlGrey/scripts/extract_align.py", line 204, in <module>
    if __name__ =="__main__":main()
  File "/home/amanda/programas/EarlGrey/scripts/extract_align.py", line 191, in main
    FILES = [F for F in os.listdir('muscle/') if F.endswith('_cons.fa')]
FileNotFoundError: [Errno 2] No such file or directory: 'muscle/'

If this refers to the aligner MUSCLE, it is indeed installed and on the PATH.

TobyBaril commented 2 years ago

Thanks for the files, I will try to have a look through these when I have some spare time.

Don't worry about the error in extract_align.py; the pipeline only uses the first step of this script, but the script still attempts to delete the files that its own aligner step would have produced. The alignment itself is done outside extract_align by MAFFT; I just hadn't got around to wrapping the file-deletion section of the script in a condition, as the script was originally written by another group.

UPDATE:

I have hashed out the lines that attempt to remove the muscle/ directory which is never created as part of this script.
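A minimal sketch of the kind of condition described above, assuming the cleanup only needs to delete the *_cons.fa files that the MUSCLE step would have written into muscle/. The function name and its return value are hypothetical; the actual change to extract_align.py simply comments the lines out.

```python
import os


def cleanup_muscle_outputs(muscle_dir="muscle/"):
    """Remove *_cons.fa files from muscle_dir, doing nothing if it was never created."""
    # Guard that avoids the FileNotFoundError from os.listdir('muscle/').
    if not os.path.isdir(muscle_dir):
        return []
    removed = []
    for name in os.listdir(muscle_dir):
        if name.endswith("_cons.fa"):
            os.remove(os.path.join(muscle_dir, name))
            removed.append(name)
    return sorted(removed)
```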

J-Calvelo commented 2 years ago

Nice! This time it finished and the summary results seem OK. But there are two other errors in the log:

Just before "Identifying Repeats Using Species-Specific Library"

awk: fatal: cannot open file /mnt/Disco1/amanda_PROYECTs/Project_Bsudanica/Bsudanica_Runs/Final_Genome_Annotation/BsudanicaEarlGrey/Bsudanica_Masked_de_novo_Repeats/Bsudanica_de_novo_repeat_library_iter*.fasta.out' for reading (No such file or directory)

And just before "Resolving Overlapping Repeats"

Traceback (most recent call last):
  File "/home/amanda/programas/EarlGrey/scripts/repeatCraft/repeatcraft.py", line 187, in <module>
    rcStatm.rcstat(rclabelp=outputnamelabel,rmergep=outputnamemerge,outfile= statfname, ltrgroup = True)
  File "/home/amanda/programas/EarlGrey/scripts/repeatCraft/helper/rcStatm.py", line 54, in rcstat
    if rowRaw.get(col[2]):
IndexError: list index out of range

Are they superfluous? Thanks

TobyBaril commented 2 years ago

I have edited a line in the main file to fix the first error.
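For context on the awk message: in bash, when a glob such as *_iter*.fasta.out matches no files, the unexpanded pattern is passed to the program literally (unless nullglob is set), which is why awk reports a file name containing an asterisk. The fix in the main script isn't shown in this thread; purely as an illustration, a hypothetical Python equivalent that expands the pattern first and treats "no matches" as an empty result could look like this.

```python
import glob


def read_matching_lines(pattern):
    """Concatenate the lines of every file matching pattern.

    Hypothetical helper: returns an empty list instead of failing when the
    glob matches nothing (e.g. no *_iter*.fasta.out files were produced).
    """
    lines = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            lines.extend(handle.read().splitlines())
    return lines
```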

The second one is slightly dependent on which files were created, as the RepeatCraft tool sometimes throws a weird error after it has successfully finished running.

Could you upload the log file and I can double check?

J-Calvelo commented 2 years ago

Here BsudanicaEarlGrey.log.gz

TobyBaril commented 2 years ago

Great! Yes, the second repeatcraft error is a strange one that I have flagged on their tool's github, however it currently makes no difference to the output results so can be ignored.
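The traceback points at col[2], which raises IndexError whenever a row splits into fewer than three fields, for example a blank trailing line. The rcStatm.py code is not reproduced in this thread, so the following is only a hypothetical illustration of a defensive pattern, assuming tab-separated rows; it is not RepeatCraft's actual fix.

```python
def count_by_third_column(lines):
    """Count rows by their third tab-separated field, skipping short rows."""
    counts = {}
    for line in lines:
        col = line.rstrip("\n").split("\t")
        if len(col) < 3:
            # Blank or truncated line: skip it instead of raising IndexError.
            continue
        counts[col[2]] = counts.get(col[2], 0) + 1
    return counts
```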

J-Calvelo commented 2 years ago

Nice! Thanks

TobyBaril commented 2 years ago

Strange that those packages did not install as part of the conda environment, as your environment file contained all the correct recipes... I'll try to look into this further, but I'm glad you have found a workaround for the moment!

TobyBaril commented 2 years ago

Closing for the moment, as I am currently unable to reproduce the compatibility issue.

zhangwenda0518 commented 2 years ago

Same errors here:

Step 6: Writing stat file..Removing tmp files... Done
Traceback (most recent call last):
  File "/EarlGrey/scripts/repeatCraft/repeatcraft.py", line 187, in <module>
    rcStatm.rcstat(rclabelp=outputnamelabel,rmergep=outputnamemerge,outfile= statfname, ltrgroup = True)
  File "*/EarlGrey/scripts/repeatCraft/helper/rcStatm.py", line 54, in rcstat
    if rowRaw.get(col[2]):
IndexError: list index out of range

Then errors at the "Resolving Overlapping Repeats" stage:

Rscript execution error: No such file or directory
Rscript execution error: No such file or directory
cp: cannot stat 'EarlGrey/mergedRepeats/looseMerge/*.filteredRepeats.bed'

Then the results in *summaryFiles are only combined_library.fasta (17 Mb) and de_novo_repeat_library_iter4.fasta.clustered.fa (0 kb).

Is any other information needed? Can you give me some suggestions to solve it? Thanks!

TobyBaril commented 2 years ago

Could you upload a copy of the log file from the run? It should be inside the Earl Grey outputs directory, with the extension ".log". The Python error is not a problem, just something that seems to happen with RepeatCraft for some reason!

Could you also provide the output of ls -lth * inside the Earl Grey outputs directory, so I can try and work out where it has gone wrong?

thanks!
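Alongside the ls -lth listing, it can help to list zero-byte files directly, since an empty intermediate (like the 0 kb clustered library reported above) often points to the first step that failed. The sketch below is a hypothetical helper, not part of Earl Grey, that walks an output directory and returns those files.

```python
from pathlib import Path


def find_empty_outputs(outdir):
    """Return zero-byte files under outdir as sorted POSIX-style relative paths."""
    root = Path(outdir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_size == 0
    )
```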

zhangwenda0518 commented 2 years ago

The directory listing:

size.log

The run log: xpEarlGrey.log

TobyBaril commented 2 years ago

It looks like the Earl Grey script is struggling to find the scripts directory of the GitHub repo. I would suggest reconfiguring Earl Grey and making sure it is up to date with the latest version:

In the EarlGrey directory (where the Docker, scripts and modules directories are, etc.), run:

git stash
git pull
chmod +x configure
./configure

Then rerun the test with a new speciesname with the -s flag, and see if it completes a full run. One way to check that Earl Grey has configured properly is to open the script and check that SCRIPTDIR= is pointing to the scripts directory of the EarlGrey directory that is created from cloning the repository.
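The SCRIPTDIR check can also be done programmatically. This is a minimal sketch, assuming the earlGrey script assigns SCRIPTDIR a literal path on its own line (for example SCRIPTDIR=/path/to/EarlGrey/scripts); if the value is built from other shell variables, the regular expression will not resolve it. The helper name check_scriptdir is hypothetical.

```python
import re
from pathlib import Path


def check_scriptdir(earlgrey_script):
    """Return (SCRIPTDIR value, whether it is an existing directory).

    Returns (None, False) if no literal SCRIPTDIR= assignment is found.
    """
    text = Path(earlgrey_script).read_text()
    # Match SCRIPTDIR=/some/path, optionally wrapped in single or double quotes.
    match = re.search(r'^\s*SCRIPTDIR=["\']?([^"\'\s]+)', text, flags=re.MULTILINE)
    if match is None:
        return None, False
    scriptdir = Path(match.group(1))
    return str(scriptdir), scriptdir.is_dir()
```

If the second value comes back False, the configure step most likely wrote a path that no longer matches where the repository was cloned.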

zhangwenda0518 commented 2 years ago

Thank you for your suggestion, I have successfully run it now.