Closed J-Calvelo closed 2 years ago
Hi,
The first issue, Can't locate CrossmatchSearchEngine.pm in @INC, is due to the conda installation of RepeatMasker, as this file should be in the RepeatMasker folder of the RepeatMasker installation. In the Earl Grey README, we recommend against installing RepeatModeler and RepeatMasker using conda, as there are known dependency issues with the packages. I would highly recommend installing RepeatMasker using the guidance on the RepeatMasker website, or using one of the methods from the installation page here.
If you would like to keep the conda installation of RepeatMasker, check the environment share folder (something like /home/amanda/miniconda3/envs/earlGrey/share/RepeatMasker/) and see if CrossmatchSearchEngine.pm is present. If it is, copy this file to /home/amanda/miniconda3/envs/earlGrey/bin (cp CrossmatchSearchEngine.pm /home/amanda/miniconda3/envs/earlGrey/bin) and the file should then be found by perl, as this directory is in @INC. There are other ways to adjust @INC, but it is quite hard to do this inside the conda environment.
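A minimal sketch of the check-and-copy step, in case it is useful. The function name copy_module_if_present is hypothetical (it is not part of Earl Grey or RepeatMasker), and the example paths are the ones quoted in this comment, so they will need adjusting for other systems.

```shell
# Hypothetical helper (not part of Earl Grey or RepeatMasker): copy a Perl
# module from a conda environment's share folder into its bin folder,
# but only if the module is actually there.
copy_module_if_present() {
    module="$1"      # e.g. CrossmatchSearchEngine.pm
    share_dir="$2"   # e.g. .../envs/earlGrey/share/RepeatMasker
    bin_dir="$3"     # e.g. .../envs/earlGrey/bin (listed in @INC in the error)

    if [ -f "$share_dir/$module" ]; then
        cp "$share_dir/$module" "$bin_dir/" && echo "copied $module to $bin_dir"
    else
        echo "$module not found in $share_dir" >&2
        return 1
    fi
}

# Example call, using the paths from this comment (adjust to your setup):
# copy_module_if_present CrossmatchSearchEngine.pm \
#     /home/amanda/miniconda3/envs/earlGrey/share/RepeatMasker \
#     /home/amanda/miniconda3/envs/earlGrey/bin
```

To see which directories perl actually searches, running perl -e 'print join("\n", @INC), "\n"' inside the active environment prints the current @INC.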
For the second issue, try cloning the EarlGrey repository again and running the ./configure script, as GenomicRanges should be installed as part of configuring the environment. You can check that the earlGrey.yml in your download is up to date, as it should include a line for GenomicRanges: bioconductor-genomicranges=1.46.1=r41h5c21468_0. To reconfigure, you will need to first remove the earlGrey environment before running the ./configure script of Earl Grey. You can do this with: conda env remove -n earlGrey
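As a rough sketch, the check and the reconfigure steps could be strung together like this. The helper yml_has_package is hypothetical, and it assumes dependency lines in earlGrey.yml look like "  - name=version=build"; the conda and ./configure commands are the ones mentioned in this comment and are left commented out because they change the environment.

```shell
# Hypothetical helper: succeed if the environment file lists a given package.
# Assumes dependency lines look like "  - name=version=build" and that the
# package name contains no regex metacharacters.
yml_has_package() {
    yml="$1"
    pkg="$2"
    grep -Eq "^[[:space:]]*-[[:space:]]*${pkg}(=|[[:space:]]*\$)" "$yml"
}

# Suggested order of operations (run from the EarlGrey directory):
# yml_has_package earlGrey.yml bioconductor-genomicranges \
#     || echo "earlGrey.yml looks out of date; clone the repository again"
# conda env remove -n earlGrey
# ./configure
```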
I hope this helps!
Hello, I tried with a new installation of RepeatMasker and RepeatModeler but I keep running into issues. For some reason not all packages included in the earlGrey.yml are being installed. So far I can confirm this for GenomicRanges, r-ape and Emboss.
I've managed to run a test dataset by installing _libgcc_mutex, _openmp_mutex, argon2-cffi, attrs, atk-1.0, bleach, biopython, blosc, bedtools, bwidget one by one, in addition to GenomicRanges, r-ape and Emboss. I'm going to try the full data and see how it goes.
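Rather than finding the missing packages one error at a time, a sketch like the following could list them in one go. The function missing_packages is hypothetical; it assumes the dependencies in the .yml are pinned as "  - name=version=build" and that the installed list was saved with conda list -n earlGrey --export, which prints one name=version=build entry per line. Unpinned dependencies would be skipped by this version.

```shell
# Hypothetical helper: print every package named in an environment .yml
# that does not appear in a saved "conda list --export" listing.
missing_packages() {
    yml="$1"         # environment file, e.g. earlGrey.yml
    installed="$2"   # output of: conda list -n earlGrey --export > installed.txt

    # Pull the package name out of lines such as "  - bedtools=2.30.0=build".
    sed -n 's/^[[:space:]]*-[[:space:]]*\([^=[:space:]][^=[:space:]]*\)=.*/\1/p' "$yml" |
    while read -r name; do
        # Exact match on the first "="-separated field of the installed list.
        awk -F= -v n="$name" '$1 == n { found = 1 } END { exit !found }' "$installed" ||
            echo "$name"
    done
}

# Example (file names are placeholders):
# conda list -n earlGrey --export > installed.txt
# missing_packages earlGrey.yml installed.txt
```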
Great!
Could you give me a little more information about your system? I'm going to try and work out why this might be happening, as it has been tested on many other systems without this issue.
Could you also attach a copy of the .yml file that is currently in your local EarlGrey directory? And also a copy of the yml for your EarlGrey environment, which you can make by running conda env export > environment.yml when your environment is active.
Here they are, though I also updated conda while trying to fix the installation, so I might have covered up the cause. Current_conda_environment.txt Original_earlGrey.txt
Well, the pipeline is still running but I'm getting several of these errors:
Traceback (most recent call last):
  File "/home/amanda/programas/EarlGrey/scripts/extract_align.py", line 204, in <module>
    if __name__ =="__main__":main()
  File "/home/amanda/programas/EarlGrey/scripts/extract_align.py", line 191, in main
    FILES = [F for F in os.listdir('muscle/') if F.endswith('_cons.fa')]
FileNotFoundError: [Errno 2] No such file or directory: 'muscle/'
If it's the aligner muscle, it is indeed installed and on the path.
Thanks for the files, I will try to have a look through these when I have some spare time.
Don't worry about the error in extract_align.py; the pipeline only makes use of the first step of this script, but the script still attempts to delete the files that would be made if its own aligner were used. The alignment step is completed outside of extract_align.py by mafft. I just didn't get around to updating the script to place the file-deletion section within a condition, as it was originally written by another group.
UPDATE:
I have commented out the lines that attempt to remove the muscle/ directory, which is never created as part of this script.
Nice! This time it finished and the summary results seem OK. But there are two other errors in the log:
Just before "Identifying Repeats Using Species-Specific Library"
awk: fatal: cannot open file /mnt/Disco1/amanda_PROYECTs/Project_Bsudanica/Bsudanica_Runs/Final_Genome_Annotation/BsudanicaEarlGrey/Bsudanica_Masked_de_novo_Repeats/Bsudanica_de_novo_repeat_library_iter*.fasta.out' for reading (No such file or directory)
And just before "Resolving Overlapping Repeats"
Traceback (most recent call last):
  File "/home/amanda/programas/EarlGrey/scripts/repeatCraft/repeatcraft.py", line 187, in <module>
    rcStatm.rcstat(rclabelp=outputnamelabel,rmergep=outputnamemerge,outfile= statfname, ltrgroup = True)
  File "/home/amanda/programas/EarlGrey/scripts/repeatCraft/helper/rcStatm.py", line 54, in rcstat
    if rowRaw.get(col[2]):
IndexError: list index out of range
Are they superfluous too? Thanks
I have edited a line in the main file to fix the first error.
The second one is slightly dependent on which files were created, as the RepeatCraft tool sometimes throws a weird error after it has successfully finished running.
Could you upload the log file and I can double check?
Great! Yes, the second repeatcraft error is a strange one that I have flagged on their tool's github, however it currently makes no difference to the output results so can be ignored.
Nice! Thanks
Strange that those packages did not install as part of the conda environment, as your environment file contained all the correct recipes...I'll try and look into this further, but glad you have found a workaround for the moment!
Closing for the moment, as currently unable to reproduce the compatibility issue
Same errors:
Step 6: Writing stat file..Removing tmp files... Done
Traceback (most recent call last):
  File "/EarlGrey/scripts/repeatCraft/repeatcraft.py", line 187, in <module>
    rcStatm.rcstat(rclabelp=outputnamelabel,rmergep=outputnamemerge,outfile= statfname, ltrgroup = True)
  File "*/EarlGrey/scripts/repeatCraft/helper/rcStatm.py", line 54, in rcstat
    if rowRaw.get(col[2]):
IndexError: list index out of range
Then these errors:
< Resolving Overlapping Repeats >
Rscript execution error: No such file or directory
Rscript execution error: No such file or directory
cp: cannot stat 'EarlGrey/mergedRepeats/looseMerge/*.filteredRepeats.bed'
Then the result in *summaryFiles is only the combined_library.fasta (17 Mb) and de_novo_repeat_library_iter4.fasta.clustered.fa (0 kb).
Is any other information needed? Can you give me some suggestions to solve it? Thanks!
Could you upload a copy of the log file from the run? It should be found inside the EarlGrey outputs directory and named with the extension ".log". The python error is not a problem, just something that seems to happen with repeatcraft for some reason!
Could you also provide the output of ls -lth * inside the Earl Grey outputs directory, so I can try and work out where it has gone wrong?
thanks!
It looks like the Earl Grey script is struggling to find the scripts directory of the GitHub repo. I would suggest reconfiguring earlGrey and making sure it's up to date with the latest version.
In the EarlGrey directory (where the Docker, scripts, modules etc. directories are):
git stash
git pull
chmod +x configure
./configure
Then rerun the test with a new species name using the -s flag, and see if it completes a full run. One way to check that Earl Grey has configured properly is to open the script and check that SCRIPTDIR= is pointing to the scripts directory of the EarlGrey directory that is created from cloning the repository.
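The SCRIPTDIR check can also be done without opening the script by hand. This is only a sketch: check_scriptdir is a hypothetical helper, the path to the Earl Grey script is passed in as an argument because it depends on the installation, and it assumes SCRIPTDIR is set on its own line as a plain path, optionally quoted.

```shell
# Hypothetical helper: read the SCRIPTDIR= assignment from a script and
# confirm that it points at an existing directory.
check_scriptdir() {
    script="$1"   # path to the configured Earl Grey script (assumed argument)

    # Take the first SCRIPTDIR= line, drop the variable name and any quotes.
    dir=$(sed -n 's/^[[:space:]]*SCRIPTDIR=//p' "$script" | head -n 1 | tr -d "\"'")

    if [ -n "$dir" ] && [ -d "$dir" ]; then
        echo "SCRIPTDIR OK: $dir"
    else
        echo "SCRIPTDIR is unset or not a directory: $dir" >&2
        return 1
    fi
}

# Example (replace the argument with the location of your Earl Grey script):
# check_scriptdir /path/to/earlGrey
```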
Thank you for your suggestion, I have successfully run it now.
Hello, I'm running into what I think is an installation and/or compatibility issue. The pipeline continues up to the stage that generates *_mergedRepeats but fails to generate the GFF files, with the message
Can't locate CrossmatchSearchEngine.pm in @INC (you may need to install the CrossmatchSearchEngine module) (@INC contains: /home/amanda/miniconda3/envs/earlGrey/bin/../ /home/amanda/miniconda3/envs/earlGrey/bin /home/amanda/miniconda3/envs/earlGrey/lib/site_perl/5.26.2/x86_64-linux-thread-multi /home/amanda/miniconda3/envs/earlGrey/lib/site_perl/5.26.2 /home/amanda/miniconda3/envs/earlGrey/lib/5.26.2/x86_64-linux-thread-multi /home/amanda/miniconda3/envs/earlGrey/lib/5.26.2 .) at /home/amanda/miniconda3/envs/earlGrey/bin/rmOutToGFF3.pl line 76.
Later on the pipeline continues regardless up to "Resolving Overlapping Repeats", where it also reports that:
I'm attaching the last 300 lines of the log file EarlGrey.log
I'm not sure how to solve it. I'm using RepeatMasker v4.1.2-p1 and RepeatModeler v2.0.3, both installed with conda as part of the earlGrey environment, and both installed before running the configure file. Thanks