Closed kiran-lee closed 3 months ago
Hi,
This can sometimes happen if you have multiple conda installations (e.g. miniconda and micromamba), or if you have a mix of packages installed with pip and conda.
Can you successfully load biopython if you activate the earlgrey environment, run python, and then try import Bio? I would also try the same with from pyfaidx import Faidx. This might help to narrow down whether the issue is with the conda environment or something else.
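The check described above can also be run as one small script from inside the activated earlgrey environment. This is only a sketch: the helper name check_modules is made up for illustration, and it uses nothing beyond the Python standard library. It reports which interpreter is actually running and whether Bio and pyfaidx can be found from it.

```python
import importlib.util
import sys


def check_modules(names):
    """Map each top-level module name to where it was found, or None if missing."""
    results = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        if spec is None:
            results[name] = None
        else:
            # Namespace packages have no single origin file.
            results[name] = spec.origin or "<namespace package>"
    return results


if __name__ == "__main__":
    # If this path is not inside the earlgrey environment, a different
    # Python is being picked up (for example from a loaded module).
    print("interpreter:", sys.executable)
    for name, origin in check_modules(["Bio", "pyfaidx"]).items():
        print(f"{name}: {origin if origin else 'NOT FOUND'}")
```

If the interpreter path points outside the environment, or either module is reported as NOT FOUND, the problem is likely with the environment setup rather than with TEstrainer itself.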
Regarding the runtime, you can recover from the last complete step, so you won't need to run the first few steps again. In your case, if you delete everything inside [species]_strainer, the pipeline will skip the previous steps and run TEstrainer again.
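For anyone who would rather script that cleanup, here is a minimal sketch that empties a directory while keeping the directory itself. The function name empty_directory and the example path are hypothetical; the thread only says the folder is called [species]_strainer, so point it at wherever that folder sits in your own output.

```python
import shutil
from pathlib import Path


def empty_directory(directory):
    """Delete everything inside `directory` but keep the directory itself.

    Returns the sorted names of the entries that were removed.
    """
    directory = Path(directory)
    if not directory.is_dir():
        raise NotADirectoryError(str(directory))
    removed = []
    for entry in directory.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # real subdirectory: remove recursively
        else:
            entry.unlink()  # regular file or symlink
        removed.append(entry.name)
    return sorted(removed)


# Hypothetical usage; replace with your own output location:
# empty_directory("/path/to/earlgrey_output/myspecies_strainer")
```

Double-check the path before running it, since the deletion cannot be undone.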
I get the same error when trying to run earlgrey 3.2 with the -l flag pointing to the fa.strained output from a related species. I specified the complete path to the strained file. If I run it without -l, it runs fine and calls TEstrainer fine. I am trying to build a sequential library for a number of closely related species.
@SJ-Smit do you have a log file I could take a look at? This is really odd as TEstrainer has nothing to do with a supplied library, and this isn't supplied as an option to the script! It would be good to see if I can work out what is triggering this!
Hi Toby. I think the problem I had was a combination of SLURM and conda env issues. Once I managed to get v4.2.4 installed via conda on our HPC, things worked fine. I had trouble getting v4 installed due to a corrupted repeatmodeler pkg and conflicts with paths. Removing the repeatmodeler pkg and reinstalling earlgrey as per your instructions worked perfectly, and it's been running smoothly with no biopython issues.
Hi @SJ-Smit, that's great to hear! Sometimes these strange things happen with multiple installs or weird conda things that are hard to explain... I'm going to keep this issue in mind to see if there is anything that can stop this happening in the future.
Closing for the moment as non-reproducible, likely due to multiple installs or weird conda environment resolutions. If the issue persists, I recommend trying the Docker container. Feel free to reopen if more help is needed.
Hi all,
I solved the problem. Your discussion helped immensely, thank you for taking the time to troubleshoot! I am fairly new to bash, Linux, and HPCs.
It was failing because I was calling source ~/.bashrc, which had a script to initialise conda, but was also loading conda in my submission script using module load Anaconda3/2019.07. When I commented out module load Anaconda3/2019.07, it all worked perfectly.
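A conflict like this, with two conda setups active in the same job, can be spotted by checking whether the python found on PATH actually lives inside the active environment. The sketch below assumes conda has set the CONDA_PREFIX variable on activation; the function names is_inside and diagnose are made up for illustration.

```python
import os
import shutil


def is_inside(path, prefix):
    """True if `path` lies under the directory `prefix` (symlinks resolved)."""
    path = os.path.realpath(path)
    prefix = os.path.realpath(prefix)
    return os.path.commonpath([path, prefix]) == prefix


def diagnose(python_path=None, env_prefix=None):
    """Compare the python on PATH with the active conda environment."""
    python_path = python_path or shutil.which("python")
    env_prefix = env_prefix or os.environ.get("CONDA_PREFIX")
    if python_path is None:
        return "no python found on PATH"
    if env_prefix is None:
        return "no conda environment active (CONDA_PREFIX is unset)"
    if is_inside(python_path, env_prefix):
        return f"ok: {python_path} belongs to {env_prefix}"
    return f"mismatch: {python_path} is outside {env_prefix}"


if __name__ == "__main__":
    print(diagnose())
```

Running this near the top of a submission script, after activating the environment, should print a line starting with "mismatch" if something like a module load has put a different Python ahead of the environment's own.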
Thank you hugely for your resource.
Do you have suggestions for solving this error message (full log attached as earlgrey.log)?
<<< Straining TEs and Refining de novo Consensus Sequences >>>
Traceback (most recent call last):
  File "/home/bop21kgl/.conda/envs/earlgrey/share/earlgrey-4.2.4-1/scripts//TEstrainer/scripts//splitter.py", line 8, in <module>
    from Bio import SeqIO
ModuleNotFoundError: No module named 'Bio'
Traceback (most recent call last):
  File "/home/bop21kgl/.conda/envs/earlgrey/share/earlgrey-4.2.4-1/scripts//TEstrainer/scripts//indexer.py", line 4, in <module>
    from pyfaidx import Faidx
ModuleNotFoundError: No module named 'pyfaidx'
Bio (biopython) and pyfaidx are both present in the earlgrey conda environment (condaearlgreymodules.txt).
In most circumstances I would troubleshoot by trial and error, for example by trying other installation methods, but the run took 5 days and 21 hours before failing at the TEstrainer stage, so I was hoping to draw on your experience first!
Kiran