ModuleNotFoundError at TEstrainer stage

TobyBaril / EarlGrey

Earl Grey: A fully automated TE curation and annotation pipeline

ModuleNotFoundError at TEstrainer stage #120

Closed kiran-lee closed 3 months ago

kiran-lee commented 4 months ago

Thank you hugely for your resource.

Do you have suggestions for solving this error message (full log attached as earlgrey.log)?

<<< Straining TEs and Refining de novo Consensus Sequences >>>
Traceback (most recent call last):
  File "/home/bop21kgl/.conda/envs/earlgrey/share/earlgrey-4.2.4-1/scripts//TEstrainer/scripts//splitter.py", line 8, in <module>
    from Bio import SeqIO
ModuleNotFoundError: No module named 'Bio'
Traceback (most recent call last):
  File "/home/bop21kgl/.conda/envs/earlgrey/share/earlgrey-4.2.4-1/scripts//TEstrainer/scripts//indexer.py", line 4, in <module>
    from pyfaidx import Faidx
ModuleNotFoundError: No module named 'pyfaidx'

Bio (biopython) and pyfaidx are both present in the earlgrey conda environment (condaearlgreymodules.txt).

In most circumstances I would troubleshoot by trial and error, for example by trying other installation methods, but the run took 5 days and 21 hours before failing at the TEstrainer stage, so I was hoping to draw on your experience first!

Kiran

TobyBaril commented 4 months ago

Hi,

This can sometimes happen if you have multiple conda instances (e.g. miniconda and micromamba), or if you have a mix of packages installed with pip and conda.

Can you successfully load biopython if you activate the earlgrey environment, start python, and then run import Bio? I would also try the same with from pyfaidx import Faidx. This might help to narrow down whether the issue is with the conda environment or something else.
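The check suggested above can also be run non-interactively, which is handy inside a batch job. This is a minimal sketch, not part of Earl Grey: the helper name check_modules is made up for illustration. It reports which interpreter is actually running and whether Bio and pyfaidx can be found from it.

```python
import importlib.util
import sys


def check_modules(names):
    """Map each module name to its file location, or None if this
    interpreter cannot find it. (Hypothetical helper, not from Earl Grey.)"""
    found = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        if spec is None:
            found[name] = None
        else:
            # Namespace packages have no origin file; label them explicitly.
            found[name] = spec.origin or "<namespace package>"
    return found


if __name__ == "__main__":
    # If these do not point inside the earlgrey environment, a different
    # Python is being picked up than the one the packages were installed into.
    print("interpreter:", sys.executable)
    print("prefix:     ", sys.prefix)
    for name, origin in check_modules(["Bio", "pyfaidx"]).items():
        print(f"{name}: {origin or 'NOT FOUND'}")
```

Running it both in an interactive shell and from inside the submission script makes any difference between the two environments visible.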

Regarding the runtime, you can recover from the last complete step, so won't need to run the first few steps again. In your case, if you delete everything inside [species]_strainer then the pipeline will skip the previous steps and run TEstrainer again.
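Emptying that directory can be done by hand. For anyone who prefers to script it, here is a cautious sketch. The function name and the dry-run default are illustrative choices, and the real location of [species]_strainer inside your output folder is something you need to fill in yourself.

```python
import shutil
from pathlib import Path


def clear_directory(directory, dry_run=True):
    """Return the entries inside `directory`; delete them only when
    dry_run is False. The directory itself is always kept.
    (Hypothetical helper, not part of Earl Grey.)"""
    directory = Path(directory)
    entries = sorted(directory.iterdir())
    if not dry_run:
        for entry in entries:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)  # remove sub-directory and its contents
            else:
                entry.unlink()        # remove file or symlink
    return entries


# Example (the path is a placeholder; substitute your own output directory):
#   clear_directory("path/to/[species]_strainer")                  # list only
#   clear_directory("path/to/[species]_strainer", dry_run=False)   # delete
```

Doing a dry run first shows exactly what would be removed before anything is deleted.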

SJ-Smit commented 4 months ago

I get the same error when trying to run earlgrey 3.2 with the -l flag pointing to the fa.strained output from a related species. I specified the complete path to the strained file. If I run it without -l, it runs fine and calls TEstrainer without problems. I am trying to build a sequential library for a number of closely related species.

TobyBaril commented 4 months ago

@SJ-Smit do you have a log file I could take a look at? This is really odd as TEstrainer has nothing to do with a supplied library, and this isn't supplied as an option to the script! It would be good to see if I can work out what is triggering this!

SJ-Smit commented 4 months ago

S.tenuifoliaEarlGrey.log

SJ-Smit commented 4 months ago

Hi Toby. I think the problem I had was a combination of SLURM and conda environment issues. Once I managed to get v4.2.4 installed via conda on our HPC, things worked fine. I had trouble getting v4 installed due to a corrupted repeatmodeler package and path conflicts. Removing the repeatmodeler package and reinstalling earlgrey as per your instructions worked perfectly, and it's been running smoothly with no biopython issues.

TobyBaril commented 3 months ago

Hi @SJ-Smit, that's great to hear! Sometimes these strange things happen with multiple installs or weird conda things that are hard to explain... I'm going to keep this issue in mind to see if there is anything that can stop this happening in the future.

TobyBaril commented 3 months ago

Closing for the moment as non-reproducible - likely due to multiple installs or weird conda environment resolutions. If the issue persists, I recommend trying the Docker container. Feel free to reopen if more help is needed.

kiran-lee commented 3 months ago

Hi all,

I solved the problem. Your discussion helped immensely; thank you for taking the time to troubleshoot! I am fairly new to bash, Linux, and HPCs.

It was failing because my submission script ran source ~/.bashrc, which already contains the code to initialise conda, and then also loaded conda a second time with module load Anaconda3/2019.07. When I commented out module load Anaconda3/2019.07, it all worked perfectly.
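For anyone hitting the same thing, one way to spot this kind of clash is to list every python and conda executable visible on PATH from inside the job. This is only a sketch with a made-up helper name (find_on_path). Note that conda is often a shell function rather than a file on PATH, so an empty list for conda does not by itself mean conda is absent.

```python
import os


def find_on_path(name, path=None):
    """Return every executable called `name` found in the PATH directories,
    in search order. (Hypothetical helper for illustration.)"""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    # The first match is the one the shell would run; later matches coming
    # from a different install (e.g. a loaded Anaconda module) hint at a clash.
    for tool in ("python", "conda"):
        print(tool, "->", find_on_path(tool))
```

If the first python listed is not the one inside the earlgrey environment, something loaded after activation (such as a module load line) has probably taken precedence.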