AlexanderLabWHOI / EUKulele

Automatic eukaryotic taxonomic classification
MIT License

BUSCO error #14

Open shu251 opened 4 years ago

shu251 commented 4 years ago

EUKulele stopped at the BUSCO step for me. Error message: BUSCO run either did not complete successfully, or returned no matches for sample

I checked the log files and these are the outputs from busco_run.out:

python3 busco_configurator.py /vortexfs1/home/sarahhu/.conda/envs/EUKulele/bin/../config/config.ini /vortexfs1/omics/alexander/shu/eukulele-test/mmetsp-diamond/nb-output-13-09/busco/config_merged_merged.ini
No module named 'Bio'
There was a problem installing BUSCO or importing one of its dependencies. See the user guide and the GitLab issue board (https://gitlab.com/ezlab/busco/issues) if you need further assistance.

This looks like an issue with the BUSCO install: the BUSCO config files are missing from wherever the conda environment is looking for them. For the latest conda install of EUKulele, I installed BUSCO 4.0.6 and downgraded Biopython to 1.77 (as per suggestions found online).
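The "No module named 'Bio'" line means the Python interpreter that BUSCO ran under could not import Biopython. A minimal sketch for checking this from inside the activated environment is below. The helper name describe_module is hypothetical and not part of EUKulele or BUSCO; it only uses the Python standard library (Python 3.8 or later).

```python
import importlib.util
import sys
from importlib import metadata


def describe_module(module_name, dist_name):
    """Return (importable, installed_version) for the current interpreter.

    module_name is the import name (e.g. "Bio"); dist_name is the package
    name used by pip/conda (e.g. "biopython"). The version is None when the
    distribution is not installed.
    """
    importable = importlib.util.find_spec(module_name) is not None
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return importable, version


if __name__ == "__main__":
    # Shows which interpreter is active, then whether Biopython is visible to it.
    print("interpreter:", sys.executable)
    print("Bio:", describe_module("Bio", "biopython"))
```

If the interpreter path printed here is not inside the EUKulele environment, the import failure is more likely an activation problem than a missing package.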

shu251 commented 4 years ago

The change to BUSCO 4.0.6 and Biopython 1.77 resolved EUKulele with BUSCO, but I was still having issues that were related to conda. To downgrade BUSCO: conda install busco=4.0.6 --force-reinstall. To downgrade Biopython: pip install biopython==1.77

This mainly had to do with an incompatibility between BUSCO and conda on the HPC I was using. When slurm submitted jobs, conda -V reported 4.7.10. I updated conda to 4.8.5 by installing it to my home directory on the HPC. While this meant I could now run EUKulele in an environment that supported BUSCO, slurm was not redirecting to the newly installed conda. My error message when submitting slurm jobs:

CommandNotFoundError: Your shell has not been properly configured to use
'conda activate'.

And the output from running conda -V with slurm was the old conda version.

To fix this, I added a line to my slurm script that redirects conda to where my newly installed version is located: . $CONDA_ROOT~/anaconda3/etc/profile.d/conda.sh (include the whole path to the location of conda.sh following . $CONDA_ROOT). With this line ahead of conda activate EUKulele, slurm now uses the correct version of conda.
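To confirm which conda a submitted job actually picks up, it can help to print the resolved path and version from inside the job. The sketch below is one way to do that with the Python standard library; the function name executable_version is hypothetical, and it assumes the conda executable (not only the shell function) is on PATH.

```python
import shutil
import subprocess


def executable_version(name="conda"):
    """Return (path, version_output) for the first `name` found on PATH.

    Returns (None, None) when nothing by that name is on PATH, which is
    itself a useful hint that the job's shell was not set up for conda.
    """
    path = shutil.which(name)
    if path is None:
        return None, None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some tools print their version to stderr, so fall back to it.
    return path, (result.stdout or result.stderr).strip()
```

Calling print(executable_version("conda")) once on the login node and once inside a slurm job shows whether the two resolve to the same installation.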

MichielPerneel commented 3 years ago

I had a similar problem when running EUKulele on my HPC with BUSCO 4.1.2. My overall output mentions an error:

Running EUKulele with command line arguments, as no valid configuration file was provided.
Setting things up...
['final_contigs']
Specified reference directory, reference FASTA, and protein map/taxonomy table not found. Using database in location: /data/gent/vo/001/gvo00125/vsc43619/references/mmetsp/marmmetsp.
Found database folder for /data/gent/vo/001/gvo00125/vsc43619/references/mmetsp/marmmetsp in current directory; will not re-download.
Creating a diamond reference from database files...
Diamond database file already created; will not re-create database.
Aligning to reference database...
Aligning sample final_contigs...
Diamond process exited for sample final_contigs.
Performing taxonomic estimation steps...
Performing taxonomic visualization steps...
Performing BUSCO steps...
Configuring BUSCO...
Running busco with 1 simultaneous jobs...
[] is what is in BUSCO directory
BUSCO run either did not complete successfully, or returned no matches for sample final_contigs . Check busco_run log for details.
No BUSCO matches found for any sample. Check BUSCO run log for details. Exiting...
EUKulele run complete!

This is the output of the BUSCO run log:

Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.
There was a problem installing BUSCO or importing one of its dependencies. See the user guide and the GitLab issue board (https://gitlab.com/ezlab/busco/issues) if you need further assistance.
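The first message comes from Biopython itself: Bio.Alphabet was removed in Biopython 1.78, so any code that still imports it fails there. A small sketch for checking whether the installed Biopython still provides that module follows; the function name alphabet_available is hypothetical.

```python
import importlib


def alphabet_available():
    """Report whether `Bio.Alphabet` can be imported.

    Returns None if Biopython is not installed at all, False if Biopython
    is present but `Bio.Alphabet` fails to import (as in 1.78 and later),
    and True otherwise.
    """
    try:
        importlib.import_module("Bio")
    except ImportError:
        return None
    try:
        importlib.import_module("Bio.Alphabet")
    except ImportError:
        return False
    return True
```

A result of False alongside a BUSCO version that still imports Bio.Alphabet would be consistent with the log above.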

MichielPerneel commented 3 years ago

Also, is there a way to change the directory BUSCO operates on? It creates a folder busco_downloads on my HPC login node, which is limited in size.

akrinos commented 3 years ago

Hi @MichielPerneel ! Are you using the conda install of EUKulele? If you're not using that install, could you check your biopython version? Because of that swap within biopython, there are a couple of permutations of versions of biopython plus BUSCO that work, but not all.

That is a great suggestion on letting users specify the location of BUSCO downloads. It currently works this way to allow multiple runs of EUKulele to use the same downloads, similar to the main reference DB. But you're right that we should give users the option to change that location and not just use the default, which I will certainly add in the future. Apologies if it poses a problem to you at the moment!

boegel commented 3 years ago

@akrinos I provided the EUKulele installation that @MichielPerneel is using; the installation was performed using EasyBuild.

For more details about versions of dependencies, see https://github.com/easybuilders/easybuild-easyconfigs/pull/12152/files#diff-5838fa6d606dccd5dfbe3e29623efe74041f4662e9774ec40bb8bb1ad6343e5d. I used Biopython 1.78 and BUSCO 4.1.2, is that supposed to work?

If not:

susheelbhanu commented 3 years ago

@akrinos I'm having the same issue as @MichielPerneel on an HPC. I installed it via conda and everything seems to be working fine except for BUSCO.

I have the following versions of busco and biopython

biopython                 1.78             py39h3811e60_1    conda-forge
busco                     4.1.2                 py37r40_0    bioconda

Based on @shu251's comment, should we downgrade both? My current conda version is as follows:

[sbusi@access1]$ conda --version
conda 4.9.2

Thank you for your help with this!

akrinos commented 3 years ago

Apologies for getting back to this so late, @boegel and @MichielPerneel ! I missed the mention.

Was EUKulele installed via conda? If so, I currently have biopython 1.77 in the recipe. I have always used this version of biopython myself. I need to work out additional tests with 1.78, although I was under the impression that the Alphabet issue was no longer a problem with the later versions of BUSCO. The current conda build for EUKulele also uses BUSCO 4.1.4. If you end up with those versions of biopython and BUSCO after the conda install @susheelbhanu, I will need to investigate and find out why, but to potentially work things out faster, the first thing I would try is downgrading to biopython 1.77.

In the future, I certainly need to make sure that the specified versions of all the dependencies are what end up installed with EUKulele, and to update the documentation with acceptable version combinations for users using pip. So far, we have recommended biopython 1.77 and BUSCO 4.0.6 or 4.1.4.
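As a rough aid, the combinations recommended above can be checked against the output of conda list. The sketch below is not an official compatibility matrix: the set RECOMMENDED only encodes the versions named in this comment, and the function names are hypothetical.

```python
# (biopython, busco) pairs named as recommended in this thread; an assumption
# drawn from the comment above, not a tested compatibility matrix.
RECOMMENDED = {("1.77", "4.0.6"), ("1.77", "4.1.4")}


def parse_conda_list(text):
    """Map package name to version from `conda list`-style text."""
    versions = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the "#"-prefixed header lines conda prints.
        if len(fields) >= 2 and not line.lstrip().startswith("#"):
            versions[fields[0]] = fields[1]
    return versions


def is_recommended(versions):
    """True if the biopython/busco pair is one of the recommended pairs."""
    return (versions.get("biopython"), versions.get("busco")) in RECOMMENDED


listing = """\
biopython                 1.78             py39h3811e60_1    conda-forge
busco                     4.1.2                 py37r40_0    bioconda
"""
print(is_recommended(parse_conda_list(listing)))  # prints False
```

The sample listing is the one posted earlier in this thread; pasting the output of conda list from another environment in its place gives the same check for that environment.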

susheelbhanu commented 3 years ago

Thank you, @akrinos! I will try it out with biopython=1.77 and busco=4.1.4 to see what happens.

susheelbhanu commented 3 years ago

@akrinos The problem persists with biopython=1.77 and busco=4.1.4. The following is the error message I get:

There was a problem installing BUSCO or importing one of its dependencies. See the user guide and the GitLab issue board (https://gitlab.com/ezlab/busco/issues) if you need further assistance.

Would you happen to have a yaml file that is working at your end, which I can use to build the environment?

Thank you!

akrinos commented 3 years ago

Hi @susheelbhanu - thanks so much for hanging in there with this! I just did a conda install of EUKulele, and am testing that now. I'm not sure whether we established the workflow you used to install EUKulele? I would try BUSCO 4.0.6 next - 4.1.4 was working for me, but EUKulele was built on 4.0.6, so that's likely the best bet. You can use this yaml:

name: EUKulele
channels:
    - bioconda
    - conda-forge
    - defaults
    - anaconda
dependencies:
    - blast
    - biopython=1.77
    - busco=4.0.6
    - diamond
    - transdecoder
    - ujson
    - pandas
    - yaml=0.1.7
    - chardet
    - pyyaml=5.1.2
    - numpy
    - joblib

I can also give you an exported env that I have been using, but in general those have a lot of extraneous libraries. Thanks again for working through this problem with us!

susheelbhanu commented 3 years ago

Thanks a lot @akrinos. I'll give this a go, and if I still have persisting issues I'll come back for that env, albeit with the extraneous libraries. I already have some data that makes sense but want to see how BUSCO affects the current reports.

Will keep you posted!

UPDATE: @akrinos The environment you provided works, and BUSCO has no issues being installed within the dependencies. I did, however, run into a similar issue to @MichielPerneel's. Please see below:

Performing taxonomic estimation steps...
Performing taxonomic visualization steps...
Performing BUSCO steps...
Configuring BUSCO...
BUSCO lineage database already found; not re-downloaded.
Running busco with 1 simultaneous jobs...
['short_summary.specific.eukaryota_odb10.GL_R68_GL53_UP_2.txt', 'logs', 'run_eukaryota_odb10'] is what is in BUSCO directory
At least one BUSCO present in sample GL_R68_GL53_UP_2 but 16 missing.
BUSCO query did not run successfully for sample GL_R68_GL53_UP_2; check log file for details.
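Since a short_summary file was written here, one way to see what BUSCO itself reported is to read its one-line score string. The sketch below assumes the usual BUSCO notation, C:..%[S:..%,D:..%],F:..%,M:..%,n:.., appears in that file; the function name and the sample values are illustrative only and are not taken from this run.

```python
import re

# Matches BUSCO's one-line score notation: Complete [Single, Duplicated],
# Fragmented, Missing, and the number of BUSCOs searched.
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(text):
    """Return the BUSCO percentages as a dict, or None if no score line is found."""
    match = SUMMARY_RE.search(text)
    if match is None:
        return None
    scores = {key: float(value) for key, value in match.groupdict().items()}
    scores["n"] = int(match.group("n"))  # n is a count, not a percentage
    return scores


# Made-up example line, for illustration only.
example = "\tC:93.7%[S:90.2%,D:3.5%],F:2.0%,M:4.3%,n:255\n"
print(parse_busco_summary(example))
```

Reading the file from disk and passing its text to parse_busco_summary shows whether BUSCO produced scores even though EUKulele flagged the sample.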