Open shu251 opened 4 years ago
Changing BUSCO to 4.0.6 and Biopython to 1.77 resolved the EUKulele problem with BUSCO, but I was still having issues related to conda.
To downgrade BUSCO: conda install busco=4.0.6 --force-reinstall
To downgrade Biopython: pip install biopython==1.77
This mainly had to do with an incompatibility between BUSCO and conda on the HPC I was using. When slurm submitted jobs, conda -V reported 4.7.10. I updated conda to 4.8.5 by installing it in my home directory on the HPC. While this meant I could now run EUKulele in an environment that supported BUSCO, slurm was not redirecting to the newly installed conda.
My error message when submitting slurm jobs:
CommandNotFoundError: Your shell has not been properly configured to use
'conda activate'.
And the output from running conda -V with slurm was the old conda version.
To fix this, I included a new line in my slurm script that redirects conda to where my newly installed version is located.
Resource for this
. $CONDA_ROOT~/anaconda3/etc/profile.d/conda.sh
where you include the whole path to the location of conda.sh following . $CONDA_ROOT
With this line ahead of conda activate EUKulele, slurm now uses the correct version of conda.
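As a minimal sketch of the fix described above, a slurm batch script might source conda.sh before activating the environment. The install location ($HOME/anaconda3), the job name, and the helper load_conda are assumptions for illustration and not taken from the original script; substitute the full path to your own conda.sh.

```shell
#!/bin/bash
#SBATCH --job-name=eukulele   # hypothetical job name

# Assumed location of the user-level conda install; adjust to your system.
CONDA_ROOT="${CONDA_ROOT:-$HOME/anaconda3}"

# Source conda.sh so that 'conda activate' works in a non-interactive shell.
# Returns non-zero (with a message on stderr) if conda.sh is not found.
load_conda() {
    local conda_sh="$1/etc/profile.d/conda.sh"
    if [ ! -f "$conda_sh" ]; then
        echo "conda.sh not found at $conda_sh" >&2
        return 1
    fi
    . "$conda_sh"
}

if load_conda "$CONDA_ROOT"; then
    conda -V                  # should now report the newly installed version
    conda activate EUKulele
fi
```

Checking that the file exists first gives a clearer message than the CommandNotFoundError when the path is wrong.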
I had a similar problem when running EUKulele on my HPC with BUSCO 4.1.2. My overall output mentions an error:
Running EUKulele with command line arguments, as no valid configuration file was provided.
Setting things up...
['final_contigs']
Specified reference directory, reference FASTA, and protein map/taxonomy table not found. Using database in location: /data/gent/vo/001/gvo00125/vsc43619/references/mmetsp/marmmetsp.
Found database folder for /data/gent/vo/001/gvo00125/vsc43619/references/mmetsp/marmmetsp in current directory; will not re-download.
Creating a diamond reference from database files...
Diamond database file already created; will not re-create database.
Aligning to reference database...
Aligning sample final_contigs...
Diamond process exited for sample final_contigs.
Performing taxonomic estimation steps...
Performing taxonomic visualization steps...
Performing BUSCO steps...
Configuring BUSCO...
Running busco with 1 simultaneous jobs...
[] is what is in BUSCO directory
BUSCO run either did not complete successfully, or returned no matches for sample final_contigs . Check busco_run log for details.
No BUSCO matches found for any sample. Check BUSCO run log for details.
Exiting...
EUKulele run complete!
This is the output of the BUSCO run log:
Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.
There was a problem installing BUSCO or importing one of its dependencies. See the user guide and the GitLab issue board (https://gitlab.com/ezlab/busco/issues) if you need further assistance.
Also, is there a way to change the directory BUSCO operates on? It creates a folder busco_downloads on my HPC login node, which is limited in size.
Hi @MichielPerneel! Are you using the conda install of EUKulele? If you're not using that install, could you check your biopython version? Because of that swap within biopython, there are a couple of permutations of versions of biopython plus BUSCO that work, but not all.
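One quick way to check the biopython version from the shell is sketched below. The function name biopython_version is hypothetical, and the sketch assumes that the python on your PATH (or the interpreter passed as the first argument) is the one EUKulele and BUSCO actually use.

```shell
#!/bin/bash
# Print the Biopython version seen by a given Python interpreter,
# or a fallback message if Bio cannot be imported.
biopython_version() {
    local py="${1:-python}"   # interpreter to test; defaults to 'python' on PATH
    "$py" -c 'import Bio; print(Bio.__version__)' 2>/dev/null \
        || echo "biopython not importable"
}

biopython_version
```

Run this inside the activated environment (and inside a slurm job, if that is where the failure occurs), since the login shell and the job may resolve to different interpreters.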
That is a great suggestion about being able to specify the location of the BUSCO downloads. The current behavior is meant to allow multiple runs of EUKulele to use the same reference DB, similar to the main reference DB. But you're right that we should give users the option to change that location and not just use the default, which I will certainly add in the future. Apologies if it poses a problem to you at the moment!
@akrinos I provided the EUKulele installation that @MichielPerneel is using; the installation was performed using EasyBuild.
For more details about versions of dependencies, see https://github.com/easybuilders/easybuild-easyconfigs/pull/12152/files#diff-5838fa6d606dccd5dfbe3e29623efe74041f4662e9774ec40bb8bb1ad6343e5d. I used Biopython 1.78 and BUSCO 4.1.2, is that supposed to work?
If not:
@akrinos I'm having the same issue as @MichielPerneel on an HPC. I installed it via conda and everything seems to be working fine except for busco. I have the following versions of busco and biopython:
biopython 1.78 py39h3811e60_1 conda-forge
busco 4.1.2 py37r40_0 bioconda
Based on @shu251's comment, should we downgrade both? I ask because my current conda version is as follows:
[sbusi@access1]$ conda --version
conda 4.9.2
Thank you for your help with this!
Apologies for getting back to this so late, @boegel and @MichielPerneel! I missed the mention.
Was EUKulele installed via conda? If so, I currently have biopython 1.77 in the recipe. I have always used this version of biopython myself. I need to work out additional tests with 1.78, although I was under the impression that the Alphabet issue was no longer a problem with the later versions of BUSCO. The current conda build for EUKulele also uses BUSCO 4.1.4. If you end up with those versions of biopython and BUSCO after the conda install, @susheelbhanu, I will need to investigate and find out why, but to potentially work things out faster, the first thing I would try is downgrading to biopython 1.77.
In the future, I certainly need to make sure that the specified versions of all the dependencies are what end up installed with EUKulele, and to update the documentation with acceptable version combinations for users installing with pip. So far, we have recommended biopython 1.77 and BUSCO 4.0.6 or 4.1.4.
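The combinations recommended so far can be written down as a small check. This is only a sketch: the function name is_recommended_combo is hypothetical, and the list covers only the versions named above (biopython 1.77 with BUSCO 4.0.6 or 4.1.4), not every combination that may happen to work.

```shell
#!/bin/bash
# Return 0 if the biopython/BUSCO pair is one recommended in this thread.
is_recommended_combo() {
    local biopython="$1" busco="$2"
    case "$biopython:$busco" in
        1.77:4.0.6|1.77:4.1.4) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: the versions reported earlier in the thread.
if is_recommended_combo 1.78 4.1.2; then
    echo "recommended combination"
else
    echo "not a recommended combination; consider biopython 1.77"
fi
```

The installed versions to pass in can be read from the output of conda list in the active environment.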
Thank you, @akrinos! I will try it out with biopython=1.77 and busco=4.1.4 to see what happens.
@akrinos The problem persists with biopython=1.77 and busco=4.1.4. The following is the error message I get:
There was a problem installing BUSCO or importing one of its dependencies. See the user guide and the GitLab issue board (https://gitlab.com/ezlab/busco/issues) if you need further assistance.
Would you happen to have a yaml file that is working at your end, which I can use to build the environment?
Thank you!
Hi @susheelbhanu - thanks so much for hanging in there with this! I just did a conda install of EUKulele, and am testing that now. I'm not sure whether we established the workflow you used to install EUKulele? I would try BUSCO 4.0.6 next - 4.1.4 was working for me, but EUKulele was built on 4.0.6, so that's likely the best bet. You can use this yaml:
name: EUKulele
channels:
  - bioconda
  - conda-forge
  - defaults
  - anaconda
dependencies:
  - blast
  - biopython=1.77
  - busco=4.0.6
  - diamond
  - transdecoder
  - ujson
  - pandas
  - yaml=0.1.7
  - chardet
  - pyyaml=5.1.2
  - numpy
  - joblib
I can also give you an exported env that I have been using, but in general those have a lot of extraneous libraries. Thanks again for working through this problem with us!
Thanks a lot @akrinos. I'll give this a go, and if I still have issues I'll come back for that env, albeit with the extraneous libraries. I already have some data that makes sense but want to see how busco affects the current reports.
Will keep you posted!
UPDATE:
@akrinos The environment you provided works, and BUSCO installs without issues alongside the other dependencies. I did, however, run into a similar issue as MichielPerneel. Please see below:
Performing taxonomic estimation steps...
Performing taxonomic visualization steps...
Performing BUSCO steps...
Configuring BUSCO...
BUSCO lineage database already found; not re-downloaded.
Running busco with 1 simultaneous jobs...
['short_summary.specific.eukaryota_odb10.GL_R68_GL53_UP_2.txt', 'logs', 'run_eukaryota_odb10'] is what is in BUSCO directory
At least one BUSCO present in sample GL_R68_GL53_UP_2 but 16 missing.
BUSCO query did not run successfully for sample GL_R68_GL53_UP_2; check log file for details.
EUKulele stopped at the BUSCO step for me. Error message:
BUSCO run either did not complete successfully, or returned no matches for sample
I checked the log files and these are the outputs from busco_run.out:
Looks like an issue with the busco install, as wherever the conda environment is looking for the busco config files, they are missing. For the latest conda install from EUKulele, I installed busco 4.0.6 and downgraded Biopython to 1.77 (as per internet suggestions).