Bios4Biol / intronSeeker

IntronSeeker identifies potentially retained introns in de novo RNA-seq assembly in order to quantify and remove them.
GNU General Public License v3.0
1 stars 0 forks source link

[JOSS REVIEW] Various comments #2

Open CFGrote opened 5 months ago

CFGrote commented 5 months ago

JOSS Review Issue

During reading, installation and testing, I came across the following items that would be nice to see fixed but not at top priority.

Installation:

The authors did a great job in wrapping most of the common installation steps in the setup.sh script. However, I'd recommend following community best-practises in providing an environment.yml file in the project root and instructions how to consume it by conda install, respectively. Leave the details of how to create an environment, using mamba or conda, override channels, etc. to the user. E.g. running

./setup.sh

failled for me, likely because my conda setup is somewhat non-standard. This may be true for other users too.

Python version:

python3.6 has reached end-of-life, I'd suggest bumping the minimum version to 3.8 (see https://devguide.python.org/versions/)

pbordron commented 5 months ago

Installation

We thank you for this helpful advice. We will rework the installation method to match the best practices. We will create a conda channel that allows to install the dependencies that were out of channels conda-forge and bioconda and that were managed by the setup.sh script until now. This way IntronSeeker will be easily installed by consuming the environment.yml file.

Python version

We will check if using an up to date python version don't break IntronSeeker and update it if possible.


@CFGrote We'll get back to you as soon as it's done.

CFGrote commented 4 months ago

any progress on this one? Just to make sure I did not miss any notification or updates. thanks!

Bios4Biol commented 4 months ago

any progress on this one? Just to make sure I did not miss any notification or updates. thanks!

Thank you for your message, we're still working on it.

CFGrote commented 2 weeks ago

INSTALL.md says:

all these dependencies and their versions are detailed in the file requirements.txt

however, there is no file in the repo by this name.

CFGrote commented 2 weeks ago

I followed the new installation instructions but the sthe command

▶ mamba env create -f config/environment.yml                             
Retrieving notices: ...working... done
intronseeker/linux-64                              979.0 B @   2.0kB/s  0.5s
intronseeker/noarch                                643.0 B @ 874.0 B/s  0.3s
bioconda/linux-64                                    5.6MB @   4.5MB/s  1.5s
pkgs/r/linux-64                                      1.9MB @   1.2MB/s  0.8s
bioconda/noarch                                      5.2MB @   3.2MB/s  1.9s
pkgs/main/noarch                                   875.0kB @ 387.1kB/s  0.6s
pkgs/r/noarch                                        2.3MB @ 946.6kB/s  1.3s
conda-forge/noarch                                  18.9MB @   5.2MB/s  4.5s
pkgs/main/linux-64                                   7.3MB @   1.6MB/s  3.4s
conda-forge/linux-64                                44.2MB @   4.2MB/s 13.1s

Looking for: ['intronseeker::assemblathon_stats', "bcftools[version='>=1.9']", "biopython[version='>=1.74']", 'diamond==0.9.24', "emboss[version='>=6.6.0']", 'gffread==0.11.4', "gtfparse[version='>=1.2.0']", 'intronseeker::grinder==0.5.4', 'hisat2==2.1.0', "numpy[version='>=1.16.4']", "pandas[version='>=0.25.0']", "perl-test-more[version='>=1.001002']", "perl-test-warn[version='>=0.36']", "perl-bioperl-core[version='>=1.007002']", "perl-list-moreutils[version='>=0.428']", "perl-math-random[version='>=0.72']", "plotly[version='>=4.0.0']", "pysam[version='>=0.15.3']", "python[version='>=3.6.9']", 'samtools==1.9', 'star==2.7.1a', "scipy[version='>=1.4.1']", 'transdecoder==5.5.0']

never returns. I aborted after >30 minutes. Is this the normal behaviour?

CFGrote commented 1 week ago

Tried again in a two-step approach: 1) create the environment

conda create -n ISeeker_environment 'python>=3.6' -c bioconda -c conda-forge
conda activate ISeeker_environment

2) Update with

conda env update -n ISeeker_environment -f config/enviroment.yml

and that runs through in reasonable time.

pbordron commented 1 week ago

Thank you for pointing out the long installation time.

Observation

We find out that older conda versions are slow when installing intronSeeker envionment. An updated version of conda does not have this issue.

For example, conda==24.7.1 with conda-libmamba-solver==24.7.0 resolves dependencies in less than 2 minutes, when conda==23.3.1 with conda-libmamba-solver==23.3.0 resolves the same depencies in more than 40 minutes.

Solution

We will recommend to update conda base environment to lastest version in the documentation. In addition, we will create container image that can be used with apptainer/singularity and docker.

Why it is so long

IntronSeeker uses old versions of some softwares. We decided to stick to those versions for the sake of reproducibility of this publication.

In details, resolving dependencies is slow because it must be used with channel_priority=flexible option. If we use the channel_priority=strict option recommended by bioconda setup, the resolution of dependencies fails: at least one dependencies, satisfying the environment constrains, only exist in bioconda, but is ignored by solver because the same dependency exists in conda-forge, but without satisfying the constrains (See conda documentation for details)

Some dependencies used by intronSeeker were published before the option channel_priority=strict was introduced into conda, when bioconda duplicated many libraries from conda-forge. Some were deindexed from conda-forge since but not from bioconda.

Appendices

We use this command to measure how much time conda used to resolve intronSeeker dependencies

time conda env create --dry-run --solver libmamba -f config/environment.yml