bioconda / bioconda-recipes

Conda recipes for the bioconda channel.
https://bioconda.github.io
MIT License

RepeatMasker 4.1.6 update #45328

Open KristinaGagalova opened 8 months ago

KristinaGagalova commented 8 months ago

Hi,

Since there are significant updates in the new version, could you please create the package for RepeatMasker 4.1.6?

Thank you in advance

TobyBaril commented 8 months ago

This is not as simple as many will think: version 4.1.6 requires a database of at least several GB to be configured, and that cannot be stored and shared in GitHub/conda packages, which are designed for sharing code only.

I have submitted a potential workaround for this (#45513), but I will need to see whether it complies with conda rules.

KristinaGagalova commented 8 months ago

Hi @TobyBaril, I have a suggestion. Instead of downloading the database and configuring it, why don't we manually edit the configuration file to point at a placeholder? The user would then pass the -libdir parameter later. This is how I bypassed the database download for my Docker container:

https://github.com/KristinaGagalova/pante2/blob/ae44d6c5955f3be42e91811c4610e070eab5605e/containers/pante2_env.Dockerfile#L109-L132
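As a rough illustration of the suggestion above, the sketch below builds a RepeatMasker command line that points at a user-supplied library directory. Only the -libdir option comes from this thread; the helper name, the executable name, the argument order and the example paths are assumptions for illustration, not the recipe's actual behaviour.

```python
import os
import shlex


def build_repeatmasker_cmd(fasta, libdir, executable="RepeatMasker"):
    """Return an argv list that runs RepeatMasker with a user-supplied library directory.

    Hypothetical helper: only the -libdir option is taken from the discussion;
    the executable name and argument order are assumptions.
    """
    return [executable, "-libdir", os.fspath(libdir), os.fspath(fasta)]


if __name__ == "__main__":
    # Example paths are placeholders, not real locations.
    cmd = build_repeatmasker_cmd("genome.fa", "/data/repeatmasker/Libraries")
    # Print a shell-safe command line; to actually run it, use
    # subprocess.run(cmd, check=True) on a system where RepeatMasker is installed.
    print(shlex.join(cmd))
```

With this approach the package would ship without the database, and each user would supply the library location at run time.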

TobyBaril commented 8 months ago

Yes, great idea! This worked for version 4.1.5, but the Perl configure script for 4.1.6 requires the actual file in order to configure correctly. I tried with a placeholder, and it just exits and asks me to download the correct file, even when the placeholder has the same name etc. It requires at least partition 0 of the database for correct configuration.

TobyBaril commented 8 months ago

I'm going to try and see if it will configure properly with a single entry from the small database, this seems like it is worth trying...

ptrebert commented 6 months ago

Just to chime in here: if at all possible, it would really (and I mean really, really) be great if there was a way to make the RepeatMasker installation via Conda more straightforward on offline systems such as HPC infrastructure.

I am aware that certain Conda packages download annotations/DBs from some other source (i.e., not conda channels) during or after installation. This will never work on offline systems. So far, our way around that has been to use conda-pack / conda-unpack, which worked fine. But conda-pack does not work for RepeatMasker. Given the non-trivial dependency resolution and configuration (see, e.g., #9988), I am not surprised that after unpacking, RepeatMasker just quits with a bad interpreter error, so some paths are likely not properly reconfigured by conda-unpack after migrating the environment.
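One way to narrow down a "bad interpreter" error like the one described above is to list the scripts in the environment's bin directory whose shebang line points at a path that no longer exists. The following is a minimal diagnostic sketch, not part of conda-pack or RepeatMasker; the function name is hypothetical, and it assumes a POSIX layout where the interpreter is the first token after `#!`.

```python
import os
from pathlib import Path


def find_broken_shebangs(bin_dir):
    """Return (script, interpreter) pairs whose shebang interpreter path does not exist."""
    broken = []
    for entry in sorted(Path(bin_dir).iterdir()):
        if not entry.is_file():
            continue  # skip directories and dangling symlinks
        with open(entry, "rb") as handle:
            first_line = handle.readline(4096)
        if not first_line.startswith(b"#!"):
            continue  # compiled binary or script without a shebang
        parts = first_line[2:].decode("utf-8", errors="replace").split()
        if not parts:
            continue
        interpreter = parts[0]
        if not os.path.exists(interpreter):
            broken.append((str(entry), interpreter))
    return broken


if __name__ == "__main__":
    # CONDA_PREFIX is set by `conda activate`; fall back to the current directory otherwise.
    prefix = os.environ.get("CONDA_PREFIX", ".")
    bin_path = os.path.join(prefix, "bin")
    if os.path.isdir(bin_path):
        for script, interpreter in find_broken_shebangs(bin_path):
            print(f"{script}: missing interpreter {interpreter}")
```

Scripts that start with `#!/usr/bin/env perl` pass this check as long as /usr/bin/env exists, so the sketch only catches hard-coded absolute interpreter paths left over from the original environment location.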

KristinaGagalova commented 6 months ago

Since this looks so painful, I reiterate my previous message: we don't need to download databases; we can change the config file manually. That's the fastest way and will hopefully pass the checks. Please let me know if you want to give it another stab, or I can try to modify the configuration through scripting.

Thank you in advance

mikecuoco commented 3 months ago

Hi all, I just wanted to check in on this issue. Has this been resolved?

KristinaGagalova commented 3 months ago

Hi, the issue has not been resolved yet; the PR is still open.

mikecuoco commented 3 months ago

@TobyBaril did you have any luck with this solution?

"I'm going to try and see if it will configure properly with a single entry from the small database, this seems like it is worth trying..."

TobyBaril commented 3 months ago

Hi @mikecuoco, no luck with configuring with a single entry. I've had a good chat with the authors, and they acknowledge the issues with the database configuration in Dfam 3.8 and the new RepeatMasker, so hopefully this will be changed in the next Dfam release.

RepeatMasker 4.1.6 with the root partition has been force-pushed to bioconda though, see: https://github.com/bioconda/bioconda-recipes/pull/45513

EDIT: I can also confirm that the recipe does indeed work, as I have a functional version on my personal conda channel: https://anaconda.org/toby_baril_bio/repeatmasker. It is likely too big to be tested in Azure DevOps, though.

mikecuoco commented 3 months ago

@TobyBaril I noticed that force push; that makes sense now. Perhaps we can force a merge to master despite the CI failures.

mikecuoco commented 3 months ago

@TobyBaril have you tested your personal conda version on OSX? I noticed @abretaud found an issue with the build process there (see #43288 and #43994) - perhaps we could incorporate that in your PR as well

TobyBaril commented 3 months ago

I haven't tested on OSX - I currently don't have a system with enough storage. I guess it is just a case of swapping a couple of bits to make it behave though?

mikecuoco commented 3 months ago

Ok, I got the following error, which I also got when trying repeatmasker 4.1.5 from bioconda. Looks like hmmer isn't available for osx. Perhaps we can remove osx as a supported platform for now? Not sure how many folks are using repeatmasker outside Linux HPCs anyway; maybe in the bacterial community...

toby_baril_bio/noarch (check zst)                   Checked  0.2s
toby_baril_bio/osx-arm64 (check zst)                Checked  0.3s
toby_baril_bio/noarch                              512.0 B @   5.3kB/s  0.1s
toby_baril_bio/osx-arm64                           125.0 B @ 973.0 B/s  0.1s
nodefaults/osx-arm64                                          No change
bioconda/osx-arm64                                            No change
nodefaults/noarch                                             No change
bioconda/noarch                                      5.3MB @   1.5MB/s  3.4s
conda-forge/osx-arm64                               10.3MB @   2.0MB/s  5.1s
conda-forge/noarch                                  15.0MB @   2.2MB/s  6.8s
error    libmamba Could not solve for environment specs
    The following package could not be installed
    └─ repeatmasker 4.1.6  is not installable because it requires
       └─ hmmer, which does not exist (perhaps a missing channel).
critical libmamba Could not solve for environment specs

UPDATE: Sorry for the confusion. It actually appears that bioconda does not yet support the osx-arm64 architecture, which is what my laptop has. Perhaps we should just move forward with @abretaud's easy zcat solution then, @TobyBaril?
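For anyone hitting a similar solver error, it can help to confirm which conda platform (subdir) the local machine maps to before concluding that a dependency is missing. The sketch below is an illustrative helper, not conda's own detection logic; the function name is hypothetical and the mapping deliberately covers only a few common platforms.

```python
import platform

# Simplified mapping from (OS, CPU architecture) to conda subdir names.
# Assumption: only these common combinations are covered; anything else returns None.
_KNOWN_SUBDIRS = {
    ("linux", "x86_64"): "linux-64",
    ("linux", "aarch64"): "linux-aarch64",
    ("darwin", "x86_64"): "osx-64",
    ("darwin", "arm64"): "osx-arm64",
    ("windows", "amd64"): "win-64",
}


def guess_conda_subdir(system=None, machine=None):
    """Return the likely conda subdir for this machine, or None if it is not in the table."""
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    return _KNOWN_SUBDIRS.get((system, machine))


if __name__ == "__main__":
    print(guess_conda_subdir())
```

On an Apple Silicon laptop this would report osx-arm64, matching the `bioconda/osx-arm64` and `toby_baril_bio/osx-arm64` lines in the log above; a package that has no build for that subdir cannot be resolved there.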