Open KristinaGagalova opened 8 months ago
This is not as simple as many will think: version 4.1.6 requires a database of at least several GB to be configured, and that cannot be stored or shared in GitHub/conda packages, which are designed for sharing code only.
I have submitted a potential workaround for this (#45513), but will need to see if it works with conda rules.
Hi @TobyBaril
I have a suggestion.
Instead of downloading the database and configuring it, why don't we manually edit the configuration file to point at a placeholder? The user would then pass the -libdir parameter later. This is how I bypassed the database download for my Docker container.
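To make the suggestion concrete, here is a minimal sketch of what the user-side call might look like once the package ships with a placeholder configuration. It only builds the command line and does not run it. The helper name, the file names, and the library path are hypothetical, and passing a directory path to -libdir is an assumption based on the comment above.

```python
import shlex


def repeatmasker_command(fasta, libdir, executable="RepeatMasker"):
    """Build (but do not run) a RepeatMasker call that uses an external library directory."""
    # -libdir is the flag named in the comment above; that it takes a
    # directory path is an assumption of this sketch.
    return [executable, "-libdir", str(libdir), str(fasta)]


# Hypothetical paths, for illustration only.
cmd = repeatmasker_command("genome.fa", "/data/repeatmasker/Libraries")
print(shlex.join(cmd))
```

The point of the design is that the database location becomes a run-time argument instead of something baked in at package build time.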
Yes, great idea! This worked for version 4.1.5, but the Perl configure script for 4.1.6 requires the actual file in order to configure correctly. I tried a placeholder, and it just exits and asks me to download the correct file, even when the placeholder has the same name, etc. It requires at least partition 0 of the database for correct configuration.
I'm going to try and see if it will configure properly with a single entry from the small database, this seems like it is worth trying...
Just to chime in here: if at all possible, it would really --- and I mean really, really --- be great if there was a way to make the RepeatMasker installation via Conda more straightforward on offline systems such as HPC infrastructure.
I am aware that certain Conda packages download annotations/DBs during, or after, installation from some other source (= not conda channels). This will always break on offline systems. So far, our way around that was to use conda-pack / conda-unpack, which worked fine. But conda-pack does not work for RepeatMasker. Given the non-trivial dependency resolution and configuration (see, e.g., #9988), I am not surprised that after unpacking, RepeatMasker just quits with a bad interpreter error, so some paths are likely not properly reconfigured by conda-unpack after migrating the environment.
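For anyone who wants to check which scripts are affected after unpacking, here is a rough diagnostic sketch. It assumes the "bad interpreter" error comes from shebang lines that still point at the original environment's paths, which has not been confirmed for RepeatMasker. The function name is hypothetical, and the check only covers absolute interpreter paths on the first line of each file.

```python
import os
from pathlib import Path


def find_broken_shebangs(bin_dir):
    """Return (script name, interpreter) pairs whose shebang interpreter does not exist."""
    broken = []
    for path in sorted(Path(bin_dir).iterdir()):
        if not path.is_file():
            continue
        with open(path, "rb") as handle:
            first_line = handle.readline(4096)
        if not first_line.startswith(b"#!"):
            continue  # compiled binary, or a script without a shebang
        parts = first_line[2:].decode("utf-8", errors="replace").split()
        if not parts:
            continue
        interpreter = parts[0]
        # A shebang that names a missing file is what triggers "bad interpreter".
        if not os.path.exists(interpreter):
            broken.append((path.name, interpreter))
    return broken
```

Calling find_broken_shebangs on the bin directory of the unpacked environment lists the scripts whose first line would need to be rewritten to the new prefix.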
Since this looks so painful, I reiterate my previous message: we don't need to download databases; we can change the config file manually. That's the fastest way and will hopefully pass the checks. Please let me know if you want to give it another stab, or I can try to modify the configuration through scripting.
Thank you in advance
Hi all, I just wanted to check-in on this issue. Has this been resolved?
Hi, the issue has not been resolved yet; the PR is still open.
@TobyBaril did you have any luck with this solution?
I'm going to try and see if it will configure properly with a single entry from the small database, this seems like it is worth trying...
Hi @mikecuoco, no luck with configuring with a single entry. I've had a good chat with the authors and they acknowledge the issues with the database configuration in Dfam3.8 and the new RepeatMasker, so hopefully this will be altered in the next Dfam release.
RepeatMasker 4.1.6 with the root partition has been force-pushed to bioconda though, see: https://github.com/bioconda/bioconda-recipes/pull/45513
EDIT: I can also confirm that the recipe does indeed work, as I have a functional version on my personal conda channel: https://anaconda.org/toby_baril_bio/repeatmasker. It is likely too big to be tested in Azure DevOps, though.
@TobyBaril I noticed that force push, that makes sense now. Perhaps we can force a merge to master despite the CI failures.
@TobyBaril have you tested your personal conda version on OSX? I noticed @abretaud found an issue with the build process there (see #43288 and #43994) - perhaps we could incorporate that in your PR as well
I haven't tested on OSX - I currently don't have a system with enough storage. I guess it is just a case of swapping a couple of bits to make it behave though?
OK, I got the following error, which I could replicate when trying repeatmasker 4.1.5 from bioconda. Looks like hmmer isn't available for OSX. Perhaps we can remove OSX as a supported platform for now? Not sure how many folks are using repeatmasker outside Linux HPCs anyway, maybe in the bacterial community...
toby_baril_bio/noarch (check zst) Checked 0.2s
toby_baril_bio/osx-arm64 (check zst) Checked 0.3s
toby_baril_bio/noarch 512.0 B @ 5.3kB/s 0.1s
toby_baril_bio/osx-arm64 125.0 B @ 973.0 B/s 0.1s
nodefaults/osx-arm64 No change
bioconda/osx-arm64 No change
nodefaults/noarch No change
bioconda/noarch 5.3MB @ 1.5MB/s 3.4s
conda-forge/osx-arm64 10.3MB @ 2.0MB/s 5.1s
conda-forge/noarch 15.0MB @ 2.2MB/s 6.8s
error libmamba Could not solve for environment specs
The following package could not be installed
└─ repeatmasker 4.1.6 is not installable because it requires
└─ hmmer, which does not exist (perhaps a missing channel).
critical libmamba Could not solve for environment specs
UPDATE: Sorry for the confusion. It actually appears bioconda does not yet support the osx-arm64 architecture, which I have on my laptop. Perhaps we should just move forward with @abretaud's easy zcat solution then, @TobyBaril?
Hi,
Since there are significant updates in the new version, could you please create the package for RepeatMasker 4.1.6?
Thank you in advance