Ensembl / Bio-DB-HTS

Git repo for Bio::DB::HTS module on CPAN, providing Perl links into HTSlib
Apache License 2.0
24 stars 16 forks source link

INSTALL.pl - use master.zip instead of git clone #15

Closed keiranmraine closed 8 years ago

keiranmraine commented 8 years ago

Hi,

Have you considered just using the standard 'master.zip' instead of cloning the repository? It should be a much smaller download, especially for htslib, e.g.

curl -sSL --retry 10 -o master.zip https://github.com/Ensembl/Bio-HTS/archive/master.zip

Keiran

rishidev commented 8 years ago

I can adjust to limit the depth and branch fetched to achieve the speed gains. It would be good to preserve the clone as there may be the case where we'd like to tie Bio::DB::HTS to a particular tagged release of HTSlib in case an incompatibility arises.

rishidev commented 8 years ago

This has now been added in the master branch and will be in the next CPAN update which will be in a couple of weeks.

keiranmraine commented 8 years ago

Wouldn't you be as well served to use the automatic zip or tar.gz generated by the tag in that case?

https://github.com/Ensembl/Bio-HTS/archive/v1.12.zip

These are preserved and saves the requirement of git being installed on minimal builds (such as docker images).

rishidev commented 8 years ago

OK - if it's the git installation that's an issue it should be ok to move to the archive.

keiranmraine commented 8 years ago

One concern is that in the current state both Bio-HTS and HTSlib are a moving target as each install pulls from master. For both it's entirely possible that HTSlib could be modified in such a way that it no-longer functions with Bio-HTS.

It may be an idea to expose the branch option as a command line arg for INSTALL.pl and switch to a git clone if it's specified as that is helpful for testing feature/bugfix branches.

rishidev commented 8 years ago

We definitely want to tie the Build.PL script to a fixed HTSlib version. The separate INSTALL.pl script supplies the freedom to use alternative HTSlib versions. Would this suffice?

andrewyatz commented 8 years ago

I would say we need to explicitly tie Bio-HTS to a branch of htslib. Point revisions of htslib shouldn't bring in changes that are drastically bad for Bio-HTS (well htslib could be broken and it's a bugfix but the interface should be consistent).

Personally I never dev against anything bar the latest htslib version. There's just too many variables otherwise.

keiranmraine commented 8 years ago

So does this mean that we shouldn't use INSTALL.pl and basically roll our own for installing the dependancies? README doesn't indicate that INSTALL.pl isn't hard linked to a version.

An aside, README is written in markdown but as it isn't README.md it doesn't get the markup.

rishidev commented 8 years ago

To clarify you can use INSTALL.pl to use any customised version of HTSlib. I can be more specific in the README about this, and add a specific example of this in the script used to demonstrate different build options.

I've only developed against the latest item on master branch - I have found the dev branch too erratic.

rishidev commented 8 years ago

A README file is needed for CPAN and README.md for GitHub. Good old Stack Overflow indicates that if it's name README.pod it works in both places, so I'll give that a go.

andrewyatz commented 8 years ago

I think the main thing to make sure we do is to not rely on unreleased htslib changes. We might need to dev against them sometimes but for sure not release against them.

On 19 May 2016, at 10:17, rishidev notifications@github.com wrote:

To clarify you can use INSTALL.pl to use any customised version of HTSlib. I can be more specific in the README about this, and add a specific example of this in the script used to demonstrate different build options.

I've only developed against the latest item on master branch - I have found the dev branch too erratic.

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/Ensembl/Bio-HTS/issues/15#issuecomment-220270820

keiranmraine commented 8 years ago

I'm not talking about the dev branch of HTSlib but the tagged entities. INSTALL.pl should be doing either:

wget https://github.com/samtools/htslib/archive/1.3.1.zip

or

git clone -b 1.3.1 --depth=1 https://github.com/samtools/htslib.git

The same should be true for the Bio-HTS so that unpacking a specific release of Bio-HTS and running INSTALL.pl actually gives the version you got from the release. At the moment if you run INSTALL.pl you get the broken version that was merged to master yesterday.

Modifying INSTALL.pl to require the version would resolve this:

wget https://raw.githubusercontent.com/Ensembl/Bio-HTS/master/INSTALL.pl
./INSTALL.pl -v v1.12

At present if anyone was to rebuild a Dockerfile that runs INSTALL.pl they would get inconsistent results (lots of warnings) even though they haven't changed anything.

rishidev commented 8 years ago

Sorry, my fault, I had Build.PL and INSTALL.pl the wrong way round in my head when I was writing that. Yes, the plan is to tie to a specific versions in INSTALL.pl

rishidev commented 8 years ago

Updates have been made to download the zip files and use a specific HTSlib version. As it currently stands having the specific version Bio::DB::HTS version in the INSTALL.pl ends up causing other problems with testing. Perhaps the correct thing to do is skip the fetch of a second remote Bio::DB::HTS and just use whatever Bio::DB::HTS is with the install script.

mmokrejs commented 8 years ago

Hi, I would recommend to avoid any network downloads from under "install" step. Just record in README users should install htslib on their own, and which versions are support. Do not bundle any 3rd-party code, please.

Also, please update the release file at http://search.cpan.org/dist/Bio-HTS.

Finally, why isn't the project named Bio-DB-HTS? It is confusing to have Bio-HTS, it just does not follow the common syntax.

rishidev commented 8 years ago

Unfortunately the naming is a result of issues in the early days of the project. There was a bit of trying to preserve the Bio-Samtools way of doing things (which is no longer supported) and another CPAN project had taken the Bio-HTS name, which resulted in the current nomenclature.

Again, the install script is an update of Bio-Samtools, where users have come to expect the CPAN install step to run as it currently does. The README and sample file in the scripts directory provide examples of alternative install methods for a number of techniques.

rishidev commented 8 years ago

Actually @mmokrejs I can see the naming is the source of the confusion This project is on CPAN at https://metacpan.org/pod/Bio::DB::HTS

mmokrejs commented 8 years ago

Actually @mmokrejs I can see the naming is the source of the confusion This project is on CPAN at https://metacpan.org/pod/Bio::DB::HTS

Yes, that is why I think you should really rename the github project from https://github.com/Ensembl/Bio-HTS to https://github.com/Ensembl/Bio-DB-HTS, and update all files and references. The site with http://search.cpan.org/dist/Bio-HTS contains really a different code.

rishidev commented 8 years ago

I guess we can duplicate this into a Bio::DB::HTS repository, and then have both in parallel for a period to give us a chance to update other repositories pointing to this location without things falling over.

rishidev commented 8 years ago

Migrated repo to https://github.com/Ensembl/Bio-DB-HTS