NationalGenomicsInfrastructure / piper

A genomics pipeline built on top of the GATK Queue framework

To allow institution-specific setup #52

Closed biocyberman closed 7 years ago

biocyberman commented 9 years ago

I would like to suggest merging of the following two commits:

Commit aac74bce7e22

https://github.com/biocyberman/piper/commit/aac74bce7e22137e6fb7a539f21102c7ef5ebf3e

I created a 'config' directory at the project base and moved three original files into 'config/upsala': globalConfig.sh, globalConfig.xml, and uppmax_global_config.xml. I then modified the setup.sh file to allow this syntax:

./setup.sh /path/to/install/piper institution /path/to/otherlibraries/lib NoGATKCompile

An example command: ./setup.sh /space/system/piper afmd /space/system/lib nogatk

Which means:

  1. To install piper to /space/system/piper
  2. To pick up config files from the config/afmd directory. The current choices are afmd and upsala, but subdirectories for new institutions can be added.
  3. To register /space/system/lib with piper so it can look there for additional library dependencies (if they are put there, of course).
  4. The fourth argument, whenever non-empty, tells setup.sh not to download and compile GATK. This is nicer than commenting/uncommenting the function call.
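The argument handling described above could be sketched roughly as follows. This is only an illustration, not the actual setup.sh from the commit: the function name `parse_setup_args`, the printed keys, and the choice to treat the third and fourth arguments as optional are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the four positional arguments; not the real setup.sh.

parse_setup_args() {
  # Arguments 1 and 2 are treated as required here (an assumption).
  local install_dir="${1:?usage: setup.sh <install_dir> <institution> [lib_dir] [skip_gatk]}"
  local institution="${2:?institution name required, e.g. afmd or upsala}"
  # Arguments 3 and 4 default to empty when omitted (also an assumption).
  local lib_dir="${3:-}"
  local skip_gatk="${4:-}"

  echo "install_dir=${install_dir}"
  # Config files are picked up from config/<institution>.
  echo "config_dir=config/${institution}"
  echo "lib_dir=${lib_dir}"
  # Any non-empty fourth argument disables the GATK download and compile step.
  if [ -n "${skip_gatk}" ]; then
    echo "compile_gatk=no"
  else
    echo "compile_gatk=yes"
  fi
}

parse_setup_args /space/system/piper afmd /space/system/lib nogatk
```

Testing the fourth argument for non-emptiness, rather than for a specific keyword, matches the description: `nogatk`, `NoGATKCompile`, or any other string has the same effect.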

Commit 704a1ac9d26cc686d

This is an additional commit that removes the dependency on GenomeAnalysisTK.jar, because Queue.jar already includes the contents of GenomeAnalysisTK.jar. Putting both of them in lib causes warnings about redundant packages, such as the one about org.slf4j bindings.

https://github.com/biocyberman/piper/commit/704a1ac9d26cc686d460fd93dc9b551f6ced77f5
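One way to check this kind of overlap between two jars is to compare their entry listings. The helper below is a minimal sketch, not part of Piper: `common_entries` is a hypothetical name, and the sample entries in the demo are made up purely for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print entries that appear in both listing files.
# Each input file holds one archive entry per line.
common_entries() {
  # De-duplicate each listing first, so that any line seen twice
  # in the combined stream must come from both files.
  { sort -u "$1"; sort -u "$2"; } | sort | uniq -d
}

# Demo with made-up listings (not the real contents of either jar).
tmpdir="$(mktemp -d)"
printf '%s\n' 'example/Shared.class' 'example/OnlyInFirst.class' > "${tmpdir}/first.txt"
printf '%s\n' 'example/Shared.class' 'example/OnlyInSecond.class' > "${tmpdir}/second.txt"
common_entries "${tmpdir}/first.txt" "${tmpdir}/second.txt"
rm -rf "${tmpdir}"
```

With real jars, the listings could be produced with `jar tf Queue.jar > first.txt` and `jar tf GenomeAnalysisTK.jar > second.txt`. A long list of shared entries would be consistent with one jar bundling the other.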

My commit tree looks a bit messy. I will try to keep it in good order from now on :)

johandahlberg commented 9 years ago

Hi!

Awesome PR. Unfortunately I don't have time to review it just now, but I've done a quick read-through and I think it looks good. Until I have time to look at this more thoroughly, maybe you could get the Travis build working? As you will see [here](https://travis-ci.org/NationalGenomicsInfrastructure/piper/jobs/50278087), it's complaining about not finding some dependencies (scroll to the end of the build log).

Thanks for the PR! :smile:

biocyberman commented 9 years ago

Hi Johan, Travis failed because the JHDF5 library, which is needed to handle XSQ colorspace data, was not uploaded. I will fix that. However, I am thinking about a logistics problem. Would it be better to submit PRs to the master branch so we can 'communicate' through this common ground? Stable releases can be done via release tags, as you have been doing, or on a 'stable' branch. If you agree, I will close this PR and open a new one against the master branch. PLEASE GIVE YOUR COMMENT ON THIS.

Some more words about the QPipe branch: even though I try to keep the 'upstream' intact, I had to make some changes to simplify the implementation of the QPipe branch, in order to support XSQ and maybe BAM inputs later. I've made some rather fundamental changes in addition to the ones described in the original PR:

  1. Rewrote the ReadPairContainer class and renamed it to InputSeqFileContainer. Following this, I refactored all uses of ReadPairContainer in the repo. In most cases it is just a matter of renaming, because InputSeqFileContainer is backward-compatible. InputSeqFileContainer supports more flexible and more generic ways to deal with different input formats.
  2. Configured logging to work properly. However, this also requires upgrading org.slf4j in Queue to at least version 1.7.5. So setup.sh will NOT work if it enables compiling of GATK: a MethodNotFound exception will occur because of the outdated version of SLF4J embedded in Queue/GATK. The best solution is for me to ask for SLF4J to be upgraded upstream in Queue/GATK.
  3. Upgraded Scala to 2.10.4 and sbt to 0.13.7.
  4. Upgraded the commons-lang and commons-io packages. This is because these two packages in Piper's SBT build somehow interfere with the packages of the same name in Queue/GATK, again causing a MethodNotFound exception when Queue tries to use the FileUtils.FileCopy method.

I hope these changes will be merged into the master branch of Piper. Since it was a sudden jump for me into Piper, I had to do several experiments and therefore forgot to commit all changes on time. This resulted in commits that are not well isolated. I will, in any case, do a 'clean-up' commit and then continue with well-isolated ones. Thinking it over, the code base is not too complicated, so it is not too bad to do this clean-up commit. This may challenge your patience :-)

biocyberman commented 9 years ago

Hi Johan, I have just pushed some new commits to the QPipe branch. Notably, in commit 894e8ac I converted the project to a Maven-based build, which allows straightforward integration with GATK/Queue. This is important because it makes it easy to refer to the Queue source or to include it as a dependency directly from Maven-based GitHub repos (i.e. the gatk-protected repo).

johandahlberg commented 9 years ago

Hi!

I'm just dropping a note here to let you know that I've not forgotten this. However, I've been bogged down with a lot of other stuff lately and I haven't had the time to look into this. I promise to get to it as soon as time permits.

/Johan

biocyberman commented 9 years ago

No worries Johan, I am in a not-so-different situation.

johandahlberg commented 9 years ago

I've now had a quick read-through of the code, and you can see my comments on it inline.

Some things of a more general nature:

biocyberman commented 9 years ago

Hi Johan, thanks for taking the time to look at the PR. Regarding the last two points:

I will come back with some more commits later to bring you up to date.

biocyberman commented 9 years ago

Hi Johan, I have addressed most of the points you commented on in the previous commits. As you can see, only some non-essential points are left. When all are done, just tell me if you want me to create a PR against the master branch. Otherwise you can merge to the master branch yourself.

johandahlberg commented 7 years ago

Hi @biocyberman! I've been feeling guilty about not addressing this for a very long time. I'm very sorry for leaving this hanging here.

The scope of Piper has drifted a bit over time, and now it's mostly being used to power our human whole genome sequencing pipelines - which means that at this point I'm skeptical about it being worth the effort for anyone else to go into the project. I'm going to close this pull request now - but if you feel like you are still interested in extending Piper or if you would like me to put you into contact with some other folks I know who seem to me to be working on things which are more easily adaptable to different centers, send me an e-mail at johan.dahlberg@medsci.uu.se and we can have a chat about that.

Once again, I'm so sorry for not coming back to you on this much, much sooner.