daveuu / baga

Bacterial and Archaeal Genome Analyser
GNU General Public License v3.0

bwa dependency checking miscommunication #11

Open rafarios50 opened 7 years ago

rafarios50 commented 7 years ago

When trying to run the step that aligns the reads against the reference, I first executed:

baga/baga_cli.py Dependencies --checkgetfor AlignReads

All the dependencies were installed and found correctly. But when I then ran the AlignReads command:

/home/rrios/baga/baga_cli.py AlignReads -n TX6128 -g TX0082 -a -d

The output was:

Bacterial and Archaeal Genome Analyser: Novel analyses and wrapped tools pipelined for convenient processing of genome sequences
Version 0.2 (December 20 2015)
David Williams
david.williams.at.liv.d-dub.org.uk
Work on this software was started at The University of Liverpool, UK
with funding from The Wellcome Trust (093306/Z/10) awarded to:
Dr Steve Paterson (The University of Liverpool, UK)
Dr Craig Winstanley (The University of Liverpool, UK)
Dr Michael A Brockhurst (The University of York, UK)
Copyright (C) 2015 David Williams
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it
There is NO WARRANTY, to the extent permitted by law
09:53:59
09:53:59 |=== Starting baga analysis at 09:53:59 on Fri 23 Sep, 2016 ===|
09:53:59 -- Read Aligning module --
09:53:59 Logger for AlignReads will write to baga-nosample_logs/16-09-23_09-53-59_AlignReads/00_main.log
baga.CollectData.Genome-TX0082.baga
baga.PrepareReads.Reads-TX6128.baga
Loading processed reads group TX6128
Loading genome TX0082
Aligning reads . . .
Writing BWA index files for genome_sequences/TX0082.fna
Could not find the bwa executable at executable at /home/rrios/otros/Efm_fnm_deletion_reads/bagadev/external_programs/bwa/bwa.
You can check if it is installed using:
/home/rrios/baga/baga_cli.py Dependencies --check bwa
You can install it locally using:
/home/rrios/baga/baga_cli.py Dependencies --get bwa

Then I executed:

/home/rrios/baga/baga_cli.py Dependencies --get bwa

That solved the problem, but there is some miscommunication between those calls: the dependency check reported bwa as found, while AlignReads looked for it somewhere else.
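One way to avoid this kind of mismatch is to have the dependency check and the alignment step share a single function that works out where a tool should be. The sketch below assumes a layout of `<base_dir>/external_programs/<name>/<name>`, modelled on the path in the error message above; the function names are hypothetical and this is not baga's actual code.

```python
import os

# Hypothetical layout, modelled on the path in the error message:
# <base_dir>/external_programs/<name>/<name>. Not baga's actual code.
def local_executable_path(name, base_dir):
    """Return the path where a locally installed tool is expected to live."""
    return os.path.join(base_dir, "external_programs", name, name)


def find_local_executable(name, base_dir):
    """Return the expected path if it is an executable file, otherwise None."""
    path = local_executable_path(name, base_dir)
    if os.path.isfile(path) and os.access(path, os.X_OK):
        return path
    return None


def check_dependency(name, base_dir):
    """What a 'Dependencies --check' step could call."""
    return find_local_executable(name, base_dir) is not None


def require_dependency(name, base_dir):
    """What an alignment step could call before running the tool."""
    path = find_local_executable(name, base_dir)
    if path is None:
        raise FileNotFoundError(
            "Could not find the {} executable at {}".format(
                name, local_executable_path(name, base_dir)
            )
        )
    return path
```

Because both steps go through `find_local_executable` with the same `base_dir`, a check that passes guarantees the run step looks in the same place.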

rafarios50 commented 7 years ago

There is also another problem with the paths to the required external programs.

When I executed:

/home/rrios/baga/baga_cli.py AlignReads -n TX6128 -g TX0082 -a -d

The bwa step worked fine, but bwa was not installed where I expected. I expected to find it in the directory where baga is installed, not in the directory where baga is executed. I installed baga in ~/baga and run it from ~/otros/Project_name/, and found bwa in ~/otros/Project_name/baga/other_progams/bwa

That location was then used to look for the other required programs (samtools, Picard). Those were not found, so the read alignment did not finish.

Or do you expect the baga workflow to be run from the same path where it was installed, with the resulting files moved elsewhere once it is done?
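The difference described here comes down to which base directory a relative tool path is resolved against: the current working directory (where baga is run) or the directory containing the script (where baga is installed). A minimal sketch, assuming a hypothetical relative layout of `external_programs/bwa/bwa`:

```python
import os
import sys

# Hypothetical relative layout, for illustration only.
RELATIVE_BWA = os.path.join("external_programs", "bwa", "bwa")


def resolve_from(base_dir, relative=RELATIVE_BWA):
    """Join a relative tool path onto a base directory and normalise it."""
    return os.path.normpath(os.path.join(base_dir, relative))


def script_dir():
    """Directory containing the running script (e.g. ~/baga for ~/baga/baga_cli.py)."""
    return os.path.dirname(os.path.abspath(sys.argv[0]))


if __name__ == "__main__":
    # Same relative path, two different answers depending on the base.
    print("relative to working dir:", resolve_from(os.getcwd()))
    print("relative to install dir:", resolve_from(script_dir()))
```

Run from ~/otros/Project_name/, the first line points inside the project folder and the second inside ~/baga, which matches the two expectations in this thread.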

daveuu commented 7 years ago

Hi Rafa, thanks for the report. I'll look into the dependency check miscommunication.

Regarding install locations, the baga philosophy is to maintain tight control over which versions of which software are used for each stage. This greatly improves the chances that a different user on a different computer can reproduce an analysis. That is desirable because reproducibility is still a challenge in peer-reviewed academic research, especially for analytical pipelines with many discrete stages.

By default, baga currently installs third-party software into subfolders of the "analysis folder". Eventually, there will be an explicit option to use the system-installed versions instead. The idea is that each analysis folder uses one set of version-controlled software, which can be shared by several different analyses within that folder. Each analysis would go into its own subfolder.
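The layout described above can be sketched as follows: every analysis inside one analysis folder shares that folder's tools directory, while a second analysis folder gets its own. The folder names (`external_programs`, `project_A`, `run1`) are hypothetical placeholders, not baga's actual structure.

```python
from pathlib import Path

# Hypothetical folder name, for illustration only.
TOOLS_SUBFOLDER = "external_programs"


def tools_dir(analysis_folder):
    """All analyses in one analysis folder share one set of pinned tools."""
    return Path(analysis_folder) / TOOLS_SUBFOLDER


def analysis_dir(analysis_folder, analysis_name):
    """Each analysis gets its own subfolder next to the shared tools."""
    return Path(analysis_folder) / analysis_name


if __name__ == "__main__":
    for name in ("run1", "run2"):
        print(name, "->", analysis_dir("project_A", name), "uses", tools_dir("project_A"))
```

Repeating an analysis with different tool versions then means creating a new analysis folder (say `project_B`), whose `tools_dir` is separate from `project_A`'s.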

To repeat an analysis using different versions of software, a different "analysis folder" should be created and baga should be called from there.

The problem with installing software where baga itself is installed is that each combination of versions of the wrapped third-party software would require a new version of baga. Not a major problem, but a design choice.

The best solution might be to choose the install location (path) at the command line, with the analysis folder as the default. The user would then have more control over how they handle different version combinations, datasets and analysis pipelines (and could choose the baga folder if they wanted).
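A minimal sketch of that proposal using `argparse`: an optional path argument that falls back to the current directory (the analysis folder) when omitted. The `--programs-path` flag is hypothetical and does not exist in baga.

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser(prog="baga_cli.py")
    # Hypothetical option; not an existing baga flag.
    parser.add_argument(
        "--programs-path",
        default=None,
        help="where third-party tools are installed "
        "(default: the analysis folder, i.e. the current directory)",
    )
    return parser


def programs_path(args):
    """Use the path given on the command line, else the analysis folder."""
    return os.path.abspath(args.programs_path or os.getcwd())
```

For example, `baga_cli.py --programs-path ~/baga ...` would reproduce the layout Rafa expected, while omitting the flag keeps the current per-analysis-folder behaviour.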

Does that make sense?