Open rafarios50 opened 7 years ago
There is also another problem with the paths for the required external programs.
When I executed:
/home/rrios/baga/baga_cli.py AlignReads -n TX6128 -g TX0082 -a -d
The bwa step worked fine, but bwa was not installed where I expected. I expected to find it in the directory where baga is installed, not where baga is being executed. I installed baga in ~/baga and executed it in ~/otros/Project_name/, and found bwa in ~/otros/Project_name/baga/other_progams/bwa
That location was then also used to look for the other required programs (samtools, Picard). Those were not found, so the read alignment as a whole did not finish.
Unless you expect the baga workflow to be executed in the same path where it was installed, with the resulting files moved to other paths once it is done?
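To make the two behaviours concrete, here is a minimal Python sketch of the difference between resolving the tools folder against the working directory and resolving it against the install directory. This is not baga's actual code: `TOOLS_SUBDIR`, `tools_dir_from_cwd` and `tools_dir_from_install` are hypothetical names used only for illustration.

```python
from pathlib import Path

# Hypothetical subfolder name, for illustration only (not baga's real layout).
TOOLS_SUBDIR = "external_programs"


def tools_dir_from_cwd():
    """Resolve the tools folder against the directory the command is run from."""
    return Path.cwd() / TOOLS_SUBDIR


def tools_dir_from_install(script_path):
    """Resolve the tools folder against the directory that contains the script."""
    return Path(script_path).resolve().parent / TOOLS_SUBDIR
```

With baga in ~/baga and the command run from ~/otros/Project_name/, the first function would point inside ~/otros/Project_name/ and the second inside ~/baga, which is the mismatch described above.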
Hi Rafa, thanks for the report. I'll look into the dependency check miscommunication.
Regarding install locations, the baga philosophy is to maintain tight control over which versions of which software are used for each stage. This greatly improves the chances that a different user on a different computer can reproduce an analysis, which is desirable because reproducibility is still a challenge in peer-reviewed academic research (especially for analytical pipelines with many discrete analysis stages).
By default, baga currently installs in subfolders of the "analysis folder". Eventually, there'll be an explicit option to use the system-installed versions instead. The idea is that each analysis folder uses one set of version-controlled software, but that software could be used for several different analyses within that folder. Each analysis would go into a different subfolder.
To repeat an analysis using different versions of software, a different "analysis folder" should be created and baga should be called from there.
The problem with installing software where baga is installed is that each combination of versions of the wrapped third-party software would require a new copy of baga. Not a major problem, but a design choice.
The best solution might be to choose where software is installed (which path) at the command line, with the analysis folder as the default. The user would then have more control over how they deal with different version combinations, datasets and analysis pipelines (and could choose the baga folder if they wanted).
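As a rough sketch of what that could look like (an illustration only, not an existing baga option), the path could be an optional argument that falls back to the current analysis folder. The `--software-dir` flag, the `example_cli` program name and the function names below are all hypothetical.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser with a hypothetical --software-dir option."""
    parser = argparse.ArgumentParser(prog="example_cli")
    parser.add_argument(
        "--software-dir",
        type=Path,
        default=None,
        help="where third-party programs are installed "
             "(default: the current analysis folder)",
    )
    return parser


def resolve_software_dir(args):
    """Return the chosen install path, defaulting to the working directory."""
    base = args.software_dir if args.software_dir is not None else Path.cwd()
    return base.expanduser().resolve()
```

Calling it with no option keeps the current behaviour (the analysis folder), while something like `--software-dir ~/baga` would place the programs alongside baga itself.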
Does that make sense?
When trying to run the alignment step of the reads against the reference, I first executed:
All the dependencies were installed and found correctly. But then, when I executed the alignreads command:
The output was:
Then I executed:
That solved the problem, but there is some miscommunication between those calls.