epam / fonda

Fonda is a framework which offers scalable and automatic analysis of multiple NGS sequencing data types
Apache License 2.0
8 stars 2 forks source link

Updates from 6/05/20 #132

Closed kamyshova closed 4 years ago

kamyshova commented 4 years ago

This PR involves some Fonda enhancements:

syansanofi commented 4 years ago

Hi Yulia, can you delete/un-approve my previous approval? I found new issues that should be addressed before merging.

syansanofi commented 4 years ago

If lower picard versions are used, other picard dependent modules will incorrectly output empty picard.jar paths due to picard_version flag ignorance.

See example output for SortBamByReadName shell command. /site/ne/app/x86_64/java/v1.8.0u121/bin/java -jar /site/ne/home/wings/bfx_tools/picard/v1.118 SortSam

kamyshova commented 4 years ago

@syansanofi Hi Shu, thank you for the review! I do not see the possibility to delete your previous approval, but I've re-requested your review. Regarding remarks about other picard dependencies, this is a really valuable remark. I looked at all the places where picard could be used and found that for some metrics we could not use lower picard versions, because they were added much later. For example, at the "RNA QC metrics" and "DNA QC metrics" steps "CollectHsMetrics" and "CollectQualityYieldMetrics" are used, but they were added to picard later. Should we add some checks and throw an error if an older version is used for this step? I've added the picard version for other metrics and tools that it's used: SortSam, SamToFastq, CollectGcBiasMetrics, MergeSamFiles. Could you review it, please?

syansanofi commented 4 years ago

@kamyshova Hi Yulia. Indeed I saw the discrepancy. The idea of this feature was to deploy fonda with backwards compatibility to historical analysis. It is not an urgent feature for the moment. Could we approve this and then merge back? I want to be able to add some new features to this branch, but keep it as a separate PR. Thanks!