epam / fonda

Fonda is a framework which offers scalable and automatic analysis of multiple NGS sequencing data types
Apache License 2.0

Preserve temporary files #205

Closed: syansanofi closed this issue 3 years ago

syansanofi commented 3 years ago

Issue
Currently, most of the intermediate files generated by a pipeline are deleted. Examples include sorted BAMs and deduplicated BAMs in DNAseq pipelines.

To help with troubleshooting and further analysis, it would be best to preserve these files.

Approach
Remove only the merged FASTQs in every pipeline. All files produced after this stage should be preserved, including Xenome-filtered FASTQs, trimmed FASTQs, and everything downstream of them. For example, in the Star tool we can stop marking all of the files listed here as tempDirs:

https://github.com/epam/fonda/blob/9322c5f7ddf34e7c4068b788c52c8599337f4202/src/main/java/com/epam/fonda/tools/impl/Star.java#L122-L123
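A minimal sketch of the proposed rule, assuming each intermediate file can be tagged with the pipeline stage that produced it. The `Stage` enum, the `IntermediateFile` record, the `tempFilesToMark` helper and the file paths are all hypothetical and not part of Fonda's API. Only merged FASTQs are returned, i.e. they are the only paths that would still be registered as temporary.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the policy proposed in this issue: only merged
// FASTQs stay marked as temporary; everything downstream is preserved.
final class TempFilePolicySketch {

    // Hypothetical stage labels; Fonda does not necessarily model stages this way.
    enum Stage { MERGED_FASTQ, XENOME_FILTERED_FASTQ, TRIMMED_FASTQ, SORTED_BAM, DEDUPLICATED_BAM }

    record IntermediateFile(String path, Stage stage) { }

    // Returns the paths that would still be passed to setTempDirs(...).
    static List<String> tempFilesToMark(List<IntermediateFile> files) {
        return files.stream()
                .filter(f -> f.stage() == Stage.MERGED_FASTQ)
                .map(IntermediateFile::path)
                .collect(Collectors.toList());
    }

    // Example outputs for one sample; the paths are illustrative only.
    static List<IntermediateFile> exampleFiles() {
        return List.of(
                new IntermediateFile("fastq/S1.merged_R1.fastq.gz", Stage.MERGED_FASTQ),
                new IntermediateFile("fastq/S1.merged_R2.fastq.gz", Stage.MERGED_FASTQ),
                new IntermediateFile("xenome/S1_graft_1.fastq", Stage.XENOME_FILTERED_FASTQ),
                new IntermediateFile("trim/S1.trimmed_R1.fastq.gz", Stage.TRIMMED_FASTQ),
                new IntermediateFile("bam/S1.sorted.bam", Stage.SORTED_BAM),
                new IntermediateFile("bam/S1.sorted.rmdup.bam", Stage.DEDUPLICATED_BAM));
    }

    public static void main(String[] args) {
        // Prints only the two merged FASTQ paths; all other files are kept.
        tempFilesToMark(exampleFiles()).forEach(System.out::println);
    }
}
```

Filtering on an explicit stage, rather than on file-name patterns, keeps the rule readable when new tools are added, but it assumes the stage information is available where tempDirs are set.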

syansanofi commented 3 years ago

@kamyshova Would this also solve #144? Thank you.

kamyshova commented 3 years ago

@syansanofi Should we preserve the temporary tool folders? For example: command.setTempDirs(Collections.singletonList(fusionCatcherFields.tmpFusionCatcherOutdir))

syansanofi commented 3 years ago


Yes, I think they would contain files of interest, such as Xenome output.
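A minimal sketch of what preserving a tool's temporary folder amounts to, assuming the pipeline deletes exactly the paths registered through `setTempDirs(...)`. The `ToolCommand` class, the `cleanupLines` helper and the `rm -rf` cleanup format are hypothetical stand-ins, not Fonda's actual implementation. A folder such as the FusionCatcher temporary output directory survives simply because it is never registered.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: each tool command collects the paths it wants deleted,
// and the pipeline turns them into cleanup lines at the end of the script.
final class CleanupSketch {

    static final class ToolCommand {
        private final List<String> tempDirs = new ArrayList<>();

        // Mirrors the setTempDirs(...) call quoted in the thread; the rest of
        // this class is illustrative only.
        void setTempDirs(List<String> dirs) {
            tempDirs.clear();
            tempDirs.addAll(dirs);
        }

        List<String> getTempDirs() {
            return List.copyOf(tempDirs);
        }
    }

    // Hypothetical cleanup step: one "rm -rf" line per registered temp path.
    static List<String> cleanupLines(List<ToolCommand> commands) {
        List<String> lines = new ArrayList<>();
        for (ToolCommand command : commands) {
            for (String dir : command.getTempDirs()) {
                lines.add("rm -rf " + dir);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        ToolCommand merge = new ToolCommand();
        merge.setTempDirs(List.of("fastq/S1.merged_R1.fastq.gz", "fastq/S1.merged_R2.fastq.gz"));

        // The FusionCatcher temp output folder is not passed to setTempDirs,
        // so no cleanup line is generated for it and its contents are kept.
        ToolCommand fusionCatcher = new ToolCommand();

        // Prints two "rm -rf" lines, both for the merged FASTQs.
        cleanupLines(List.of(merge, fusionCatcher)).forEach(System.out::println);
    }
}
```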