Output files/folder structure

epi2me-labs / wf-somatic-variation

Output files/folder structure #9

Closed myxotheles closed 9 months ago

myxotheles commented 9 months ago

Ask away!

Hi, the output structure seems to have changed slightly in v4.

Can I just clarify:

1) Are those two the same?

output/<sample_name>.wf_somatic_snv.vcf.gz
output/<sample_name>/snv/vcf/.wf_somatic_snv.vcf.gz

2) You are now annotating the variants, which is great. But which of the two files above is used for annotation?

output/<sample_name>/annot/<sample_name>.wf_somatic-snv_clinvar.vcf
output/<sample_name>/annot/<sample_name>.wf_somatic-snv.snpEFF_genes.vcf

3) Lastly, I notice a slight inconsistency in the use of ".", "-" and "_" in the file names. Perhaps pedantic, but consider homogenising them, for example:

.wf-somatic-snv-report.html 
.wf_somatic-snv-vcf.gz

Otherwise thank you for the great workflow! I really enjoy using it.

RenzoTale88 commented 9 months ago

Hi @myxotheles, the two files are not the same:

output/.wf_somatic_snv.vcf.gz is the final file after all the processing (it includes SNVs and indels, as well as the change types and annotations)
output/<sample_name>/snv/vcf/.wf_somatic_snv.vcf.gz contains the SNVs as produced by ClairS
output/<sample_name>/annot/<sample_name>.wf_somatic-snv.snpEff_genes.vcf shouldn't be there as a .vcf; it should be a .txt file generated by snpEff
<sample_name>.wf_somatic-snv_clinvar.vcf contains just the variants found in common with the ClinVar database

Thanks for the feedback about the naming, we will try to implement it in the upcoming releases. The next release will also bring forward an update to the documentation to disambiguate the outputs. Thanks for pointing that out, and we're glad that the workflow works fine for you :)
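
For readers who want to check the difference between the two VCFs themselves, here is a minimal sketch, not part of the workflow. It assumes both files are gzip- or bgzip-compressed VCFs and identifies a variant by its CHROM, POS, REF and ALT columns; the function names `variant_keys` and `compare_vcfs` are hypothetical.

```python
import gzip


def variant_keys(vcf_path):
    """Return the set of (CHROM, POS, REF, ALT) keys found in a gzipped VCF."""
    keys = set()
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            # Skip meta-information lines, the column header and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
            keys.add((chrom, int(pos), ref, alt))
    return keys


def compare_vcfs(first_vcf, second_vcf):
    """Count the variants shared by, and unique to, each of two VCF files."""
    first_keys = variant_keys(first_vcf)
    second_keys = variant_keys(second_vcf)
    return {
        "shared": len(first_keys & second_keys),
        "only_in_first": len(first_keys - second_keys),
        "only_in_second": len(second_keys - first_keys),
    }
```

Calling `compare_vcfs(final_path, clairs_path)` with the paths to the top-level VCF and the ClairS VCF returns three counts. Given the description above, the top-level file would be expected to contain entries (for example indels) that are absent from the ClairS SNV file.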

myxotheles commented 9 months ago

Dear Andrea, thank you for the clarification!

One other thing to be aware of:

If the workflow is run with non-nanopore data as the control, NanomonSV will crash. Unfortunately, I only have Illumina data as control data for my samples, so I have to run with that. I wrote in the NanomonSV forum and they said it's because the data is not mapped with minimap2. I aligned my Illumina reads with minimap2, but it still crashes when using these as input. So I am running it without -SV, and the rest of the workflow works fine, so the problem can be isolated to NanomonSV. I guess this may not be a big issue for nanopore users when tumour and control are both nanopore sequenced, but I thought it was worth mentioning.