lconde-ucl closed this issue 5 years ago.
Hi,
We're currently finishing the transfer of Scilifelab/Sarek to nf-core/sarek.
If it's possible for you, you can try out the current dev branch, which is pretty stable, as we're in the process of making our first release on nf-core (cf. https://github.com/nf-core/sarek/pull/19).
Otherwise, I do believe that your issue is due to the fact that procps is missing from the container.
I can also generate an extra container with ps if you know which version of Sarek you need, but it might take some time.
All the best,
Maxime
Hi Maxime,
Thanks for your reply. I'll wait for the transfer to finish, and in the meantime I will run the current Sarek (v2.3.FIX1) with nextflow 19.01, which works fine (it only fails with the newer nextflow, v19.07).
I'm not sure the lack of ps is the issue, though. I downloaded Sarek v2.3.FIX1 as well as the containers (sarek, r-base, runallelecount...) just a couple of days ago, and they all seem to have ps:
$ singularity exec sarek-latest.simg ps
PID TTY TIME CMD
47583 pts/0 00:00:00 bash
49905 pts/0 00:00:00 ps
OK, that is strange indeed.
Maybe we do need uname, grep and date as well...
sarek-latest.simg does have uname, grep and date. I don't think this is related to the container, as the same container works well with nextflow 19.01. I'll try to send an example .command.run from 19.01 and another from 19.07, as I guess that's the key.
Attaching here the .command.run files for a Sarek 'RunFastQC' process.
One was obtained after running Sarek v2.3.FIX1 with nextflow v19.07.0. It failed immediately in that process with the error shown in the first message:
$ module load nextflow/19.07
$ nextflow run path_to_Sarek_v2.3.FIX1/main.nf -profile ucl \
--sample inputfiles.tsv \
--genome GRCh38 \
--targetBED path_to_MedExome_hg38_capture_targets.bed -resume
The other one was obtained after running the same Sarek v2.3.FIX1 pipeline (i.e., the same main.nf, same containers, same data, etc.) but using nextflow 19.01.0 instead of 19.07.0. The same process finished successfully, as did the rest of the pipeline.
$ module load nextflow/19.01
$ nextflow run path_to_Sarek_v2.3.FIX1/main.nf -profile ucl \
--sample inputfiles.tsv \
--genome GRCh38 \
--targetBED path_to_MedExome_hg38_capture_targets.bed -resume
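Since the two runs differ only in the Nextflow release, one way to switch between them without loading different modules is the `NXF_VER` environment variable, which the standard `nextflow` launcher script reads to select the release it runs. The sketch below assumes that launcher is on `PATH`; `run_sarek_with` is a hypothetical wrapper, and the paths and parameters are the placeholders from the commands above.

```sh
#!/bin/sh
# Sketch: pin the Nextflow release for a single run via NXF_VER
# (assumes the standard `nextflow` launcher script is on PATH).
# run_sarek_with is a hypothetical helper; paths are the thread's placeholders.
run_sarek_with() {
  # NXF_VER is set only in the environment of this nextflow invocation.
  NXF_VER="$1" nextflow run path_to_Sarek_v2.3.FIX1/main.nf -profile ucl \
    --sample inputfiles.tsv \
    --genome GRCh38 \
    --targetBED path_to_MedExome_hg38_capture_targets.bed -resume
}
```

For example, `run_sarek_with 19.01.0` and `run_sarek_with 19.07.0` reproduce the two runs described here from the same shell.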
I can't figure out what nextflow 19.07 is doing differently that makes Sarek crash, but maybe it can be worked out from these files?
command.run_Nextflow.19.07.txt command.run_Nextflow.19.01.txt
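To narrow down what changed between the two attached wrappers, one option is to list the `command -v` checks each file performs before looking at a full diff. This is a minimal sketch; `show_wrapper_checks` is a hypothetical helper, and the file names are the two attachments above.

```sh
#!/bin/sh
# Sketch: print the `command -v` checks found in each wrapper file.
# show_wrapper_checks is a hypothetical helper, not part of Nextflow.
show_wrapper_checks() {
  for f in "$@"; do
    printf '== %s\n' "$f"
    # grep -n prefixes each match with its line number in that file.
    grep -n 'command -v' "$f" || echo "(no 'command -v' checks found)"
  done
}
```

Running `show_wrapper_checks command.run_Nextflow.19.01.txt command.run_Nextflow.19.07.txt`, followed by `diff -u command.run_Nextflow.19.01.txt command.run_Nextflow.19.07.txt`, shows both the checks and every other difference between the two wrappers.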
Given your error message, I would say that this line is the issue, but I still find it strange that it fails if ps is present in these containers...
command -v ps &>/dev/null || { >&2 echo "Command 'ps' required by nextflow to collect task metrics cannot be found"; exit 1; }
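The same `command -v` test can be repeated by hand for every helper mentioned so far (ps, uname, grep, date), either on the host or inside the image. Below is a sketch, assuming `sh` exists in the image; `check_required_cmds` is a hypothetical helper whose first argument is a command prefix such as `singularity exec sarek-latest.simg`, or an empty string to check on the host.

```sh
#!/bin/sh
# Sketch: repeat the wrapper's `command -v` check for each named command.
# check_required_cmds is a hypothetical helper, not part of Nextflow or Sarek.
# Returns 0 if every command is found, 1 otherwise.
check_required_cmds() {
  runner="$1"
  shift
  missing=0
  for cmd in "$@"; do
    # $runner is deliberately unquoted so a multi-word prefix splits into
    # separate arguments; an empty runner expands to nothing (host check).
    if $runner sh -c 'command -v "$1"' _ "$cmd" >/dev/null 2>&1; then
      echo "found: $cmd"
    else
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_required_cmds "singularity exec sarek-latest.simg" ps uname grep date` runs each check inside the image used in this thread.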
I'll try to look into it more with both of these versions during the week, whenever I can find some time. All the best, Maxime
Hi Maxime, thanks, no rush! And yes, I can confirm that the container has ps:
$ singularity exec /scratch/scratch/regmr01/Sarek_containers/v2.3.FIX1/sarek-latest.simg ps
PID TTY TIME CMD
13798 pts/3 00:00:00 bash
14305 pts/3 00:00:00 ps
I downloaded it only a few days ago using:
singularity build sarek-latest.simg docker://maxulysse/sarek:latest
Best regards, Lucia
Hi @lconde-ucl, I'm sorry, I did not really find time to look into it since we moved Sarek to nf-core/sarek. Did you try out the new version? Or would you still want a fix on the SciLifeLab/Sarek repo?
Hi Max, thanks for getting back to me. No worries, I'm trying the nf-core version and I will probably just keep using that one.
Since you are here, I have a quick question about the new version (nf-core/sarek) that maybe you can easily answer: why is the FilterMutect2Call step only run when a PON is provided? Can this be changed so that it always runs? It makes no sense to me to run Mutect2 on a pair of tumour/normal samples and not do the filtering afterwards to get all the PASS variants, regardless of whether there is a PON or not.
It can probably be done, I'll ask @szilvajuhos about this one.
OK, so basically it's possible to enable that, but I'm not sure it's such a good idea. I think the PON really does help with the filtering, so I'm not sure what you would end up with. If you really want that, I'll enable it with a warning, but I'd advise against it.
Hi Max,
Thanks a lot. Yes, I understand that using a PON is recommended and is definitely of great help when you don't have matching normals. But when you do have matching tumour/normal pairs, it is just an additional filter for systematic sequencing/protocol/pipeline artifacts.
But regardless of whether you use a PON or not, the filtering (the "FilterMutectCalls" step) has to be done anyway; otherwise Mutect2 only emits an unfiltered VCF. "FilterMutectCalls" adds the annotations to the variants called by Mutect2 (either PASS, or failed because of "germline_risk", "multiallelic", "clustered_events", "bad_haplotype", etc.), so whether or not a PON is used, it is needed to get a properly annotated VCF.
As an example, this is a VCF file that you would get after running mutect2 without any PON:
> gatk Mutect2 -R $ref -I $tumor.bam -I $normal.bam -tumor T -normal N -O unfiltered.vcf --germline-resource ...
[...]
chr1 84490478 . C CT . . DP=60;ECNT=1;NLOD=5.02;N_ART_LOD=-8.630e-01;POP_AF=1.000e-06;REF_BASES=CCCAAGTATCCTTTTTTTTTT;RPA=11,12;RU=T;STR;TLOD=10.72 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:16,6:0.377:8,2:8,4:34,34:121,122:60:30:0.192,0.263,0.273:0.046,0.013,0.941 0/0:17,0:0.127:11,0:6,0:35,0:181,0:0:0
chr1 100206310 . TAA T,TA,TAAA . . DP=263;ECNT=1;NLOD=-2.149e+00,-2.977e+01,4.27;N_ART_LOD=12.16,31.15,8.65;POP_AF=1.000e-06,4.452e-03,1.000e-06;REF_BASES=ATCTATTTTTTAAAAAAAAAA;RPA=13,11,12,14;RU=A;STR;TLOD=7.59,23.90,11.86 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1/2/3:51,7,16,9:0.135,0.212,0.158:15,4,4,3:36,3,12,6:29,28,29,23:122,100,107,104:60,60,60:17,34,17:0.182,0.172,0.193:0.019,0.011,0.970 0/0:59,8,22,12:0.119,0.223,0.158:20,5,6,3:39,3,16,9:32,33,32,32:186,193,183,154:60,60,60:29,25,26
chr1 102889509 . T G . . DP=47;ECNT=1;NLOD=5.72;N_ART_LOD=-1.301e+00;POP_AF=1.000e-06;REF_BASES=CTCGGTCACCTTTTTCCCCTT;TLOD=25.83 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:19,9:0.321:14,6:5,3:34,35:103,100:60:27:0.303,0.273,0.321:0.014,0.034,0.952 0/0:19,0:1.054e-04:9,0:10,0:35,0:180,0:0:0
[...]
And this is what you get after running FilterMutectCalls on the above file:
> gatk FilterMutectCalls -V unfiltered.vcf -O final.vcf
[...]
chr1 84490478 . C CT . str_contraction DP=60;ECNT=1;NLOD=5.02;N_ART_LOD=-8.630e-01;POP_AF=1.000e-06;P_CONTAM=0.00;P_GERMLINE=-5.511e+00;REF_BASES=CCCAAGTATCCTTTTTTTTTT;RPA=11,12;RU=T;STR;TLOD=10.72 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:16,6:0.377:8,2:8,4:34,34:121,122:60:30:0.192,0.263,0.273:0.046,0.013,0.941 0/0:17,0:0.127:11,0:6,0:35,0:181,0:0:0
chr1 100206310 . TAA T,TA,TAAA . artifact_in_normal;germline_risk;multiallelic DP=263;ECNT=1;NLOD=-2.149e+00,-2.977e+01,4.27;N_ART_LOD=12.16,31.15,8.65;POP_AF=1.000e-06,4.452e-03,1.000e-06;P_CONTAM=0.00;P_GERMLINE=-5.710e+00,0.00,-1.101e+01;REF_BASES=ATCTATTTTTTAAAAAAAAAA;RPA=13,11,12,14;RU=A;STR;TLOD=7.59,23.90,11.86 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1/2/3:51,7,16,9:0.135,0.212,0.158:15,4,4,3:36,3,12,6:29,28,29,23:122,100,107,104:60,60,60:17,34,17:0.182,0.172,0.193:0.019,0.011,0.970 0/0:59,8,22,12:0.119,0.223,0.158:20,5,6,3:39,3,16,9:32,33,32,32:186,193,183,154:60,60,60:29,25,26
chr1 102889509 . T G . PASS DP=47;ECNT=1;NLOD=5.72;N_ART_LOD=-1.301e+00;POP_AF=1.000e-06;P_CONTAM=0.00;P_GERMLINE=-6.212e+00;REF_BASES=CTCGGTCACCTTTTTCCCCTT;TLOD=25.83 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:19,9:0.321:14,6:5,3:34,35:103,100:60:27:0.303,0.273,0.321:0.014,0.034,0.952 0/0:19,0:1.054e-04:9,0:10,0:35,0:180,0:0:0
[...]
The difference is that the FILTER column (column 7) is now filled in with annotations, so the variants that pass (only the last one in this case) can be identified. This is independent of the PON.
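Once the FILTER column is populated, the passing calls and the reasons for failure can be pulled out with standard tools. Below is a minimal awk sketch, assuming a standard tab-separated VCF (the records above are shown space-separated for display); `pass_only` and `filter_tally` are hypothetical helper names. `bcftools view -f PASS final.vcf` is an equivalent way to keep only PASS records if bcftools is available.

```sh
#!/bin/sh
# Sketch: post-process a FilterMutectCalls output with awk.
# Assumes tab-separated VCF columns; FILTER is column 7.

# Print the header lines plus only the records whose FILTER is exactly PASS.
pass_only() {
  awk -F'\t' '/^#/ || $7 == "PASS"' "$1"
}

# Count each FILTER value; a record such as
# "artifact_in_normal;germline_risk;multiallelic" adds one to each reason.
filter_tally() {
  awk -F'\t' '!/^#/ {
    n = split($7, reasons, ";")
    for (i = 1; i <= n; i++) count[reasons[i]]++
  }
  END { for (r in count) print r "\t" count[r] }' "$1" | sort
}
```

On the three records above, `pass_only final.vcf` would keep only the chr1 102889509 call, and `filter_tally final.vcf` would report one each of PASS, str_contraction, artifact_in_normal, germline_risk and multiallelic.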
I hope this makes sense?
That's why I think the FilterMutectCalls step should always be run. If you would like to enable it with a warning, that's fine. Or if you prefer to disable it by default and add an option to enable it without a PON, that's fine too. Whatever you think is best, but it would be great to give the user the option to use it without a PON.
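The behaviour proposed here (PON optional, FilterMutectCalls always run, with a warning when no PON is given) can be sketched as a small shell wrapper. This is not how nf-core/sarek implements it; `mutect2_then_filter` and `run` are hypothetical helpers. The GATK arguments follow the commands shown earlier in this thread plus Mutect2's `--panel-of-normals`, and other options (e.g. `--germline-resource`) are left out for brevity.

```sh
#!/bin/sh
# Sketch only: PON optional, FilterMutectCalls always runs.
# run and mutect2_then_filter are hypothetical helpers.

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: mutect2_then_filter REF TUMOR_BAM NORMAL_BAM TUMOR_NAME NORMAL_NAME [PON]
mutect2_then_filter() {
  if [ -n "${6:-}" ]; then
    run gatk Mutect2 -R "$1" -I "$2" -I "$3" -tumor "$4" -normal "$5" \
      --panel-of-normals "$6" -O unfiltered.vcf
  else
    echo "WARNING: no panel of normals given; running FilterMutectCalls without PON" >&2
    run gatk Mutect2 -R "$1" -I "$2" -I "$3" -tumor "$4" -normal "$5" \
      -O unfiltered.vcf
  fi || return 1  # stop if Mutect2 failed
  # Filtering runs in both branches, so FILTER gets PASS or a failure reason.
  run gatk FilterMutectCalls -V unfiltered.vcf -O final.vcf
}
```

Setting `DRY_RUN=1` prints the two GATK commands instead of running them, which makes it easy to check that the filtering step is emitted with and without a PON.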
Sorry for the long reply! Many thanks as always, Lucia
Closing issue, as a new one has been opened on nf-core/sarek
Hi,
I am having problems running Sarek with the latest version of nextflow (19.07). The error I get is:
The pipeline runs fine with the same data using an older version of nextflow (19.01).
I guess this is a nextflow-related issue (someone found the same problem here: https://github.com/nextflow-io/nextflow/issues/1289), but I wonder if this is something that can be fixed from within Sarek? I'm happy to send the .command.run files from both nextflow versions if that helps.
Thanks, Lucia