bioinform / somaticseq

An ensemble approach to accurately detect somatic mutations using SomaticSeq
http://bioinform.github.io/somaticseq/
BSD 2-Clause "Simplified" License
194 stars 53 forks source link

Snakemake corrections #50

Closed 0xaf1f closed 6 years ago

0xaf1f commented 6 years ago

minor bugs in the workflow that would cause it to error out in some cases.

0xaf1f commented 6 years ago

Thanks for merging. There's one more outstanding issue-- this code passes the same GATK jar file to both SomaticSeq and Mutect2, but SomaticSeq cannot work with GATK4. This problem would go away if you make that happen. Otherwise, it's probably necessary to have a gatk3 config setting as well as a gatk4 setting rather than just one for both. Do you think it's worth making this split now, or is an update to GATK4 in the SomaticSeq wrapper coming soon?

lethalfang commented 6 years ago

I'll see if I can switch out GATK3 to GATK4 (i.e., GATK3's CombineVariants to GATK4's MergeVcfs). Best,

On Thu, Jan 25, 2018 at 3:36 PM, Afif Elghraoui notifications@github.com wrote:

Thanks for merging. There's one more outstanding issue-- this code passes the same GATK jar file to both SomaticSeq and Mutect2, but SomaticSeq cannot work with GATK4. This problem would go away if you make that happen. Otherwise, it's probably necessary to have a gatk3 config setting as well as a gatk4 setting rather than just one for both. Do you think it's worth making this split now, or is an update to GATK4 in the SomaticSeq wrapper coming soon?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/bioinform/somaticseq/pull/50#issuecomment-360636928, or mute the thread https://github.com/notifications/unsubscribe-auth/AD3lYGdu3JkTs9zOFbG1rKLAWVG_lBamks5tORAAgaJpZM4RtfIv .

0xaf1f commented 6 years ago

That'd be great. Thanks!

lethalfang commented 6 years ago

Hi Afif,

I've been using GATK3's CombineVariants to merge VCF files. GATK4 doesn't have its equivalent. I could switch out GATK3 CombineVariants, but that'll probably take me a little while to do and then validate. So for now, should I still merge your pull request?

Best,

On Fri, Jan 26, 2018 at 11:33 AM, Afif Elghraoui notifications@github.com wrote:

That'd be great. Thanks!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/bioinform/somaticseq/pull/50#issuecomment-360882429, or mute the thread https://github.com/notifications/unsubscribe-auth/AD3lYEn8LgWYoDjrn4aStqjZgBBAi9riks5tOihugaJpZM4RtfIv .

0xaf1f commented 6 years ago

51? Well, I guess for now, it would break if anyone tried to run it as written, so maybe it's not a good idea to document it while it's known to be broken.

On the other hand, I saw that you did have support for running SomaticSeq without GATK at all, so maybe we should just not pass --gatk to the invocation of the SomaticSeq wrapper until GATK4 support there is ready. What do you think about that?

lethalfang commented 6 years ago

Yes, we can do that, not having --gatk argument at all. Without --gatk, it'll just cat all the VCF and then sort using this command:

cat $all_snp | egrep -v '^#' | awk -F "\t" '{print $1 "\t" $2 "\t.\t" $4

"\t" $5}' | sort | uniq | awk -F "\t" '{print $1 "\t" $2 "\t" $3 "\t" $4 "\t" $5 "\t" "." "\t" "PASS" "\t" "."}' | cat <(echo -e '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO') - | $MYDIR/utilities/vcfsorter.pl ${hg_dict} - > ${merged_dir}/CombineVariants_MVJSD.snp.vcf

On Thu, Feb 15, 2018 at 1:08 PM, Afif Elghraoui notifications@github.com wrote:

51 https://github.com/bioinform/somaticseq/pull/51? Well, I guess for

now, it would break if anyone tried to run it as written, so maybe it's not a good idea to document it while it's known to be broken.

On the other hand, I saw that you did have support for running SomaticSeq without GATK at all, so maybe we should just not pass --gatk to the invocation of the SomaticSeq wrapper until GATK4 support there is ready. What do you think about that?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/bioinform/somaticseq/pull/50#issuecomment-366062074, or mute the thread https://github.com/notifications/unsubscribe-auth/AD3lYHHWrRruZwZlatYMfZOyLQx1FcsTks5tVJzEgaJpZM4RtfIv .

0xaf1f commented 6 years ago

Ok. I'll file an issue for this GATK4 thing just for tracking purposes, and then I'll add another commit to #51.