broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk
Other
1.61k stars 574 forks source link

Prevent users enabling annotations with mismatching data type (flow etc) #8788

Open gokalpcelik opened 3 weeks ago

gokalpcelik commented 3 weeks ago

Here is an user trying to enable flow based annotations for non-flow data.

https://gatk.broadinstitute.org/hc/en-us/community/posts/24596063911963-Error-while-running-Mutect2-java-lang-IllegalArgumentException-the-index-points-past-the-last-element-of-the-collection-or-array-334-333

What could be done to prevent this? Possible ideas

jamesemery commented 3 weeks ago

@ilyasoifer Looks like this user got tricked by some of the flow based annotations that don't work on their data. I would like to cut down on the risk that this happens for users. If we had more foresight I would advocate renaming all of the flow specific annotations to something like "flowbased_#####". How destructive would this be for your pipelines?

We have some appropriate checks in GATK for the flow-ness of the bam that give warnings more broadly about flow-based mode but we don't currently have any safeguards in the annotations. Thoughts?

ilyasoifer commented 3 weeks ago

@jamesemery, @gokalpcelik - this looks like a bug actually, this annotation should work on any data IMO. Would it be possible to get data to reproduce this crash? We should fix this quickly.

gokalpcelik commented 3 weeks ago

Looks like some of the code have changed since then line numbers don't match with the master branch. Should we ask user to try running this annotation with the latest GATK?

ilyasoifer commented 3 weeks ago

@gokalpcelik I see that the bug exists in the updated code too. We can fix it, but would be good to have some dataset that can be used to validate. Any chance to ask the user to generate a small example?

gokalpcelik commented 3 weeks ago

I asked the user to send us a piece of data. On the other hand the file name starts with SRR6127704. Is this something that we can get from SRA?

ilyasoifer commented 3 weeks ago

I think even more important is that they share the reference genome that they are using.

Get Outlook for Androidhttps://aka.ms/AAb9ysg


From: Gökalp Çelik @.> Sent: Wednesday, April 24, 2024 12:28:39 AM To: broadinstitute/gatk @.> Cc: Ilya Soifer @.>; Assign @.> Subject: Re: [broadinstitute/gatk] Prevent users enabling annotations with mismatching data type (flow etc) (Issue #8788)

Assigned #8788https://github.com/broadinstitute/gatk/issues/8788 to @ilyasoiferhttps://github.com/ilyasoifer.

— Reply to this email directly, view it on GitHubhttps://github.com/broadinstitute/gatk/issues/8788#event-12581899218, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AGDPRCP66IKPOBF2GPENP6LY63HAPAVCNFSM6AAAAABGTGMBPWVHI2DSMVQWIX3LMV45UABCJFZXG5LFIV3GK3TUJZXXI2LGNFRWC5DJN5XDWMJSGU4DCOBZHEZDCOA. You are receiving this because you were assigned.Message ID: @.***>


CONFIDENTIALITY NOTICE: This message (including any attachments) should be presumed to contain confidential, proprietary, privileged and/or private information. Information contained in this message is intended only for the recipient(s) named above. Any disclosure, reproduction, distribution or other use of this message or any attachments by any unauthorized individual or entity is strictly prohibited. If you have received this message in error, please notify the sender immediately, and delete the message and any attachments. Learn more about Ultima Genomics’ Privacy Policyhttps://www.ultimagenomics.com/privacy-policy.

ilyasoifer commented 1 week ago

@gokalpcelik, @jamesemery - I submitted a PR (#8810) that should be fixing this issue. Could you please take a look? Thanks!