aradenbaugh / radia

RADIA: RNA and DNA Integrated Analysis for Somatic Mutation Detection
GNU Affero General Public License v3.0
29 stars 11 forks source link

filterByBlat.py list index out of range error #7

Closed sagarc88 closed 5 years ago

sagarc88 commented 5 years ago

Getting following error while running filterRadia.py

03/22/2019 07:50:43 PM  ERROR   The return code of '1' from the following filter command indicates an error.
03/22/2019 07:50:43 PM  ERROR   Error from python filterByBlat.py PT1 test/filtered/PT1_dnaFiltered_chr1.vcf test/filtered/PT1_blatOutput_chr1.blast -o test/filtered/PT1_blatFiltered_chr1.vcf --allVCFCalls --blatRnaNormalReads --blatRnaTumorReads:
Traceback (most recent call last):
  File "filterByBlat.py", line 621, in <module>
    main()
  File "filterByBlat.py", line 518, in main
    (isValidRead, validRead) = is_valid_read_blast_format(blatHitList, vcfChr, vcfStopCoordinate, 0, i_debug)
  File "filterByBlat.py", line 176, in is_valid_read_blast_format
    readFlag = int(blatReadIdSplit[9])
IndexError: list index out of range

Upon further investigation, it looks like the written Read ID in createBlatFile.py (line 488) was reformatted during commit ID: #05c069f and the "Readlength" and "position" fields were removed. However, the filterByBlat.py file was never changed to account for these missing values in the read ID.

aradenbaugh commented 5 years ago

Sorry for the inconvenience. This bug has been fixed. Please grab the latest code. Let me know if you run into any other issues!

sagarc88 commented 5 years ago

Thank you for looking into this so quickly! I am running into another error and hope you can help..

03/26/2019 07:48:20 PM  WARNING Warning from the following filter command python mergeRnaAndDnaFiles.py PT1 1 test/filtered/PT1_mpileup_dna_origin_chr1.vcf test/filtered/PT1_mpileup_rna_origin_chr1.vcf test/filtered/PT1_overlap_chr1.vcf test/filtered/PT1_blatFiltered_chr1.vcf test/filtered/PT1_merged_chr1.vcf --dnaHeaderOnly

and it prints out a lot of VCF entries along with the following error:

Traceback (most recent call last):
  File "mergeRnaAndDnaFiles.py", line 521, in <module>
    main()
  File "mergeRnaAndDnaFiles.py", line 494, in main
    (headerList, coordinateDict) = merge_vcf_data(i_dnaFilename, i_rnaFilename, i_overlapsFilename, i_nonOverlapsFilename, i_dnaHeaderOnly, i_debug)
  File "mergeRnaAndDnaFiles.py", line 386, in merge_vcf_data
    dnaLine = coordinateDict[rnaStopCoordinate]
KeyError: '70705199'
sagarc88 commented 5 years ago

Thank you for looking into this so quickly! I am running into another error and hope you can help..

03/26/2019 07:48:20 PM  WARNING Warning from the following filter command python mergeRnaAndDnaFiles.py PT1 1 test/filtered/PT1_mpileup_dna_origin_chr1.vcf test/filtered/PT1_mpileup_rna_origin_chr1.vcf test/filtered/PT1_overlap_chr1.vcf test/filtered/PT1_blatFiltered_chr1.vcf test/filtered/PT1_merged_chr1.vcf --dnaHeaderOnly

and it prints out a lot of VCF entries along with the following error:

Traceback (most recent call last):
  File "mergeRnaAndDnaFiles.py", line 521, in <module>
    main()
  File "mergeRnaAndDnaFiles.py", line 494, in main
    (headerList, coordinateDict) = merge_vcf_data(i_dnaFilename, i_rnaFilename, i_overlapsFilename, i_nonOverlapsFilename, i_dnaHeaderOnly, i_debug)
  File "mergeRnaAndDnaFiles.py", line 386, in merge_vcf_data
    dnaLine = coordinateDict[rnaStopCoordinate]
KeyError: '70705199'

Upon further investigation, it looks like the problem comes from --rnaOnly flag when running filterRadia.py. This leads to a --dnaHeaderOnly flag when running mergeRnaAndDnaFiles.py. With this flag, it never adds anything from the "aDnaFile" to the coordinateDict and the lookup of a key somehow fails. When running the above mergeRnaAndDnaFiles.py command without --dnaHeaderOnly flag, it completes without any problems.

aradenbaugh commented 5 years ago

I have fixed the bug with the KeyError that you mention here, but I would actually recommend that you do not use the --rnaOnly flag. The --dnaOnly and --rnaOnly flags basically only control which filters are applied. Here is an explanation of the flags:

--dnaOnly:
No matter what data is provided, only apply the DNA filters. When RADIA was initially developed back in 2010, it was important to provide a flag to provide DNA based calls that could be compared to other DNA mutation calling algorithms that did not include RNA.

--rnaOnly: No matter what data is provided, only apply the RNA filters. This option was provided as the opposite of the --dnaOnly flag, but it is misleading. For example, the result of supplying the --rnaOnly flag means that there will be no passing germline calls, because the DNA filters were not applied. Also, all of the non-passing somatic calls will only have the RNA filters instead of the combined DNA + RNA filters.

I believe what you want is to apply all DNA and RNA filters and then extract the RNA specific calls (RNA Confirmation, RNA Rescue, and RNA Editing). You can do this by leaving the --dnaOnly and --rnaOnly flags set to their defaults, and then inspecting the final calls. All of the calls are labeled in the INFO field as follows:

Sorry for the confusion. Let me know if you have any other issues or questions.

aradenbaugh commented 5 years ago

As I stated in my last comment, the result of supplying the --rnaOnly flag was misleading. In the latest commit, I disabled the --rnaOnly flag. As stated above, by default, when merging the RNA and DNA results, we want to provide as much information as possible.

Note: I also updated the dbSnp files from version 150 to 151, and I have officially tagged this latest version of RADIA to v1.1.4. Let me know if you find any other issues. Thanks!

sagarc88 commented 5 years ago

Ok perfect! THank you very much for all the info!