UnifiedConsensusMaker.py IOError

maoyibo commented 8 years ago

When I use UnifiedConsensusMaker.py to make Consensus of the test reads SRR1613972, when the program sorting reads on tag sequence, I get this error message:

Traceback (most recent call last):
  File "./Duplex-Sequencing/UnifiedConsensusMaker.py", line 326, in <module>
    main()
  File "./Duplex-Sequencing/UnifiedConsensusMaker.py", line 170, in main
    in_bam_file = pysam.AlignmentFile(o.prefix + '.temp.sort.bam', "rb", check_sq=False)
  File "pysam/calignmentfile.pyx", line 311, in pysam.calignmentfile.AlignmentFile.__cinit__ (pysam/calignmentfile.c:4929)
  File "pysam/calignmentfile.pyx", line 480, in pysam.calignmentfile.AlignmentFile._open (pysam/calignmentfile.c:6905)
IOError: file `USRR2.temp.sort.bam` not found

How can I solve this problem? When I use UnifiedConsensusMaker.py to make Consensus . Do I need to use the SRAFixer.py to fix the reads of SRR1613972 ? Do I need to use the tag_to_header.py to modify the reads of SRR1613972 ?

When I do Duplex-Sequencing to find ultralow-frequency mutations, how high the sequencing depth should be?

maoyibo commented 8 years ago

I changed the 158s line ,and remove the parameter '-o' , so that the sort result will write to *.temp.sort.bam file instead of stdout.

pysam.sort("-n", o.prefix + ".temp.bam", "-o", o.prefix + ".temp.sort.bam")
change to
pysam.sort("-n", o.prefix + ".temp.bam", o.prefix + ".temp.sort.bam")

Rerun the program I will not heve the IOError again, but I get another one:

Creating consensus reads...
/mnt/lustre/PyModule/local/lib/python2.7/site-packages/matplotlib-1.4.3-py2.7-linux-x86_64.egg/matplotlib/collections.py:590: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  if self._edgecolors == str('face'):

Is the modification correct ? How can I solve this new problem?

scottrk commented 8 years ago

Check your version of matplotlib.

On 7/27/16 2:25 AM, maoyibo wrote:

I changed the 158s line ,and remove the parameter '-o' , so that the sort result will write to *.temp.sort.bam file instead of stdout.

|pysam.sort("-n", o.prefix + ".temp.bam", "-o", o.prefix + ".temp.sort.bam") |

Rerun the program I will not heve the IOError again, but I get another one:

|Creating consensus reads... /mnt/lustre/PyModule/local/lib/python2.7/site-packages/matplotlib-1.4.3-py2.7-linux-x86_64.egg/matplotlib/collections.py:590: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison if self._edgecolors == str('face'): |

Is the modify correct ? How can I solve this new problem?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/loeblab/Duplex-Sequencing/issues/42#issuecomment-235534069, or mute the thread https://github.com/notifications/unsubscribe-auth/AB8PUAzhZQkrvA-IratM6Kb48Tow1hQXks5qZyQCgaJpZM4JV9Sd.

maoyibo commented 8 years ago

Thank you for your reply, the version of dependencies software is really important.There is no error anymore.

I have other questions. When I use UnifiedConsensusMaker.py to analysis SRR1613972 I get 88118 reads in read1 , but when I use pipline in Nat_Protocols_Version directory I get 99644 reads in read1. When I use UnifiedConsensusMaker.py to make Consensus . Do I need to use the SRAFixer.py to fix the reads of SRR1613972 ? Do I need to use the tag_to_header.py to modify the reads of SRR1613972 ?

When I do Duplex-Sequencing to find ultralow-frequency mutations, how high the sequencing depth should be? I tried with our Sequencing data, I get this Error:

python ./Duplex-Sequencing/UnifiedConsensusMaker.py --input P15076f21936_F2B.bam --taglen 9 --spacerlen 5 --tagstats --minmem 3 --maxmem 1000 --cutoff 0.7 --Ncutoff 0.3 --prefix PREAD
Parsing tags...
Sorting reads on tag sequence...
Creating consensus reads...
Traceback (most recent call last):
  File "./Duplex-Sequencing/UnifiedConsensusMaker.py", line 326, in <module>
    main()
  File "./Duplex-Sequencing/UnifiedConsensusMaker.py", line 316, in main
    plt.xlim(0, max(fam_size_x_axis))
ValueError: max() arg is an empty sequence

scottrk commented 8 years ago

This is a bit hard to explain, but for the Protocols version, that read count contains "dummy reads" consisting of all N's that are placed there when a read doesn't form a consensus, but it's partner does. Theoretically, that shouldn't happen, but, because of how the consensus maker decides if reads are related to one another, which is based on tag sequence and mapping position, a family can appear smaller than it really is due to mismappings of a subset of read, thus resulting in a failed consensus. This is replaced by a dummy read of all N's. In v3, which is not based on mapping position at all, this can't happen and, therefore, the number of reads isn't artificially inflated by the inclusion of the N's.

UnifiedConsensusMaker doesn't care about the header structure that is found in the SRA data. Picard might, but it, itself, does not. SRAfixer is the only thing that can be used with the SRR data set. tagtoheader is incompatible with UnifiedConsensusMaker.

I can't answer your question about depth. It will depend on how rare the mutations are that you're looking for.

Seems to me your error is caused by there not being information about family size. Have you checked to see if any data is being outputted to your dcs fastq file?

On 7/27/16 11:04 PM, maoyibo wrote:

Thank you for your reply, the version of dependencies software is really important.There is no error anymore.

I have other questions. When I use UnifiedConsensusMaker.py to analysis SRR1613972 I get 88118 reads in read1 , but when I use pipline in Nat_Protocols_Version directory I get 99644 reads in read1. When I use UnifiedConsensusMaker.py to make Consensus . Do I need to use the SRAFixer.py to fix the reads of SRR1613972 ? Do I need to use the tag_to_header.py to modify the reads of SRR1613972 ?

When I do Duplex-Sequencing to find ultralow-frequency mutations, how high the sequencing depth should be? I tried with our Sequencing data, I get this Error:

|python ./Duplex-Sequencing/UnifiedConsensusMaker.py --input P15076f21936_F2B.bam --taglen 9 --spacerlen 5 --tagstats --minmem 3 --maxmem 1000 --cutoff 0.7 --Ncutoff 0.3 --prefix PREAD Parsing tags... Sorting reads on tag sequence... Creating consensus reads... Traceback (most recent call last): File "./Duplex-Sequencing/UnifiedConsensusMaker.py", line 326, in main() File "./Duplex-Sequencing/UnifiedConsensusMaker.py", line 316, in main plt.xlim(0, max(fam_size_x_axis)) ValueError: max() arg is an empty sequence |

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/loeblab/Duplex-Sequencing/issues/42#issuecomment-235806797, or mute the thread https://github.com/notifications/unsubscribe-auth/AB8PUKjzHXCZDcZrqDx0QV4ILQhW20Pzks5qaEZngaJpZM4JV9Sd.

scottrk commented 4 years ago

Closing a resolved.

Kennedy-Lab-UW / Duplex-Sequencing

UnifiedConsensusMaker.py IOError #42