fulcrumgenomics / fgbio

Tools for working with genomic and high throughput sequencing data.
http://fulcrumgenomics.github.io/fgbio/
MIT License
314 stars 67 forks source link

SamToFastq ERROR #744

Open zhangyafeng1 opened 2 years ago

zhangyafeng1 commented 2 years ago

hello: After I finish CallMolecularConsensusReads, when I use SamToFastq, the following error appears, indicating that there is an illegal mate state: A:10000006/A, could you tell me the reason? thank you very much !

屏幕截图 2021-12-05 195148

Exception in thread "main" picard.PicardException: Illegal mate state: A:10000006/A at picard.sam.SamToFastq.assertPairedMates(SamToFastq.java:376) at picard.sam.SamToFastq.doWork(SamToFastq.java:184) at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:209) at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95) at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

zhangyafeng1 commented 2 years ago

This is the case in BAM

image

zhangyafeng1 commented 2 years ago

This is my code , I'm not sure what caused the error,,Thanks for your help !!

Step1: /usr/bin/java fgbio-1.3.0.jar GroupReadsByUmi \ -i merge.extra_umi.bam \ -o group.bam \ -t RX -T MI -m 20 --strategy=paired -e 1 \ -f familysize.txt

Step2: /usr/bin/java -Xmx4g -jar fgbio-1.3.0.jar SortBam \ -i group.bam \ -s TemplateCoordinate \ -o group.sorted.bam

Step3: /usr/bin/java -Xmx4g -jar fgbio-1.3.0.jar CallMolecularConsensusReads \ -i group.sorted.bam \ --error-rate-pre-umi=45 --error-rate-post-umi=30 --min-input-base-quality=30 --min-consensus-base-quality=30\ --min-reads=1 \ -o unmapped.consensus.bam

Step4: /usr/bin/java -Xmx4g -jar picard.jar SamToFastq \ I=unmapped.consensus.bam \ F=consensus.1.fq \ F2=consensus.2.fq

Gregory94 commented 2 years ago

Hi @zhangyafeng1 I'm using a very similar workflow that you show here and I get the same error. Indeed I also find more than two reads having the same name in the output bam file from GroupReadsByUmi. Have you already found a solution? Thanks in advance your help!

nh13 commented 2 years ago

Could you share the input BAM to GroupReadsByUmi for us to reproduce and debug? If it’s too large, try picking the first hundred or so reads and verify that you get the same error. Thank you!

nh13 commented 2 years ago

@zhangyafeng1 sorry this slipped through the cracks and @Gregory94 thanks for any data you could provide to help debug!

Gregory94 commented 2 years ago

Hi @nh13, thanks for your quick reply! I think I already found the solution. CallMolecularConsensusReads requires the bam files to be sorted using SortBam -s TemplateCoordinate (like @zhangyafeng1 shows in the third comment and also as is recommended by CallMolecularConsensusReads when you call this function). But CallMolecularConsensusReads does not, by default, consider the reads sorted by TemplateCoordinate. This you have to explicitly specify using CallMolecularConsensusReads -S TemplateCoordinate. I thought this was the default setting because it is recommended by the tool, hence my confusion. For me the problem is now solved, I hope this also works for @zhangyafeng1

nh13 commented 2 years ago

@Gregory94 in our next release, we're going to be more explicit about the sorts going in and out, so this is good timing.

@zhangyafeng1 can you send a test case?

Rohit-Satyam commented 1 year ago

I am running into similar issue. I am trying to run mitochondrial variant calling step given below:

picard SamToFastq INPUT=sample1_rev_sorted.bam FASTQ=/dev/stdout INTERLEAVE=true NON_PF=true VALIDATION_STRINGENCY=LENIENT |     bwa mem -M -K 100000000 -p -v 3 -t 2 -Y resources/Homo_sapiens_assembly38.chrM.fasta - > sample1_temp.bam

Sorting sam

picard SamToFastq INPUT=sample1_rev_sorted.bam FASTQ=/dev/stdout INTERLEAVE=true NON_PF=true VALIDATION_STRINGENCY=LENIENT |     bwa mem -M -K 100000000 -p -v 3 -t 2 -Y resources/Homo_sapiens_assembly38.chrM.fasta - > sample1_temp.bam
[M::bwa_idx_load_from_disk] read 0 ALT contigs
INFO    2023-02-09 18:55:54     SamToFastq

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
**********
https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    SamToFastq -INPUT sample1_rev_sorted.bam -FASTQ /dev/stdout -INTERLEAVE true -NON_PF true -VALIDATION_STRINGENCY LENIENT
**********

18:55:55.286 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/rohit/miniconda3/envs/cancerpipe/share/picard-2.27.4-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu Feb 09 18:55:55 IST 2023] SamToFastq INPUT=sample1_rev_sorted.bam FASTQ=/dev/stdout INTERLEAVE=true INCLUDE_NON_PF_READS=true VALIDATION_STRINGENCY=LENIENT    OUTPUT_PER_RG=false COMPRESS_OUTPUTS_PER_RG=false RG_TAG=PU RE_REVERSE=true CLIPPING_MIN_LENGTH=0 READ1_TRIM=0 READ2_TRIM=0 INCLUDE_NON_PRIMARY_ALIGNMENTS=false VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Thu Feb 09 18:55:55 IST 2023] Executing as rohit@DESKTOP-27J4F4O on Linux 5.15.79.1-microsoft-standard-WSL2 amd64; OpenJDK 64-Bit Server VM 1.8.0_332-b09; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.27.4-SNAPSHOT
[Thu Feb 09 18:55:55 IST 2023] picard.sam.SamToFastq done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=514850816
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" picard.PicardException: Illegal mate state: M01368:16:000000000-A3M99:1:1112:14644:9150
        at picard.sam.SamToFastq.assertPairedMates(SamToFastq.java:438)
        at picard.sam.SamToFastq.handleRecord(SamToFastq.java:309)
        at picard.sam.SamToFastq.doWork(SamToFastq.java:205)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
[M::process] read 2361 sequences (528144 bp)...
[M::process] 1 single-end sequences; 2360 paired-end sequences
[M::mem_process_seqs] Processed 1 reads in 0.001 CPU sec, 0.002 real sec
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 1180, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (205, 277, 369)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 697)
[M::mem_pestat] mean and std.dev: (291.13, 122.19)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 861)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 2360 reads in 0.349 CPU sec, 0.188 real sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem -M -K 100000000 -p -v 3 -t 2 -Y resources/Homo_sapiens_assembly38.chrM.fasta -
[main] Real time: 2.730 sec; CPU: 0.454 sec
nh13 commented 1 year ago

@Rohit-Satyam can you please open a new issue and provide how you sorted it and a test case?