aristoteleo / dynast-release

Inclusive and efficient quantification of labeling and splicing RNAs for time-resolved metabolic labeling based scRNA-seq experiments
https://dynast-release.readthedocs.io/en/latest/
MIT License

Dynast count BamError: Some paired reads do not have mates #4

Closed julianalbers closed 2 years ago

julianalbers commented 2 years ago

Hi,

First of all, thank you for the cool tool! I am having some trouble getting the dynast count function to work. I keep getting the following error:

ERROR [main] An exception occurred
Traceback (most recent call last):
  File "/broad/hptmp/jalbers/dynast_env/lib/python3.7/site-packages/dynast/main.py", line 741, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=args.tmp)
  File "/broad/hptmp/jalbers/dynast_env/lib/python3.7/site-packages/dynast/main.py", line 582, in parse_count
    velocity=not args.no_splicing,
  File "/broad/hptmp/jalbers/dynast_env/lib/python3.7/site-packages/ngs_tools/logging.py", line 62, in inner
    return func(*args, **kwargs)
  File "/broad/hptmp/jalbers/dynast_env/lib/python3.7/site-packages/dynast/count.py", line 95, in count
    velocity=velocity
  File "/broad/hptmp/jalbers/dynast_env/lib/python3.7/site-packages/dynast/preprocessing/bam.py", line 557, in parse_all_reads
    n_threads=n_threads,
  File "/broad/hptmp/jalbers/dynast_env/lib/python3.7/site-packages/ngs_tools/bam.py", line 213, in split_bam
    raise BamError('Some paired reads do not have mates.')
ngs_tools.bam.BamError: Some paired reads do not have mates.

The error occurs when running the following code:

dynast count --tmp tmp_count_1 \
  --keep-tmp \
  --verbose \
  -t 8 \
  --barcode-tag RG \
  --barcodes align/Solo.out/Gene/filtered/barcodes.tsv \
  -o COUNT/ \
  --barcodes ALIGN/Solo.out/Gene/filtered/barcodes.tsv \
  -g gencode.v38.annotation.gtf.gz \
  --conversion TC ALIGN/Aligned.sortedByCoord.out.bam

Some background for these data: I performed mini-bulk RNA-seq with SmartSeq2, performed the library prep using Nextera, and sequenced the samples on a NextSeq 500. The BCL file that the NextSeq generates was demultiplexed and converted to FASTQ files using bcl2fastq, and I used dynast align to generate the BAM file. The BAM file contains the reads from two different samples (bulk).

This is the code I used to run dynast align:

dynast align \
  --verbose \
  --keep-tmp \
  --tmp new_tmp6 \
  -i $STAR_DIR \
  -o ALIGN/ \
  -t 8 \
  -x smartseq samples_ls_1.csv \
  -w all_barcodes_SS2_whitelist.txt

The file all_barcodes_SS2_whitelist.txt is empty.

I've done the following troubleshooting so far, but did not get it to run:

Thanks for your help! Julian

julianalbers commented 2 years ago

It worked after filtering the dynast align output BAM file with samtools view -f 0x2 -b -o out.bam in.bam. Can you please explain why this works, and why the initial BAM file did not? Thanks!

Lioscro commented 2 years ago

Hi, @julianalbers. Thanks for the detailed description and for finding a solution! It really helps me figure out what is happening here. Since I don't have your BAM file, it's hard to tell exactly why filtering for the 0x2 flag (which indicates properly paired reads) works, but my suspicion is that the problem comes from paired reads that either both align to the same strand (forward/forward, reverse/reverse), are mapped too far from one another, or have one read of the pair unmapped. Any of these cases would set the 0x1 flag (indicating a paired read) but not the 0x2 flag.
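
To make the flag discussion above concrete, here is a minimal sketch in plain Python (no samtools or pysam required). It decodes the relevant SAM flag bits and mimics the test behind samtools view -f, which keeps a read only when every requested bit is set. The bit values come from the SAM specification; the helper names describe_flag and passes_required are hypothetical, and the example flag values are illustrative, not taken from the BAM file in this issue.

```python
# SAM flag bits relevant to this thread (values from the SAM specification).
FLAG_NAMES = {
    0x1: "paired",          # read is part of a pair
    0x2: "proper_pair",     # aligner considers the pair properly aligned
    0x4: "unmapped",        # this read is unmapped
    0x8: "mate_unmapped",   # the mate is unmapped
    0x10: "reverse",        # this read maps to the reverse strand
    0x20: "mate_reverse",   # the mate maps to the reverse strand
    0x40: "read1",          # first read of the pair
    0x80: "read2",          # second read of the pair
}


def describe_flag(flag):
    """Return the names of the bits (from FLAG_NAMES) that are set in flag."""
    return [name for bit, name in FLAG_NAMES.items() if flag & bit]


def passes_required(flag, required):
    """Mimic `samtools view -f required`: keep only if all required bits are set."""
    return (flag & required) == required


# Illustrative flag values (assumptions, not read from the user's BAM):
#   99 = paired + proper_pair + mate_reverse + read1
#   65 = paired + read1 (both reads forward, not marked as a proper pair)
#   73 = paired + mate_unmapped + read1
for flag in (99, 65, 73):
    kept = passes_required(flag, 0x2)
    print(flag, describe_flag(flag), "kept" if kept else "dropped")
```

Under these assumptions, only the first example would survive the 0x2 filter. Whether a given pair actually receives the 0x2 bit is decided by the aligner, so the exact flags in a real BAM file may differ.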

It seems we should always be considering properly paired reads anyway, so I will add an additional filtering step prior to splitting the BAM.
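
As a rough illustration of what such a pre-filtering step could look like, the sketch below works on SAM text records in plain Python. It is not the implementation used by dynast or ngs_tools: the function names keep_proper_pairs and reads_missing_mates are hypothetical, the toy records are invented, and a real implementation would more likely operate on the BAM file directly.

```python
from collections import Counter

PAIRED = 0x1
PROPER_PAIR = 0x2
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def keep_proper_pairs(sam_lines):
    """Yield header lines plus alignment lines that have the 0x2 bit set."""
    for line in sam_lines:
        if line.startswith("@"):  # SAM header lines start with '@'
            yield line
            continue
        flag = int(line.split("\t")[1])
        if flag & PROPER_PAIR:
            yield line


def reads_missing_mates(sam_lines):
    """Return names of paired reads whose primary alignments do not come in twos."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # only count primary alignments
        if flag & PAIRED:
            counts[name] += 1
    return sorted(name for name, n in counts.items() if n != 2)


# Toy records (invented): r1 is a complete proper pair; r2 is flagged as
# paired (flag 73) but its mate is absent from the file.
records = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t99\tchr1\t100\t255\t50M\t=\t300\t250\t*\t*",
    "r1\t147\tchr1\t300\t255\t50M\t=\t100\t-250\t*\t*",
    "r2\t73\tchr1\t500\t255\t50M\t=\t500\t0\t*\t*",
]

print(reads_missing_mates(records))
filtered = list(keep_proper_pairs(records))
print(reads_missing_mates(filtered))
```

In this toy input, the unfiltered records contain one read without a mate, and the filtered records contain none. This is one plausible way the original BAM file could trigger the error while the filtered one does not, but it has not been confirmed against the actual data.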
