broadinstitute / Drop-seq

Java tools for analyzing Drop-seq data
MIT License

The output of DigitalExpression is an empty matrix. #190

Closed felixhell2004 closed 4 years ago

felixhell2004 commented 4 years ago

Instructions

Hi, I'm trying to run the DigitalExpression step, but I get an empty matrix. I have already followed the cookbook steps: tagging the cell barcode and UMI, merging the BAM files, and tagging the exons. No errors were raised during the workflow. PS: The GEO accession of the dataset I used is GSE134355, which was generated for the Human Cell Landscape.

Affected tool(s)

Tool name(s), special parameters?

java -Xmx40g -jar ${dropseq_path}/dropseq.jar DigitalExpression \
  I=${path}/${sample[@]}_aft_align/merge_exon.bam \
  O=${path}/DGE/${sample[@]}_digitalExpr_gene100.dge.txt.gz \
  SUMMARY=${path}/DGE/${sample[@]}_digitalExpr_gene100.dge.summary.txt \
  GENE_NAME_TAG=QQ \
  CELL_BARCODE_TAG=XC \
  MOLECULAR_BARCODE_TAG=XM \
  MIN_NUM_GENES_PER_CELL=100

Affected version(s)

Description

Describe the problem below. Provide screenshots , stacktrace , logs where appropriate.

--------------------------------------------------

16:47:09.933 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/lustre/home/whzhou/biosoft/Drop-seq_tools-2.3.0/jar/lib/picard-2.18.14.jar!/com/intel/gkl/native/libgkl_compression.so
[Mon Jun 08 16:47:09 CST 2020] DigitalExpression SUMMARY=/lustre/home/whzhou/project/6_2_HCL/DGE/SRR9843425_digitalExpr.dge.summary.txt OUTPUT=/lustre/home/whzhou/project/6_2_HCL/DGE/gene100_digitalExpr.dge.txt.gz INPUT=/lustre/home/whzhou/project/6_2_HCL/SRR9843425_aft_align/merge_exon.bam CELL_BARCODE_TAG=XC MOLECULAR_BARCODE_TAG=XM MIN_NUM_GENES_PER_CELL=100 GENE_NAME_TAG=QQ OUTPUT_READS_INSTEAD=false EDIT_DISTANCE=1 READ_MQ=10 MIN_BC_READ_THRESHOLD=0 USE_STRAND_INFO=true RARE_UMI_FILTER_THRESHOLD=0.0 GENE_STRAND_TAG=gs GENE_FUNCTION_TAG=gf STRAND_STRATEGY=SENSE LOCUS_FUNCTION_LIST=[CODING, UTR] VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Mon Jun 08 16:47:09 CST 2020] Executing as whzhou@cu294 on Linux 3.10.0-862.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_161-b14; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.3.0(34e6572_1555443285)
INFO 2020-06-08 16:47:09 BarcodeListRetrieval Looking for cell barcodes that have at least 100 genes
INFO 2020-06-08 16:47:09 CustomBAMIterators Reading in records for TAG name sorting
INFO 2020-06-08 16:47:16 CustomBAMIterators Processed 1,000,000 records. Elapsed time: 00:00:06s. Time for last 1,000,000: 6s. Last read position: chr11:46,428,725
INFO 2020-06-08 16:47:22 CustomBAMIterators Processed 2,000,000 records. Elapsed time: 00:00:12s. Time for last 1,000,000: 6s. Last read position: chr12:14,881,912
INFO 2020-06-08 16:47:27 CustomBAMIterators Processed 3,000,000 records. Elapsed time: 00:00:17s. Time for last 1,000,000: 5s. Last read position: chr13:99,294,270
INFO 2020-06-08 16:47:33 CustomBAMIterators Processed 4,000,000 records. Elapsed time: 00:00:23s. Time for last 1,000,000: 5s. Last read position: chr15:44,717,951
INFO 2020-06-08 16:47:38 CustomBAMIterators Processed 5,000,000 records. Elapsed time: 00:00:28s. Time for last 1,000,000: 5s. Last read position: chr16:74,873,833
INFO 2020-06-08 16:47:45 CustomBAMIterators Processed 6,000,000 records. Elapsed time: 00:00:35s. Time for last 1,000,000: 6s. Last read position: chr18:3,256,206
INFO 2020-06-08 16:47:50 CustomBAMIterators Processed 7,000,000 records. Elapsed time: 00:00:40s. Time for last 1,000,000: 5s. Last read position: chr1:632,462
INFO 2020-06-08 16:47:55 CustomBAMIterators Processed 8,000,000 records. Elapsed time: 00:00:45s. Time for last 1,000,000: 5s. Last read position: chr1:145,746,468
INFO 2020-06-08 16:48:00 CustomBAMIterators Processed 9,000,000 records. Elapsed time: 00:00:51s. Time for last 1,000,000: 5s. Last read position: chr20:43,473,774
INFO 2020-06-08 16:48:05 CustomBAMIterators Processed 10,000,000 records. Elapsed time: 00:00:55s. Time for last 1,000,000: 4s. Last read position: chr2:21,002,026
INFO 2020-06-08 16:48:11 CustomBAMIterators Processed 11,000,000 records. Elapsed time: 00:01:01s. Time for last 1,000,000: 5s. Last read position: chr2:101,006,350
INFO 2020-06-08 16:48:16 CustomBAMIterators Processed 12,000,000 records. Elapsed time: 00:01:06s. Time for last 1,000,000: 5s. Last read position: chr3:101,685,831
INFO 2020-06-08 16:48:21 CustomBAMIterators Processed 13,000,000 records. Elapsed time: 00:01:11s. Time for last 1,000,000: 5s. Last read position: chr4:90,838,485
INFO 2020-06-08 16:48:26 CustomBAMIterators Processed 14,000,000 records. Elapsed time: 00:01:16s. Time for last 1,000,000: 4s. Last read position: chr5:79,284,070
INFO 2020-06-08 16:48:32 CustomBAMIterators Processed 15,000,000 records. Elapsed time: 00:01:22s. Time for last 1,000,000: 6s. Last read position: chr6:31,353,940
INFO 2020-06-08 16:48:37 CustomBAMIterators Processed 16,000,000 records. Elapsed time: 00:01:27s. Time for last 1,000,000: 5s. Last read position: chr7:44,799,829
INFO 2020-06-08 16:48:42 CustomBAMIterators Processed 17,000,000 records. Elapsed time: 00:01:32s. Time for last 1,000,000: 4s. Last read position: chr8:100,151,401
INFO 2020-06-08 16:48:47 CustomBAMIterators Processed 18,000,000 records. Elapsed time: 00:01:37s. Time for last 1,000,000: 5s. Last read position: chrM:2,030
INFO 2020-06-08 16:48:52 CustomBAMIterators Processed 19,000,000 records. Elapsed time: 00:01:42s. Time for last 1,000,000: 4s. Last read position: chrM:3,164
INFO 2020-06-08 16:48:58 CustomBAMIterators Processed 20,000,000 records. Elapsed time: 00:01:48s. Time for last 1,000,000: 6s. Last read position: chrM:7,041
INFO 2020-06-08 16:49:03 CustomBAMIterators Processed 21,000,000 records. Elapsed time: 00:01:53s. Time for last 1,000,000: 4s. Last read position: chrM:8,587
INFO 2020-06-08 16:49:07 CustomBAMIterators Processed 22,000,000 records. Elapsed time: 00:01:58s. Time for last 1,000,000: 4s. Last read position: chrM:9,135
INFO 2020-06-08 16:49:12 CustomBAMIterators Processed 23,000,000 records. Elapsed time: 00:02:02s. Time for last 1,000,000: 4s. Last read position: chrM:9,922
INFO 2020-06-08 16:49:17 CustomBAMIterators Processed 24,000,000 records. Elapsed time: 00:02:07s. Time for last 1,000,000: 4s. Last read position: chrM:15,813
INFO 2020-06-08 16:49:22 CustomBAMIterators Processed 25,000,000 records. Elapsed time: 00:02:12s. Time for last 1,000,000: 4s. Last read position: /
...
INFO 2020-06-08 17:04:31 BamTagOfTagCounts Processed 189,000,000 records. Elapsed time: 00:04:27s. Time for last 1,000,000: 1s. Last read position: /
INFO 2020-06-08 17:04:32 DigitalExpression Calculating digital expression for [2591] cells.
INFO 2020-06-08 17:10:53 UMIIterator Sorting finished.
[Mon Jun 08 17:10:53 CST 2020] org.broadinstitute.dropseqrna.barnyard.DigitalExpression done. Elapsed time: 23.73 minutes.
Runtime.totalMemory()=11868831744


Steps to reproduce

Tell us how to reproduce this issue. If possible, include command lines that reproduce the problem and provide a minimal test case.

Expected behavior

Tell us what should happen

Everything seems to be right except the output. I have no idea what is causing this. Thanks in advance for your help.

Actual behavior

Tell us what happens instead

felixhell2004 commented 4 years ago

In case the picture does not display, the dge.summary output looks like this:

METRICS CLASS org.broadinstitute.dropseqrna.barnyard.DigitalExpression$DESummary

CELL_BARCODE        NUM_GENIC_READS  NUM_TRANSCRIPTS  NUM_GENES
TGTGCGGGGCGAGTCGGT  0                0                0
ACACCCTAGTCGTTCCGC  0                0                0
GAATTAGAATTACCGCTA  0                0                0
GTCCCGATACAGTATGTA  0                0                0
ACCTGACACAAGCAACAA  0                0                0
TGAAGCACAATATATGTA  0                0                0
GTCCCGGCCTAGATTTGC  0                0                0
CAAAGTCAAAGTATCAAC  0                0                0
TATGTAATGCTTGTCCCG  0                0                0
CATCCCTCACTTAGTCGT  0                0                0
CGCTTGCTTCTGGCGTCC  0                0                0
CTGTGTGCGTCCAGCGAG  0                0                0
ATGGCGACGTTGGTCGGT  0                0                0

alecw commented 4 years ago

Hi @felixhell2004 ,

Can you send 2 things?

I'm wondering if there is a problem with the gene tagging of this file.

-Alec

felixhell2004 commented 4 years ago

Hi Alec (@alecw), thanks for your reply. Here are the command lines I used to tag the BAM file.

---------------------------Tag the barcode-----------------------

java -jar -Xmx30g ${dropseq_path}/dropseq.jar TagBamWithReadSequenceExtended \
  INPUT=${path}/${sample[@]}_raw.bam \
  OUTPUT=${path}/${sample[@]}_unaligned_tagged_barcode.bam \
  SUMMARY=unaligned_tagged_barcodes.bam_summary.txt \
  BASE_RANGE=1-6:22-27:43-48 \
  BASE_QUALITY=10 \
  BARCODED_READ=1 \
  DISCARD_READ=False \
  TAG_NAME=XC \
  NUM_BASES_BELOW_QUALITY=1

----------------------------Tag the UMI---------------------------

java -jar -Xmx50g ${dropseq_path}/dropseq.jar TagBamWithReadSequenceExtended \
  INPUT=${path}/${sample[@]}_unaligned_tagged_barcode.bam \
  OUTPUT=${path}/${sample[@]}_unaligned_tagged_UMI.bam \
  SUMMARY=unaligned_tagged_UMI.bam_summary.txt \
  BASE_RANGE=49-54 \
  BASE_QUALITY=10 \
  BARCODED_READ=1 \
  DISCARD_READ=False \
  TAG_NAME=XM \
  NUM_BASES_BELOW_QUALITY=1

-----------------------------Tag the exon---------------------------

java -jar -Xmx18g ${dropseq_path}/dropseq.jar TagReadWithGeneExonFunction \
  I=${path}/SRR9843425_aft_align/merged.bam \
  O=${path}/SRR9843425_aft_align/merge_exon.bam \
  ANNOTATIONS_FILE=${refFlat} \
  TAG=QQ

---------------------Output of samtools view clean.bam | head---------------

SRR9843425.sra.29850958 163 chr10 37865 255 150M = 295551 257756 TTAATAATGGGAGACTTTAACACCCCACTGTCAACATTAGACAGATCAACGAGACAGAAAGTTAACAAGGATACCCAGGAATTGAACTCAGCTCTGCACTAAGCGGACCTAATAGACATCTACAGAACTCTCCACCCCAAATCAACAGAA AAAFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFJJJJJJJJJJJJJJJFFJJFJFJFJJJJJJAFJJFFJJJJFFJJFJJFFJJJJJJJJJJJJAFJJJJAJA<FFJJ7JFF-AAFAJ7AAF<FJJJJAF7FFJJ)7AFFJAAJF7J MC:Z:27S70M53SXC:Z:GGTACAACACCCTCACTT MD:Z:44c5a11c36c4a45 XF:Z:INTERGENIC PG:Z:STAR RG:Z:A NH:i:1 NM:i:5 XM:Z:GCTCCC MQ:i:255 UQ:i:205 AS:i:198

SRR9843425.sra.77654484 163 chr10 38071 255 150M = 295560 257552 GAAGTAAAGCTCTCCTCAGCAAATGTAAAAGAACAGAAATTATAACAAACTATCTCTCAGACCACAGTGCAATCAAACTAGAACTCAGGATTAAGAATCTCACTCAAAGCCGCTCAACTACATGGAAACTGAACAACCTGCTCCTGAATG AAFAFJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJFJJJJF<FJFJFJJFJFJJAFFJFFJFAJJJJJJJJJJJJAFJJJJJJJ-AJFFFFFJJFFJJJJAAJJFFJJJF-<))-AAFAF< MC:Z:2S63M85SXC:Z:GAATTACGCACCAGTCGT MD:Z:94c13a2a38 XF:Z:INTERGENIC PG:Z:STAR RG:Z:A NH:i:1 NM:i:3 XM:Z:CGAACG MQ:i:255 UQ:i:123 AS:i:201

SRR9843425.sra.11765170 163 chr10 38077 255 150M = 295561 257545 AAGCTCTCCTCAGCAAATGTAAAAGAACAGAAATTATAACAAACTATCTCTCAGACCACAGTGCAATCAAACTAGAACTCAGGATTAAGAATCTCACTCAAAGCCGCTCAACTACATGGAAACTGAACAACCTGCTCCTGAATGACTACT AAFFFFJJFJJJJJJJFJFJJJJJJFFJJJFJJJJJJJJJJJJJJJJF7FFJJJJJJJJJJJJJJJJFJFJJJJJFJJJFJJJJFJJJJJJFFJFJ<AJJJJJJJFJAFJJFFJFJ7FF<JJAAAAFJJJJJ<FFJFFJAJJF<<FFJF< MC:Z:37S61M52SXC:Z:GCCCTCTCGTAACGTGGC MD:Z:88c13a2a44 XF:Z:INTERGENIC PG:Z:STAR RG:Z:A NH:i:1 NM:i:3 XM:Z:TTTATC MQ:i:255 UQ:i:119 AS:i:201

SRR9843425.sra.28853201 163 chr10 38290 255 150M = 295560 257335 CAACATACCAGAATCTCTGGGACGTATTCAAAGCAGTGTGTAGAGGGAAATTTATAGCACTAAATGCCCACAAGAGAAAGCAGGAAAGGTCCAAAATTGACACCCTAGCATCACAATTAAAAGAACTAGAAAAGCAAGAGCAAACACATT AAAFAFFAA-77-7-A<AJ<F-7FF-F-<F7FJJFJA7J--7JF-A-FJ-J<77FJ-F7<AJ<A7JJ-7-FA<AJAJFJ-FJJJ<F-7-AFFJA-AAAF777<-JF<FJF7FF-AAFJJF-F<-<A-FFA---F<F---7-7FAA-<J<< MC:Z:34S65M51SXC:Z:GCAGGAGCCTAGCGGCAG MD:Z:24c29c33a18a42 XF:Z:INTERGENIC PG:Z:STAR RG:Z:A NH:i:1 NM:i:4 XM:Z:AAACCC MQ:i:255 UQ:i:123 AS:i:199

SRR9843425.sra.86314671 163 chr10 38292 255 2S148M = 295563 257330 CCACATACCAGAATCTCTGGGACGCATTCAAAGCAGTGTGTAGAGGGAAATTTATAGCACTAAATGCCTACAAGAGAAAGCAGGAAAGATCCAAAATTGACACCCTAACATCACAATTAAAAGAACTAGAAAAGCAAGAGCAAACACATT AAFFFJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJFJJJJJJJJJJFJJJJJJJJJAFJJJFFJJFJJJFFJJJJJJJJJJJJJJJ7JFJ<AAJJJFJFJJJFFFJJFJFA<FF-JJJJJFJJJJFJFF7-FF-<-<<JJ<FFFA MC:Z:39S59M52SXC:Z:GCCCTCTCGTAACGTGGC MD:Z:52c13c81 XF:Z:INTERGENIC PG:Z:STAR RG:Z:A NH:i:1 NM:i:2 XM:Z:TTTATC MQ:i:255 UQ:i:82 AS:i:199

-----------------View of the QQ tag (the name I gave to the exon tag)---------------------

QQ

Regards, Felix

jamesnemesh commented 4 years ago

It looks like from here:

You’re using a recent version of the software (the GENE_STRAND_TAG=gs is a give-away).

TagReadWithGeneExonFunction is for legacy use only. See the section of the cookbook: “Updates to TagReadWithGeneExon (V2)”. You want to use TagReadWithGeneFunction instead. Tag your BAM with that, and then try digital expression again.

I’d also suggest using the default tag names, it’ll make your life a lot easier.

-Jim
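One way to spot non-default tag names is to read them back from the argument line that DigitalExpression echoes at the top of its log. The following is a minimal Python sketch, not part of Drop-seq. The helper name `non_default_tags` is hypothetical. The defaults `gs` and `gf` are the values echoed in the first log in this thread for arguments the command line did not set; `XC`, `XM`, and `gn` are assumptions about the Drop-seq conventions and should be checked against the tool's help output.

```python
import re

# Assumed defaults. gs and gf are echoed in the first log in this thread for
# arguments the command line did not set; XC, XM and gn are assumptions
# about the Drop-seq conventions.
ASSUMED_DEFAULT_TAGS = {
    "CELL_BARCODE_TAG": "XC",
    "MOLECULAR_BARCODE_TAG": "XM",
    "GENE_NAME_TAG": "gn",
    "GENE_STRAND_TAG": "gs",
    "GENE_FUNCTION_TAG": "gf",
}


def non_default_tags(log_header):
    """Return {argument: value} for tag arguments that differ from the assumed defaults."""
    # The log echoes every argument as KEY=VALUE, separated by spaces.
    echoed = dict(re.findall(r"\b([A-Z][A-Z0-9_]*)=(\S+)", log_header))
    return {
        name: echoed[name]
        for name, default in ASSUMED_DEFAULT_TAGS.items()
        # SAM tag names are case-sensitive, so "GS" and "gs" are different tags.
        if name in echoed and echoed[name] != default
    }


# Shortened excerpt of the argument line from the first log in this thread.
header = (
    "CELL_BARCODE_TAG=XC MOLECULAR_BARCODE_TAG=XM MIN_NUM_GENES_PER_CELL=100 "
    "GENE_NAME_TAG=QQ EDIT_DISTANCE=1 GENE_STRAND_TAG=gs GENE_FUNCTION_TAG=gf"
)
print(non_default_tags(header))  # {'GENE_NAME_TAG': 'QQ'}
```

A non-empty result is not an error in itself. It only lists the tag names that must be present in the BAM under exactly those names for DigitalExpression to find them.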


felixhell2004 commented 4 years ago

Dear Jim, it was exactly the legacy-tool problem. Everything worked once I switched to TagReadWithGeneFunction, and the summary output looks normal now.

--------------------------------------------------------

htsjdk.samtools.metrics.StringHeader

Started on: Tue Jun 09 13:40:25 CST 2020

METRICS CLASS org.broadinstitute.dropseqrna.barnyard.DigitalExpression$DESummary

CELL_BARCODE        NUM_GENIC_READS  NUM_TRANSCRIPTS  NUM_GENES
TGATCAGCAGGATAGAGA  5418             1950             971
CTGTGTATCAACTTCCGC  5217             1644             748
AACCTAGCCCTCAGGACT  4681             1631             747
GAACGCTTAACTGAATTA  4422             1531             739
ATCAACATGCTTATCAAC  4600             1470             879
CCATCTCTGAAAGCTGTG  4011             1400             728
ATGGCGTCACTTTCTACC  3668             1363             682
ATTCCAATGCTTATGGCG  3664             1224             713
CGCTTGGTCCCGCCGCTA  3098             1199             611
AGCGAGCTGTGTAGGACT  3157             1172             551

However, I can't download the cookbook v2.0 from GitHub. I wonder whether it is a problem with my ID or something else.

Finally, thank you Jim, and thank you Alec. Regards, Felix

ysu2015 commented 4 years ago

Hi, I have an issue with DigitalExpression and the "CELL_BC_FILE" option as well. DigitalExpression gives good results with "NUM_CORE_BARCODES", but when I use the "CELL_BC_FILE" option, the output file contains only the cell barcodes and no counts at all, not even 0. I used the command below to generate the cell barcode file, to make sure it fits the argument format:

write.table(cell.barcode, "cell.barcode.txt", row.names=F, col.names=F)

Any hints about what could be going wrong? Thank you so much in advance.

Y

jamesnemesh commented 4 years ago

It’s really hard to guess without seeing any of your data, but I’m gonna guess your barcodes have quotes around them.

IE: “ABC” instead of ABC.

The barcodes need to exactly match your data.

Try this argument in R:

quote
a logical value (TRUE or FALSE) or a numeric vector. If TRUE, any character or factor columns will be surrounded by double quotes. If a numeric vector, its elements are taken as the indices of columns to quote. In both cases, row and column names are quoted if they are written. If FALSE, nothing is quoted.

-Jim
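For a barcode file that has already been written with quotes, a small cleanup script is one alternative to regenerating it in R with quote=FALSE. The following is a minimal Python sketch. The function names and the example barcodes are hypothetical, and it assumes a plain-text file with one barcode per line.

```python
def clean_barcodes(lines):
    """Strip whitespace and surrounding quote characters; skip blank lines."""
    cleaned = []
    for line in lines:
        barcode = line.strip().strip("\"'")
        if barcode:
            cleaned.append(barcode)
    return cleaned


def rewrite_barcode_file(src_path, dst_path):
    """Read a plain-text barcode file and write an unquoted copy (hypothetical helper)."""
    with open(src_path) as src:
        cleaned = clean_barcodes(src)
    with open(dst_path, "w") as dst:
        dst.write("\n".join(cleaned) + "\n")
    return len(cleaned)


# Made-up barcodes, written the way write.table() emits them when quoting is on.
quoted = ['"ACGTACGTACGT"\n', '"TTGCATGCAAGT"\n', "\n"]
print(clean_barcodes(quoted))  # ['ACGTACGTACGT', 'TTGCATGCAAGT']
```

The cleaned barcodes still have to match the cell barcode tag values in the BAM character for character, so it is worth comparing a few of them against the BAM by eye.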


ysu2015 commented 4 years ago

Great catch, jamesnemesh!!! I do have "" around my barcodes. I will try quote=F. Thank you so much.

Y

alecw commented 4 years ago

Hi @jurtlest ,

Please create a new issue in order to ask this question.

Regards, Alec

Prakrithi-P commented 10 months ago

Hi. I am getting an empty output as well.

DigitalExpression \
  -I TAR_tagged_withDir_ed.bam \
  -O TAR_expression_matrix_withDir.txt.gz \
  -TMP_DIR /scratch/user/s4716765/uTARs/tmp \
  -CELL_BARCODE_TAG CB \
  -MOLECULAR_BARCODE_TAG XM \
  -GENE_NAME_TAG GN \
  -GENE_STRAND_TAG GS \
  -CELL_BC_FILE ../outs/raw_feature_bc_matrix/barcodes.tsv.gz


21:10:05.877 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/s4716765/.conda/envs/scTAR_cellranger/share/dropseq_tools-2.4.0-1/jar/lib/picard-2.20.5.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Jan 10 21:10:05 AEST 2024] DigitalExpression OUTPUT=TAR_expression_matrix_withDir.txt.gz INPUT=TAR_tagged_withDir_ed.bam CELL_BARCODE_TAG=CB MOLECULAR_BARCODE_TAG=UB CELL_BC_FILE=../outs/raw_feature_bc_matrix/barcodes.tsv.gz GENE_NAME_TAG=GN GENE_STRAND_TAG=GS TMP_DIR=[/scratch/user/s4716765/uTARs/tmp] OUTPUT_READS_INSTEAD=false OMIT_MISSING_CELLS=false EDIT_DISTANCE=1 READ_MQ=10 MIN_BC_READ_THRESHOLD=0 USE_STRAND_INFO=true RARE_UMI_FILTER_THRESHOLD=0.0 GENE_FUNCTION_TAG=gf STRAND_STRATEGY=SENSE LOCUS_FUNCTION_LIST=[CODING, UTR] VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Jan 10 21:10:05 AEST 2024] Executing as s4716765@bunya3.rcc.uq.edu.au on Linux 4.18.0-477.27.1.el8_8.x86_64 amd64; OpenJDK 64-Bit Server VM 10.0.2+13; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.4.0(3d2b3d8_1600201514)
INFO 2024-01-10 21:10:05 BarcodeListRetrieval Found 4992 cell barcodes in file
INFO 2024-01-10 21:10:05 DigitalExpression Calculating digital expression for [4992] cells.
INFO 2024-01-10 21:10:24 UMIIterator Sorting finished.
[Wed Jan 10 21:10:24 AEST 2024] org.broadinstitute.dropseqrna.barnyard.DigitalExpression done. Elapsed time: 0.31 minutes.
Runtime.totalMemory()=2677538816

The BAM file was generated with SMRTlink tools, and the UMI information is in the XM tag, which I specified as MOLECULAR_BARCODE_TAG, but that seems to be a problem. Even when I renamed the tag to UB, it doesn't work. Can someone please help me with this?

Thanks, Prakrithi

alecw commented 10 months ago

Hi Prakrithi,

Can you create a new issue to report this problem? In that issue, in addition to all the other things requested when creating an issue, please include the following:

Regards, Alec

jamesnemesh commented 10 months ago

I would also re-check that the BAM tag arguments you're using are correct.
-CELL_BARCODE_TAG CB -MOLECULAR_BARCODE_TAG XM -GENE_NAME_TAG GN -GENE_STRAND_TAG GS

The cell barcode tag is the CellRanger standard and the molecular barcode tag is the DropSeq standard, but the gene name and gene strand tags don't match the TagReadWithGeneFunction standards. So either you decided to override the default arguments for some reason, or you have the wrong arguments there. I would look at your input BAM to see which tags you actually use and, as @alecw said, post that here. You could add some filtering to your samtools view command to look specifically for reads with GS:, GN:, CB:, and XM:. If you can't find any reads that contain all of those tags, it's no surprise that the program couldn't either.
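The tag check described above can be sketched in Python as a rough illustration. It assumes tab-separated SAM text, for example the output of samtools view, supplied as an iterable of lines. The function names and the two example records are hypothetical and are not taken from any BAM in this thread.

```python
from collections import Counter


def tag_names(sam_line):
    """Return the set of tag names found in the optional fields of one SAM line."""
    fields = sam_line.rstrip("\n").split("\t")
    # Columns 1-11 are mandatory; optional TAG:TYPE:VALUE fields start at column 12.
    return {field.split(":", 1)[0] for field in fields[11:]}


def summarize_tags(sam_lines, required_tags):
    """Count alignments that carry every required tag and tally which tags are absent."""
    total = 0
    complete = 0
    missing = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        total += 1
        present = tag_names(line)
        absent = [tag for tag in required_tags if tag not in present]
        missing.update(absent)
        if not absent:
            complete += 1
    return {"total": total, "complete": complete, "missing": dict(missing)}


# Two made-up alignments: the first carries all four tags, the second lacks GN and GS.
records = [
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:AAAC-1\tXM:Z:ACGTAC\tGN:Z:GENE1\tGS:Z:+",
    "r2\t0\tchr1\t200\t255\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:AAAC-1\tXM:Z:ACGTAC",
]
print(summarize_tags(records, ["CB", "XM", "GN", "GS"]))
# {'total': 2, 'complete': 1, 'missing': {'GN': 1, 'GS': 1}}
```

Tag names are compared case-sensitively here, so a BAM tagged with gn and gs would report GN and GS as missing. A "complete" count of zero over a large sample of reads would match the empty output reported above.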