Hoohm / CITE-seq-Count

A tool that allows to get UMI counts from a single cell protein assay
https://hoohm.github.io/CITE-seq-Count/
MIT License

unmapped tags #143

Open leandrofmoreno opened 3 years ago

leandrofmoreno commented 3 years ago

Hi, I am getting started with CITE-seq-Count and its output. The tool looks amazing.

After running the script I got an output that is dominated by unmapped tags. I can recover a few cells mapped to my tags, but the vast majority of cells are classified as unmapped.

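[image: Screen Shot 2021-01-19 at 16 23 08] https://user-images.githubusercontent.com/25199693/105054986-e3d39d00-5a72-11eb-83e6-495b8fc46569.png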

I used the following command:

CITE-seq-Count -R1 HT-d1_S2_L001_R1_001.fastq.gz -R2 HT-d1_S2_L001_R2_001.fastq.gz -t TAG_LIST.csv -wl filtered_feature_bc_matrix/barcodes.tsv -cbf 1 -cbl 16 -umif 17 -umil 26 -cells 6000 -o output_1

My TAG file looks like this:

ACCCACCAGTAAGAC,Proximal
GGTCGAGAGCATTCA,Distal
CTTGCCGCATGTCAT,Colon

My whitelist is the barcode file generated by the facility, and my run report indicates a good proportion of mapped reads:

Running time: 40.0 minutes, 39.09 seconds
CITE-seq-Count Version: 1.4.4
Reads processed: 32587727
Percentage mapped: 96
Percentage unmapped: 4
Uncorrected cells: 223

Is there anything I am doing wrong, or a step that I am missing?

Thank you. I really appreciate any help.

Leandro

Hoohm commented 3 years ago

Hello Leandro,

I would not use the unmapped values to do hashing attribution. The other problem might be linked to the chemistry you are using. Is it 10xv3? If it is, you should try out the new release that is coming soon that will deal with the multi cell barcode per cell issue.

--

Roelli Patrick Division of Animal Physiology and Immunology TUM School of Life Sciences Weihenstephan Technische Universität München Weihenstephaner Berg 3 85354 Freising Germany

https://github.com/Hoohm

leandrofmoreno commented 3 years ago

Thank you for replying Patrick.

What do you mean by hashing attribution? Could it be that there really is a low proportion of proximal, distal and colon cells compared to unmapped?

I noticed that my R1 reads are long (~151 bp) and that the variable part is the first 15 bp. I tried -trim 16, but HTODemux then fails with "Cells with zero counts exist as a cluster". I also spoke with the facility and they suggested trimming at 28 bp. That does not solve the issue with the clusters.

thank you one more time,
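Editor's sketch, for readers following the read-layout discussion: the snippet below shows how the `-cbf 1 -cbl 16 -umif 17 -umil 26` values from the command above would carve up an R1 read, assuming they are 1-based, inclusive positions. The function `split_r1` and the example read are hypothetical and only illustrate the arithmetic; this is not CITE-seq-Count's own code. With a 12-base UMI (as in 10x v3 chemistry), the UMI would span positions 17 to 28 instead.

```python
def split_r1(read, cbf=1, cbl=16, umif=17, umil=26):
    """Split an R1 sequence into (cell barcode, UMI).

    Positions are assumed to be 1-based and inclusive, so they are
    converted to Python's 0-based, end-exclusive slices.
    """
    if len(read) < max(cbl, umil):
        raise ValueError("read is shorter than the requested barcode/UMI positions")
    barcode = read[cbf - 1:cbl]
    umi = read[umif - 1:umil]
    return barcode, umi


# Made-up 28-base R1 read, for illustration only.
r1 = "AAACCCAAGAAACACT" + "GGGTTTCCCAAA"

barcode, umi = split_r1(r1)                    # settings from the command above
barcode_v3, umi_v3 = split_r1(r1, umil=28)     # 12-base UMI layout

print(barcode, len(barcode))
print(umi, len(umi))
print(umi_v3, len(umi_v3))
```

If the UMI end position is two bases short, every UMI is truncated, which is one reason it is worth confirming the chemistry before interpreting the counts.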

Hoohm commented 3 years ago

I would subset the umi count matrix and delete the "unmapped" feature.

The main reason for this is that HTO_1, HTO_2, etc... are sample identifiers whereas the unmapped feature is a quality metric.

So identifying your cells as an "unmapped" cluster doesn't make sense.

On top of that, the unmapped "counts" are going to make your other counts look way smaller when the data is normalized, thus you might lose the little real signal you have.
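Editor's sketch of the advice above, under stated assumptions: the count matrix is held as plain Python lists with features as rows and cells as columns, the quality-metric row is named "unmapped", and the numbers are invented. The helpers `drop_feature` and `column_fractions` are hypothetical, and simple per-cell fractions stand in for whatever normalization the downstream tool applies.

```python
def drop_feature(features, counts, name="unmapped"):
    """Return (features, counts) without the row called `name`."""
    kept = [(f, row) for f, row in zip(features, counts) if f != name]
    return [f for f, _ in kept], [row for _, row in kept]


def column_fractions(counts):
    """Per-cell fraction of each feature (each non-empty column sums to 1)."""
    totals = [sum(col) for col in zip(*counts)]
    return [[v / t if t else 0.0 for v, t in zip(row, totals)] for row in counts]


# Made-up counts: rows are features, columns are three cells.
features = ["Proximal", "Distal", "Colon", "unmapped"]
counts = [
    [40, 2, 1],
    [3, 50, 2],
    [2, 1, 30],
    [455, 447, 467],
]

with_unmapped = column_fractions(counts)
hto_features, hto_counts = drop_feature(features, counts)
without_unmapped = column_fractions(hto_counts)

# Share of the "Proximal" tag in the first cell, before and after the drop.
print(round(with_unmapped[0][0], 2))
print(round(without_unmapped[0][0], 2))
```

In this toy example the first cell's "Proximal" share rises from 0.08 to 0.89 once the "unmapped" row is gone, which is the dilution effect described above.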

leandrofmoreno commented 3 years ago

Thanks a lot Patrick. It makes sense now.

l.-

lizzie619 commented 3 years ago

Hi. I am having a similar problem where the unmapped values are causing Seurat to call too many cells as doublets. So the fix is just to remove the unmapped row? Additionally, I only have 7% mapped from CITE-seq-Count. Is there a fix for this? The reads appear to be aligned, so it doesn't seem like a reading frame issue. Thank you!!

Hoohm commented 3 years ago

I would need more details to understand why the mapping rate is so low. Can you share some reports, your tags.csv and a few reads?

lizzie619 commented 3 years ago

Thank you so much for your reply. I have attached the slurm input file, tags.csv, run report, unmapped csv file, and the umi output for the HTO Cite-Seq run.

Let me know if you need any more information or if you find anything out!!

Thanks, Lizzie Godschall

lizzie619 commented 3 years ago

Good evening,

I was wondering if you found anything to optimize the unmapped reads.

I really appreciate your help and time!!

Lizzie

Hoohm commented 3 years ago

Hello @lizzie619, I have not received any files since last time. Maybe they didn't get attached properly.

lizzie619 commented 3 years ago

Did the files show up this time?

Hoohm commented 3 years ago

Could you try to send them directly to my email? patrick.roelli@gmail.com