RM-SCB closed this issue 5 years ago
If that helps, here is a link to R1 and R2 files of the HTO library https://www.dropbox.com/s/bw2dwuoajba6yzn/R1_R2.zip?dl=0 and a link to the tags and barcode whitelist https://www.dropbox.com/s/8ff392heznmna58/tags%20and%20barcodes.zip?dl=0
Hello @r-mvl I'm having the same issue as yours. At first, I thought it was because of too many UMIs to correct but that is probably not it since this dataset is pretty small.
I have to run more tests to find out what it is. I'm still on holiday right now but I'll be back at the office on Monday, should find a fix for next week.
In the meantime, if you need it to run ASAP, you can turn off the umi correction on the develop branch.
Many thanks @Hoohm . No worries, I will try using the develop branch for the weekend then!
Just out of curiosity, have you run cellranger3 using its feature barcoding quantification and compared the results with CITE-seq-Count? Are the results similar?
I get the same issue with the develop branch, sadly, it didn't fix it :(
I haven't tried it because I got this issue with the 10x data I'm testing right now. I will of course compare cellranger results as soon as this is fixed.
@r-mvl Thanks a lot for your dataset, it was very helpful. A small number of reads, yet still the same issue. I have some news: I'm using umi_tools for UMI correction, and it seems that having a really high number of UMIs for a TAG is the issue. As an illustration, here are the sums of UMIs for each cell in your dataset (without correcting those with more than 20'000): 32 cells have more than 20'000 UMIs (~1%). Any idea why that would be?
The quick and easy "fix" would be to just flag the aberrant values and not correct them. Maybe I'll delete them from the normal output and create a separate output for those if requested.
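A minimal sketch of that flagging step (names and the data layout are illustrative, not CITE-seq-Count's actual internals; the 20'000 cutoff mirrors the threshold discussed above):

```python
# Sketch: split cells into "normal" vs "flagged" based on total raw UMIs,
# so the aberrant ones can be skipped by correction and written to a
# separate output. `cell_umis` maps cell barcode -> {tag -> set of UMIs}.

UMI_CUTOFF = 20_000

def split_aberrant_cells(cell_umis, cutoff=UMI_CUTOFF):
    """Return (normal, flagged): cells whose total UMI count exceeds
    `cutoff` are flagged and kept with their raw, uncorrected counts."""
    normal, flagged = {}, {}
    for cell, tag_umis in cell_umis.items():
        total = sum(len(umis) for umis in tag_umis.values())
        (flagged if total > cutoff else normal)[cell] = tag_umis
    return normal, flagged
```

Only the cells in `normal` would then be fed to the (expensive) umi_tools correction step.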
Thank you for your reply. Interestingly, I ran cellranger 3 using feature barcoding quantification and it flags these as well. See the example of the output
We don't know why this is. I also ran cellranger3 on a previous cell hashing experiment, which returned a similar problem. 10x gives some more info here: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/algorithms/antibody So apparently this is something that might happen sometimes (frequently...?)
If I understand correctly, 70% of the reads are used by these (~32) cells in the HTO library?
If so, do we need to sequence this library deeper to get better coverage? I would have thought that even with the remaining 30% I would be able to assign cells to their initial populations using HTODemux, but it fails for 2/3 of the cells due to the very low counts.
As for CITE-seq-Count, might it be worth adding a filtering step to deal with this prior to starting umi_tools?
You can test out the latest develop branch. It will discard offending cells and not try to correct them. Let me know if you get better results for HTODemux.
Also, I would not advise sequencing deeper; you might have the same issues down the line. I would focus on getting rid of those "aggregates" of cells.
Thanks for this. I managed to run it and it works. After HTODemux I get a similar (but slightly better) number of cells assigned than using the cellranger3 output. I do recover a few more (~3900 using CITE-seq-Count instead of ~3500). But there are still >4000 cells that get classified as negative, not so surprising since the counts are so low. At this point we have no chance of re-running the experiment. 3500 is a decent cell number to work with for our purpose, but obviously, it's frustrating to discard half of the data. If currently 70% of the reads are "discarded", it's as if we had spiked the cDNA library with ~1% of HTO library... instead of the 5% originally planned. This is why I thought the only action we could take with these samples is possibly to sequence the HTO lib deeper. Although there is no guarantee that it will improve the data... but is there another option? Or do you feel it is absolutely pointless?
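The spike-in arithmetic above can be made explicit (a back-of-the-envelope sketch, not part of either tool; the fractions are the ones quoted in this thread):

```python
# If ~70% of HTO reads are absorbed by a handful of aberrant cells,
# the usable reads correspond to a smaller effective spike-in fraction.
planned_hto_fraction = 0.05   # 5% of the pooled library was HTO
usable_read_fraction = 0.30   # ~30% of HTO reads remain after discarding

effective_fraction = planned_hto_fraction * usable_read_fraction
print(f"effective HTO spike-in: {effective_fraction:.1%}")  # 1.5%, close to the ~1% quoted above
```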
Happy to hear that you get a bit more cells tagged properly.
I'm confused as to how many cells you expect from CITE-seq-Count since you provide a whitelist of ~4000 cells. 3900 cells seem reasonable. Which cells are the 4000 missing?
Sorry for the confusion. I actually have 2 times 4000 cells (two 10x channels) in the sequencing run. When I said 3900, that's after pooling the 2 libraries. I recover <2000 in each...
Maybe try to run it without the whitelist and ask for ~8000 cells. Maybe you get a few more that you're missing.
I've pushed a new update on the develop branch. Can you pull and try again? Cell barcode correction should now run properly and not throw an error. I got a few more UMIs on your data. Not sure it will be enough to help you out though.
Thanks. I’ve just tried these (with and without the whitelist) using the updated develop branch. It changes the figures a bit, but not by much. The good thing is that it doesn’t give an error.
I'm getting close to a final version for 1.4.2. Can you try it one more time? This time with a whitelist, because it takes advantage of it.
I've re-run it and it seems fine, no errors.
I've attached a summary of the number of cells I get before and after the last 2 updates (when it still gave an error but ran until the end), using HTODemux with different cut-offs for the positive quantile (after filtering out low quality cells).
Just to be sure, because the left and right have the same title: on the left is before the latest patches, and on the right is the latest version?
Yes sorry
I'm a bit worried about the increase in doublets on channel 1 at 0.99. I've added a sanity check for a given whitelist, which tests the hamming distance of cell barcodes before running.
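The idea behind that sanity check can be sketched as follows (a pure-Python illustration of the concept, not the actual CITE-seq-Count code):

```python
from itertools import combinations

def hamming(a, b):
    """Number of mismatching positions between two equal-length barcodes."""
    return sum(x != y for x, y in zip(a, b))

def check_whitelist(barcodes, min_dist=2):
    """Return pairs of whitelist barcodes closer than `min_dist`.
    If any exist, 1-mismatch cell barcode correction could merge reads
    from distinct cells, so correction should be skipped or the
    whitelist fixed before running."""
    return [(a, b) for a, b in combinations(barcodes, 2)
            if hamming(a, b) < min_dist]
```

For example, a whitelist containing both `AAAA` and `AAAT` would be reported, since a single sequencing error turns one into the other.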
Have you updated the develop branch, and do you want me to re-run it? I will re-run it.
I know I had some aggregates in this pool (channel 1). Probably around 1/3 of "cells" were not single cells. Not much we can do about this, we're working with "difficult" epithelial tissue... So I'm not "surprised" that a lot of them are "doublets". What surprises me is that a lot of cells have such low HTO read count and can't be assigned to a population. So I thought this is related to the low sequencing depth, due to the issue flagged in cellranger and that you picked up too (high proportion of reads coming from a small number of cells).
Just ran it on channel 1; it gives the same results.
Hi @Hoohm,
Wondering if it's possible to implement this option in CITE-seq-Count, i.e. to create a separate output for cell barcodes with a very high number of UMIs.
Best Zaki
Hello @zakiF I'm working on this for 1.4.2. For now, I'm looking into simply filtering them. Although I'd rather flag them as "not corrected" but still keep the non corrected counts.
Thanks for the update. Just to check, with the current version I have installed (v1.4.2), I am assuming these cell barcodes (with a very high number of UMIs) will not be present in any of the output files?
Cheers Zaki
Exactly. There is a line in the report, "Bad cells", which reports how many have been deleted.
The line is called "uncorrected cells" now. Fixed in 1.4.2. Closing this.
@Mevelo Hi, I was wondering if you figured out whether increasing sequencing depth helped to gain higher HTO read count. I am having a similar issue with lots of "negative". Almost 50%.
Hi @Jimmyyun, we have not observed better performance by increasing sequencing depth. May I ask what cell type you are working with? We found that TotalSeq antibody labelling (we tried many epitopes) massively underperforms with primary tissues, especially when moving away from PBMCs (such as epithelium...). We have moved to a different solution (MULTI-seq), which gives more consistent results...
Hi @Mevelo, thanks for letting me know. I had also hoped we might see better performance by increasing sequencing depth. I am working on human lymphocyte cell lines, but even these cell lines gave me many single cells with HTO UMI counts so low that they cannot be separated and are instead classified as "negative".
Hello,
I have been trying to run CITE-seq-Count on my data but haven't been successful so far. I tried initially on the cluster but then gave up and tried locally. On the cluster, the job never finished, so I had to abort. The error file contained this message:
And here is what happens locally on my laptop:
I installed CITE-seq-Count version 1.4.1, with Python 3.7.0. Any help would be much appreciated.
Thank you