Hoohm / CITE-seq-Count

A tool to get UMI counts from a single-cell protein assay
https://hoohm.github.io/CITE-seq-Count/
MIT License

Job never finishes #39

Closed RM-SCB closed 5 years ago

RM-SCB commented 5 years ago

Hello,

I have been trying to run CITE-seq-Count on my data but haven't been successful so far. I initially tried on the cluster, but the job never finished, so I had to abort and tried locally instead. In the error file there was this message:

No local minima was accepted. Recommend checking the plot output and counts per local minima (requires `--plot-prefix` option) and then re-running with manually selected threshold (`--set-cell-number` option)

And here is what happens locally on my laptop:

Unable to revert mtime: /Library/Fonts
Unable to revert mtime: /Library/Fonts/Microsoft
Counting number of reads
Started mapping
CITE-seq-Count is running with 4 cores.
Processed 1,000,000 reads in 23.67 seconds. Total reads: 1,000,000 in child 29798
Processed 1,000,000 reads in 39.92 seconds. Total reads: 1,000,000 in child 29799
Mapping done for process 29798. Processed 1,969,487 reads
Processed 1,000,000 reads in 1.0 minute, 2.679 seconds. Total reads: 1,000,000 in child 29801
Mapping done for process 29799. Processed 1,969,487 reads
Mapping done for process 29801. Processed 1,969,487 reads
Processed 1,000,000 reads in 1.0 minute, 30.52 seconds. Total reads: 1,000,000 in child 29800
Mapping done for process 29800. Processed 1,969,487 reads
Mapping done
Merging results
Correcting cell barcodes
ERROR:root:No local minima was accepted. Recommend checking the plot output and counts per local minima (requires `--plot-prefix` option) and then re-running with manually selected threshold (`--set-cell-number` option)
Could not find a good local minima for correction.
No cell barcode correction was done.
Correcting umis

I installed CITE-seq-Count version 1.4.1, with Python 3.7.0. Any help would be much appreciated.

Thank you

RM-SCB commented 5 years ago

If that helps, here is a link to R1 and R2 files of the HTO library https://www.dropbox.com/s/bw2dwuoajba6yzn/R1_R2.zip?dl=0 and a link to the tags and barcode whitelist https://www.dropbox.com/s/8ff392heznmna58/tags%20and%20barcodes.zip?dl=0

Hoohm commented 5 years ago

Hello @r-mvl, I'm seeing the same issue. At first I thought it was caused by too many UMIs to correct, but that's probably not it, since this dataset is pretty small.

I have to run more tests to find out what it is. I'm still on holiday right now, but I'll be back at the office on Monday and should have a fix next week.

In the meantime, if you need it to run ASAP, you can turn off the UMI correction on the develop branch.

RM-SCB commented 5 years ago

Many thanks @Hoohm . No worries, I will try using the develop branch for the weekend then!

Just out of curiosity, have you run cellranger 3 using their feature barcoding quantification and compared the results with CITE-seq-Count? Are the results similar?

Hoohm commented 5 years ago

I get the same issue with the develop branch; sadly, it didn't fix it :(

I haven't tried it because I got this issue with the 10x data I'm testing right now. I will of course compare with the cellranger results as soon as this is fixed.

Hoohm commented 5 years ago

@r-mvl Thanks a lot for your dataset, it was very helpful: a small number of reads, yet it still reproduces the issue. I have some news. I'm using umi_tools for UMI correction, and it seems that having a really high number of UMIs for a TAG is the issue. As an illustration, here are the sums of UMIs for each cell in your dataset (without correcting those with more than 20,000): [plot: UMI sums per cell]. 32 cells have more than 20,000 UMIs (~1%). Any idea why that would be?

The quick and easy "fix" would be to just flag the aberrant values and not correct them. Maybe I'll remove them from the normal output and create a separate output for them if requested.
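For context, directional UMI correction in the style of umi_tools compares UMIs pairwise, so a cell with tens of thousands of UMIs becomes very expensive to correct. A minimal Python sketch of the directional rule (not CITE-seq-Count's actual code; `directional_correct` is a hypothetical name):

```python
from itertools import combinations

def hamming(a, b):
    """Number of mismatched positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))

def directional_correct(umi_counts):
    """Collapse each UMI into a more abundant neighbour at Hamming
    distance 1 when count(parent) >= 2 * count(child) - 1 (the
    'directional' rule described by umi_tools). Returns a child -> parent
    map. All UMI pairs are compared, so runtime grows quadratically with
    the number of UMIs in a cell."""
    parents = {}
    # Visit UMIs from most to least abundant so children map upward.
    ranked = sorted(umi_counts, key=umi_counts.get, reverse=True)
    for hi, lo in combinations(ranked, 2):
        if lo in parents:
            continue
        if hamming(hi, lo) == 1 and umi_counts[hi] >= 2 * umi_counts[lo] - 1:
            parents[lo] = hi
    return parents

corrections = directional_correct({"AACG": 50, "AACT": 3, "GGTT": 10})
# "AACT" collapses into "AACG"; "GGTT" has no close neighbour.
```

With ~20,000 UMIs in one cell, that is on the order of 2×10⁸ pairwise comparisons, which is consistent with the job appearing to hang.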

RM-SCB commented 5 years ago

Thank you for your reply. Interestingly, I ran cellranger 3 using feature barcoding quantification and it flags these as well. See this example of the output:

[screenshot: cellranger 3 output]

We don't know why this happens. I also ran cellranger 3 on a previous cell hashing experiment, which returned a similar problem. 10x gives some more info here: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/algorithms/antibody So apparently this is something that can happen sometimes (frequently...?)

Hoohm commented 5 years ago

You can test out the latest develop branch. It will discard the offending cells and not try to correct them. Let me know if you get better results with HTODemux.

Hoohm commented 5 years ago

Also, I would not advise sequencing deeper; you might hit the same issues down the line. I would focus on getting rid of those "aggregates" of cells.

RM-SCB commented 5 years ago

Thanks for this. I managed to run it and it works. After HTODemux, I get a similar (but slightly better) number of cells assigned than with the cellranger 3 output: I recover a few more (~3900 using CITE-seq-Count instead of ~3500). But there are still >4000 cells that get classified as negative, which is not so surprising since the counts are so low.

At this point we have no chance of re-running the experiment. 3500 is a decent cell number to work with for our purpose, but obviously it's frustrating to discard half of the data. If ~70% of the reads are "discarded", it's as if we had spiked the cDNA library with ~1% of HTO library instead of the 5% originally planned. This is why I thought the only action we could take with these samples is possibly to sequence the HTO library deeper, although there is no guarantee that it will improve the data. Is there another option, or do you feel it is absolutely pointless?
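The back-of-the-envelope behind the "~1% instead of 5%" figure can be checked like this (an approximation assuming ~70% of the HTO reads are effectively unusable; numbers taken from the discussion above):

```python
# Hypothetical numbers from the discussion: a 5% HTO spike-in
# with ~70% of its reads effectively discarded.
planned_spike = 0.05
usable_fraction = 1 - 0.70
effective_spike = planned_spike * usable_fraction
print(f"effective spike-in: {effective_spike:.1%}")  # prints "effective spike-in: 1.5%"
```

So the effective spike-in is roughly 1.5%, in the ballpark of the "~1%" quoted above.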

Hoohm commented 5 years ago

Happy to hear that you get a bit more cells tagged properly.

I'm confused as to how many cells you expect from CITE-seq-Count, since you provide a whitelist of ~4000 cells; 3900 cells seems reasonable. Which cells are the 4000 missing ones?

RM-SCB commented 5 years ago

Sorry for the confusion. I actually have 2 × 4000 cells (two 10x channels) in the sequencing run. The 3900 figure is after pooling the 2 libraries; I recover <2000 in each...



Hoohm commented 5 years ago

Maybe try running it without the whitelist and asking for ~8000 cells. You might recover a few more of the ones you're missing.

Hoohm commented 5 years ago

I've pushed a new update to the develop branch. Can you pull and try again? Cell barcode correction should now run properly and not throw an error. I got a few more UMIs on your data; not sure it will be enough to help you out, though.

RM-SCB commented 5 years ago

Thanks. I've just tried both (with and without the whitelist) using the updated develop branch. It changes the figures a bit, but not by much. The good thing is that it doesn't give an error anymore.

Hoohm commented 5 years ago

I'm getting close to a final version for 1.4.2. Can you try it one more time? This time with a whitelist, because it takes advantage of it.

RM-SCB commented 5 years ago

I've re-run it and it seems fine, no errors.

I've attached a summary of the number of cells I get before and after the last 2 updates (when it still gave an error but ran to the end), using HTODemux with different cut-offs for the positive quantile (after filtering out low-quality cells).

[screenshot: cell counts per HTODemux cut-off, before and after the updates]

Hoohm commented 5 years ago

Just to be sure, because the left and right panels have the same title: is the left before the latest patches, and the right the latest version?

RM-SCB commented 5 years ago

Yes sorry



Hoohm commented 5 years ago

I'm a bit worried about the increase in doublets on channel 1 at 0.99. I've added a sanity check for a given whitelist, which tests the Hamming distance between cell barcodes before running.
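A whitelist sanity check along those lines could look like this sketch (hypothetical function name, not the actual implementation): if two whitelisted barcodes are too close to each other, barcode correction can assign reads to the wrong cell.

```python
from itertools import combinations

def hamming(a, b):
    """Number of mismatched positions between two equal-length barcodes."""
    return sum(x != y for x, y in zip(a, b))

def min_whitelist_distance(whitelist):
    """Smallest pairwise Hamming distance among whitelist barcodes.
    If any pair is within twice the correction distance, corrected
    reads could be assigned ambiguously, so correction should be
    disabled or the allowed distance lowered."""
    return min(hamming(a, b) for a, b in combinations(whitelist, 2))

print(min_whitelist_distance(["AAACCC", "AAACCG", "TTTGGG"]))  # prints 1
```

The all-pairs scan is quadratic, but for a whitelist of ~4000 barcodes that is only ~8 million comparisons, cheap enough to run once before mapping.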

RM-SCB commented 5 years ago

Have you updated the develop branch and want me to re-run it? I will re-run it.

I know I had some aggregates in this pool (channel 1); probably around 1/3 of "cells" were not single cells. There is not much we can do about this, as we're working with "difficult" epithelial tissue, so I'm not surprised that a lot of them are doublets. What surprises me is that a lot of cells have such a low HTO read count that they can't be assigned to a population. So I thought this was related to the low sequencing depth, due to the issue flagged by cellranger that you picked up too (a high proportion of reads coming from a small number of cells).

RM-SCB commented 5 years ago

I've just run it on channel 1; it gives the same results.

zakiF commented 5 years ago

(quoting @Hoohm's earlier comment about flagging cells with aberrant UMI counts and creating a separate output for them)

Hi @Hoohm,

Wondering if it's possible to implement this option in CITE-seq-Count, i.e., to create a separate output for cell barcodes with a very high number of UMIs.

Best Zaki

Hoohm commented 5 years ago

Hello @zakiF, I'm working on this for 1.4.2. For now I'm looking into simply filtering them, although I'd rather flag them as "not corrected" but still keep the uncorrected counts.
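The flag-rather-than-drop behaviour described here could be sketched as follows (hypothetical names and default threshold, not the actual 1.4.2 code):

```python
def split_by_umi_total(cell_umi_totals, max_umis=20_000):
    """Partition cell barcodes by total UMI count: cells at or below the
    cap go through UMI correction; cells above it are kept, uncorrected,
    in a separate group instead of being silently dropped."""
    correctable, uncorrected = {}, {}
    for barcode, total in cell_umi_totals.items():
        target = uncorrected if total > max_umis else correctable
        target[barcode] = total
    return correctable, uncorrected

ok, flagged = split_by_umi_total({"CB1": 500, "CB2": 45_000})
# CB1 goes through correction; CB2 is flagged as uncorrected.
```

Writing the flagged group to its own output keeps the main matrix clean while leaving the raw counts available for inspection.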

zakiF commented 5 years ago

Thanks for the update. Just to check: with the version I currently have installed (v1.4.2), I assume these cell barcodes (with a very high number of UMIs) will not be present in any of the output files?

Cheers Zaki

Hoohm commented 5 years ago

Exactly. There is a line in the report, "Bad cells", which reports how many have been deleted.

Hoohm commented 5 years ago

The line is now called "uncorrected cells". Fixed in 1.4.2. Closing this.

Jimmyyun commented 4 years ago

(quoting @RM-SCB's earlier comment about possibly sequencing the HTO library deeper)

@Mevelo Hi, I was wondering if you figured out whether increasing sequencing depth helped gain a higher HTO read count. I am having a similar issue with lots of "negative" cells, almost 50%.

RM-SCB commented 4 years ago


Hi @Jimmyyun, we have not observed better performance by increasing sequencing depth. May I ask what cell type you are working with? We found that TotalSeq antibody labelling (we tried many epitopes) massively underperforms on primary tissues, especially when moving away from PBMCs (e.g. epithelium...). We have moved to a different solution (MULTI-seq), which gives more consistent results...

Jimmyyun commented 4 years ago

Hi @Mevelo, thanks for letting me know. I also hope we might see better performance by increasing sequencing depth. I am working on human lymphocyte cell lines, but even these gave me many single cells with HTO UMI counts too low to be demultiplexed, which end up classified as "negative".