PeteHaitch closed this issue 6 years ago.
Ah, I just saw https://github.com/COMBINE-lab/salmon/issues/247. I'll experiment with this first.
Hi @PeteHaitch, thanks for your interest in Alevin. So far, Alevin has concentrated mainly on droplet-based 3'-tagged single-cell protocols, especially 10x, but we are very much interested in extending it to other protocols such as CEL-seq. However, there are a couple of challenges/differences that should be considered before incorporating it into the Alevin pipeline. Currently, Alevin relies on the fact that droplet-based protocols use PCR amplification of the library, and the UMI deduplication phase of Alevin assumes an exponential model; I am not sure how true this is for CEL-seq. Another issue is that CEL-seq is a Fluidigm-based system, while Alevin currently targets droplet-based microfluidics. In general, we have observed that the 10x cell isolation step is pretty robust in reporting the cellular barcodes (CBs), and although we have a probabilistic model to handle CB uncertainty, such ambiguous cases are very infrequent (although this is not true for Drop-Seq). Having said that, we might have to do some analysis to figure out the right model for barcode correction in a Fluidigm-based system.
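For readers new to barcode correction, here is a much simpler scheme than Alevin's probabilistic model: an observed CB is corrected against a whitelist of known barcodes only when exactly one whitelisted barcode lies at Hamming distance 1. This is a generic technique swapped in purely for illustration, not Alevin's method; the function names and example barcodes are hypothetical.

```python
def hamming1_neighbors(barcode, alphabet="ACGT"):
    """Yield every sequence that differs from `barcode` at exactly one position."""
    for i, original in enumerate(barcode):
        for base in alphabet:
            if base != original:
                yield barcode[:i] + base + barcode[i + 1:]


def correct_barcode(observed, whitelist):
    """Return the whitelisted barcode for `observed`, or None.

    Exact matches are accepted as-is. Otherwise the barcode is corrected only
    if exactly one whitelisted barcode is a single substitution away;
    ambiguous or unmatched barcodes are discarded (None) rather than guessed.
    """
    if observed in whitelist:
        return observed
    candidates = {n for n in hamming1_neighbors(observed) if n in whitelist}
    if len(candidates) == 1:
        return candidates.pop()
    return None
```

For example, with the whitelist {"AAAA", "AAAT"}, the observed barcode "AAAC" is one substitution away from both entries, so it is dropped. A protocol with more ambiguous barcodes would need a richer model than this.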
Also, please do let us know about your experience using the solution proposed in #247. Looking forward to hearing back from you.
Toying around with the solution in #247, I think I've found why it's not yet working for me. The CEL-Seq2 read1 is UMI + CB, whereas 10X is CB + UMI (to my understanding). Is there an easy way to tell alevin that the order is reversed, or will I need to hack around some?
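To make the ordering problem concrete, here is a minimal sketch that splits a read1 sequence into CB and UMI given both lengths and which one comes first. The function name split_read1 is hypothetical and not part of Alevin; the 16 bp CB followed by a 10 bp UMI is the 10x Chromium v2 layout, while the 6 bp UMI + 8 bp CB used for the CEL-Seq2 example is an assumption chosen for illustration.

```python
def split_read1(read1, cb_len, umi_len, umi_first):
    """Split a barcode read into (cell_barcode, umi).

    10x Chromium v2 puts the CB first (16 bp CB, then a 10 bp UMI), whereas
    CEL-Seq2 puts the UMI first, so the slicing must know the order.
    Any bases after cb_len + umi_len (e.g. poly-T) are ignored.
    """
    needed = cb_len + umi_len
    if len(read1) < needed:
        raise ValueError(f"read1 has {len(read1)} bp, need at least {needed}")
    if umi_first:
        umi, cb = read1[:umi_len], read1[umi_len:needed]
    else:
        cb, umi = read1[:cb_len], read1[cb_len:needed]
    return cb, umi
```

With the hypothetical CEL-Seq2 read "GATTACAGCTTCAGTTTT", split_read1(read, 8, 6, umi_first=True) gives the CB "AGCTTCAG" and UMI "GATTAC". Parsing the same read in 10x order (umi_first=False) silently yields the CB "GATTACAG" and UMI "CTTCAG", which would explain reads failing to match the expected barcodes.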
Hey @PeteHaitch,
I think there is currently no direct way to tell Alevin to use the CB and UMI in reverse order, so you might have to hack a bit, although it should not be too hard to do. Specifically, the `extractBarcodes` and `extractUMI` functions here have to be updated with a new generic type (`celseq`, maybe). Let us know if it works out for you; otherwise I can take a look at this sometime next week.
Hi @PeteHaitch,
I have just pushed a potentially testable version of Alevin for CEL-Seq2 (activated by the `--celseq` command-line flag), although to use it the develop branch has to be compiled from source.
A couple of points to note:
Please let us know how it works out for you and whether it's useful or comparable to the output generated by the traditional CEL-Seq2 pipeline.
Oh, I should've pushed my PR sooner! Thanks! I'll take a look at how it compares to what I did. One thing to note is that it'd be useful to be able to specify the length of the CB; we use 8 bp in our slightly adapted CEL-Seq2 protocol.
Hi @PeteHaitch! I agree with @PeteHaitch here: I think we should provide an easy way to specify custom CB & UMI parameters paired with a particular protocol. For 10x v2, since it's a very standard commercial protocol, I think simply having a `--chromium` flag is probably OK. But we should make it easy for people to tweak their CB & UMI lengths.
@PeteHaitch Thanks for making the pull request and correcting the barcode length for the celseq2 protocol. We'll review it soon and merge it into the develop branch (which will be merged into master in the next release).
@rob-p I think we already have the capability to specify the CB and UMI lengths; it's just that CelSeq2 differs slightly in their order. Basically, flags like `--chromium` (or those for any other protocol) are wrappers that set the standard CB and UMI lengths. For customization, one can always use `--umiLength` and `--barcodeLength`. I am thinking of tweaking the `--end` part of the `struct` to select the order of the CB and UMI, which in the case of CelSeq2 is reversed.
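The "preset plus overrides" idea can be sketched as follows: a protocol flag such as `--chromium` selects default lengths, `--barcodeLength`/`--umiLength`-style overrides replace them, and an order field records whether the UMI precedes the CB. BarcodeLayout, PRESETS, resolve_layout and the umi_first field are hypothetical names, not Alevin's actual struct, and the CEL-Seq2 default lengths are placeholders.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BarcodeLayout:
    """Hypothetical stand-in for a per-protocol settings struct."""
    barcode_length: int
    umi_length: int
    # Assumption: models the CB/UMI order switch proposed in the thread.
    umi_first: bool


# Illustrative presets; the CEL-Seq2 lengths are placeholders, not Alevin's defaults.
PRESETS = {
    "chromium": BarcodeLayout(barcode_length=16, umi_length=10, umi_first=False),
    "celseq2": BarcodeLayout(barcode_length=6, umi_length=6, umi_first=True),
}


def resolve_layout(protocol, barcode_length=None, umi_length=None):
    """Start from a protocol preset, then apply any user-supplied length overrides."""
    if protocol not in PRESETS:
        raise ValueError(f"unknown protocol: {protocol!r}")
    layout = PRESETS[protocol]
    if barcode_length is not None:
        layout = replace(layout, barcode_length=barcode_length)
    if umi_length is not None:
        layout = replace(layout, umi_length=umi_length)
    return layout
```

For instance, resolve_layout("celseq2", barcode_length=8) gives an 8 bp CB, as in the adapted protocol mentioned earlier in this thread, while the shared preset stays untouched because dataclasses.replace returns a new frozen instance.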
The latest commit https://github.com/COMBINE-lab/salmon/commit/093b5a98e16cab7c3934c0a7c222549644c39728 generalizes `write_fastq` for all protocols. @PeteHaitch, thanks again for making the pull request; do let us know how the quantified matrix looks at the end for the CEL-Seq2 protocol, or what more we can do in Alevin to help improve the results.
Closing this issue for now, but feel free to reopen it if you have any other problems.
Thanks, @k3yavi! I'll be sure to share my experience and any comparisons I perform.
I've just started working in a single-cell genomics core facility, and alevin looks really useful for our 10X runs! But I also have a substantial number of samples processed with the CEL-Seq and CEL-Seq2 protocols (also 3'-tag protocols). I'm interested in adding support for these protocols to Alevin.
Cheers, Pete