Closed: pinin4fjords closed this issue 5 years ago.
Hi @pinin4fjords ,
Thanks for raising an important question and running alevin for the training.
I think there is some confusion about the quantmerge command: it works only with bulk RNA-seq quants, not with alevin output. Whether you can run multiple alevin instances on multiple file pairs depends on how the files were split. Are they from separate lanes, or were they separated by cellular barcode? The basic intuition is that, after the initial barcode assignment, alevin processes each cell independently. So as long as you are confident that the file pairs are cell-disjoint (no cell barcode appears in more than one pair), you can simply cat the alevin quants at the end. Also, depending on what the training covers, there are several workarounds. For example, if file size or multiple input files are a problem, you could use the very small 100-cell (7 million reads) datasets from 10x and combine everything into one file.
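To make the cell-disjoint condition concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory form of each run's output (cell barcode mapped to per-gene counts), not alevin's actual on-disk files, and the function name `merge_cell_disjoint` is made up for illustration. Concatenation is only safe when no barcode appears in more than one run, so the sketch refuses to merge otherwise.

```python
from typing import Dict

# Hypothetical in-memory representation of one run's output:
# cell barcode -> {gene id -> estimated count}. This is NOT alevin's
# on-disk format; it only illustrates the "cell-disjoint" condition.
CellCounts = Dict[str, Dict[str, float]]


def merge_cell_disjoint(*runs: CellCounts) -> CellCounts:
    """Concatenate per-cell counts from several runs.

    Raises ValueError if a cell barcode appears in more than one run,
    because then the runs are not cell-disjoint and simple concatenation
    would produce duplicate rows for the same cell.
    """
    merged: CellCounts = {}
    for i, run in enumerate(runs):
        for barcode, genes in run.items():
            if barcode in merged:
                raise ValueError(
                    f"barcode {barcode} seen again in run {i}; "
                    "runs are not cell-disjoint"
                )
            merged[barcode] = dict(genes)
    return merged


# Two hypothetical runs over different cells can be merged safely.
run_a = {"AAACCTGA": {"GeneX": 3.0}}
run_b = {"TTTGGTCA": {"GeneX": 1.0, "GeneY": 2.0}}
merged = merge_cell_disjoint(run_a, run_b)
print(sorted(merged))
```

Lane-split files from one library fail this check by construction, since every cell is present in every lane.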
Thanks @k3yavi for the clarification. In my example case the files are not cell-disjoint: they are multiple lanes run from the same library. Obviously I can use just one lane for the training, but to be clear: in a real-world analysis like this, all files for a library need to be run together, right?
Yes, that's correct: all lanes of the library should be run together with alevin.
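For reference, here is a sketch of what running all lanes together can look like outside Galaxy. To my understanding of the alevin documentation, `salmon alevin` accepts several FASTQ files after `-1` and `-2`, as long as both lists are in the same lane order. The helper below (`alevin_command`, hypothetical) only assembles that command; the file names, index path, tgMap file and the `--chromium` chemistry flag are placeholder assumptions to adapt to your data.

```python
import shlex
from typing import List, Sequence


def alevin_command(
    r1_files: Sequence[str],
    r2_files: Sequence[str],
    index: str,
    tgmap: str,
    out_dir: str,
    threads: int = 8,
) -> List[str]:
    """Build one `salmon alevin` invocation covering every lane of a library.

    r1_files holds the CB+UMI reads and r2_files the cDNA reads;
    lane i of r1_files must pair with lane i of r2_files.
    """
    if not r1_files:
        raise ValueError("need at least one lane")
    if len(r1_files) != len(r2_files):
        raise ValueError("need the same number of R1 and R2 files")
    return [
        "salmon", "alevin",
        "-l", "ISR",
        "-i", index,
        "-1", *r1_files,   # all lanes' barcode/UMI reads, in lane order
        "-2", *r2_files,   # all lanes' cDNA reads, same order as -1
        "--chromium",      # assumed 10x v2 chemistry; adjust for your data
        "--tgMap", tgmap,
        "-p", str(threads),
        "-o", out_dir,
    ]


# Hypothetical two-lane library.
cmd = alevin_command(
    ["L001_R1.fastq.gz", "L002_R1.fastq.gz"],
    ["L001_R2.fastq.gz", "L002_R2.fastq.gz"],
    index="salmon_index",
    tgmap="txp2gene.tsv",
    out_dir="alevin_out",
)
print(shlex.join(cmd))
```

Building an argument list rather than one string keeps the lane pairing explicit and avoids shell-quoting issues if you pass it to `subprocess.run`.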
I think the issue is answered here, feel free to reopen.
Hi,
Quick question: if I have the reads for a library spread across multiple files, is it appropriate to run Alevin separately on each file pair and combine the results with quantmerge, rather than processing them all together? I'm looking to run Alevin via Galaxy for some training, and the available wrapper doesn't currently allow supplying multiple inputs. My feeling is that all files should be processed together for robust thresholding etc., but I may be worrying about nothing.
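One concrete reason for processing lanes together, alongside the thresholding concern, is UMI deduplication. A molecule sampled on two lanes carries the same cell barcode and UMI, so deduplicating each lane separately and then summing counts it twice. The toy Python sketch below uses exact UMI matching as a simplifying assumption (alevin's real deduplication is more sophisticated), and the barcodes, UMIs and gene names are invented.

```python
from collections import Counter
from typing import Iterable, Tuple

# A read reduced to (cell barcode, UMI, gene). Exact UMI matching is a
# simplification used only to keep the illustration short.
Read = Tuple[str, str, str]


def count_molecules(reads: Iterable[Read]) -> Counter:
    """Count distinct molecules per (cell, gene) by collapsing identical UMIs."""
    molecules = {(cb, umi, gene) for cb, umi, gene in reads}
    return Counter((cb, gene) for cb, _umi, gene in molecules)


# The same library sequenced on two lanes: the molecule
# (CELL1, UMI_A, GeneX) was sampled on both lanes.
lane1 = [("CELL1", "UMI_A", "GeneX"), ("CELL1", "UMI_B", "GeneX")]
lane2 = [("CELL1", "UMI_A", "GeneX")]

separate = count_molecules(lane1) + count_molecules(lane2)  # per-lane runs, summed
together = count_molecules(lane1 + lane2)                   # one joint run

print(separate[("CELL1", "GeneX")])  # 3: UMI_A counted once per lane
print(together[("CELL1", "GeneX")])  # 2: UMI_A counted once overall
```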