alexdobin / STAR

RNA-seq aligner
MIT License

Combining outs of same libraries resequenced #781

Open achamess opened 4 years ago

achamess commented 4 years ago

I sequenced three 10x libraries with shallow sequencing on a NextSeq and processed them with STARsolo. In the meantime, I re-sequenced the same libraries on a NovaSeq and processed that run separately with STARsolo as well. So now I have STARsolo outputs from two separate sequencing runs of the same libraries.

What is the best way to combine them before taking them into Seurat? Do I have to rerun STARsolo from scratch, passing the FASTQ files from both runs, or is there a way to combine the finished outputs?

alexdobin commented 4 years ago

Hi @achamess

I think Seurat has an option for merging datasets... that's probably the easiest path.

Otherwise, you would need to combine the "barcodes.tsv" and "matrix.mtx" files yourself. The barcodes have to be labeled with a suffix (e.g. -1, -2, ...), since the same cell barcode can appear in different runs but represent different cells. After that, concatenate the barcodes from run1 and run2, add the number of barcodes in run1 to the 2nd column (the barcode index) of run2's matrix.mtx, and concatenate the matrix.mtx entries of run1 and run2. The merged matrix.mtx needs a single header, with its size line updated to the total number of barcodes and entries.
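The manual merge described above could be sketched in Python roughly as follows. This is only an illustration, not a STAR utility: the function names `parse_mtx` and `merge_runs` are made up here, and it assumes both runs used the same features.tsv (same genome index and annotation) and that each matrix.mtx entry has exactly three columns (feature index, barcode index, count).

```python
def parse_mtx(text):
    """Split Matrix Market coordinate text into (comments, shape, entries)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    comments = [ln for ln in lines if ln.startswith("%")]
    body = [ln.split() for ln in lines if not ln.startswith("%")]
    # First non-comment line is the size line: features, barcodes, entries.
    n_features, n_barcodes, n_entries = (int(x) for x in body[0])
    # Assumption: three columns per entry; the count is kept as a string.
    entries = [(int(f), int(b), v) for f, b, v in body[1:]]
    return comments, (n_features, n_barcodes, n_entries), entries


def merge_runs(barcodes1, mtx1, barcodes2, mtx2):
    """Merge two runs, treating identical barcodes as different cells."""
    comments, (nf1, nb1, ne1), entries1 = parse_mtx(mtx1)
    _, (nf2, nb2, ne2), entries2 = parse_mtx(mtx2)
    if nf1 != nf2:
        raise ValueError("runs have different numbers of features")
    if len(barcodes1) != nb1 or len(barcodes2) != nb2:
        raise ValueError("barcode list does not match matrix size line")

    # Label barcodes per run so that shared barcodes stay distinct.
    barcodes = [f"{bc}-1" for bc in barcodes1] + [f"{bc}-2" for bc in barcodes2]

    # Shift run2 barcode indices (2nd column) past the run1 barcodes.
    shifted = [(f, b + nb1, v) for f, b, v in entries2]

    # One header, with the size line updated to the merged totals.
    out = comments + [f"{nf1} {nb1 + nb2} {ne1 + ne2}"]
    out += [f"{f} {b} {v}" for f, b, v in entries1 + shifted]
    return barcodes, "\n".join(out) + "\n"
```

To use it on real output, read each run's barcodes.tsv into a list of lines and its matrix.mtx into a string, call `merge_runs`, and write the two results next to a copy of features.tsv.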

I should probably write an awk script for this. :)

Cheers, Alex