bioinfo-biols / SEVtras

sEV-containing droplet identification in scRNA-seq data (SEVtras)
GNU Affero General Public License v3.0

The number of droplets in sEV_recognizer output #25

Open HZT55 opened 2 months ago

HZT55 commented 2 months ago

Hi, thank you for developing a useful tool. I am confused about the output of the sEV_recognizer function. The raw_SEVtras.h5ad file has an n_obs of 1,129,431, nearly 10 times my original cell count. The code is as follows:

import SEVtras
SEVtras.sEV_recognizer(sample_file='sample_file', out_path='./', species='Homo', predefine_threads=10, dir_origin=True)

There are 9 samples in this sample_file, and I compared the raw data of one sample with the SEVtras output for that sample. I'm sure the sample ran successfully.

SEVtras output:
adata_ev = sc.read_h5ad('./01.sEV_recognizing/tmp_out/cellranger_SRR17008554/cellranger_SRR17008554.h5ad')

Cell Ranger output:
adata_o = sc.read_10x_mtx('/home/cellranger/SRR17008554/outs/filtered_feature_bc_matrix')

The number of barcodes in these two files is quite different, and I don't know why.

More surprisingly, when I compared the original barcodes (Cell Ranger output) with the barcodes in the sEV_SEVtras.h5ad file of the sample, there was no overlap between the two.

I don't know if this result is normal. Could you please explain it?

RuiqiaoHe commented 2 months ago

The raw gene-barcode matrix includes all valid barcodes from GEMs (Gel Bead-In EMulsions) captured in the data. Most GEMs do not actually contain cells, so most barcodes in the raw matrix do not correspond to cells, and these non-cell barcodes are the ones from which sEVs can potentially be identified. The filtered gene-barcode matrix includes only barcodes whose GEMs are likely to contain cells. This is why the raw-based output has far more barcodes than the filtered matrix, and why there are no overlapping barcodes between potential cells and potential sEVs.
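
To make the relationship concrete, here is a minimal sketch using toy barcode lists rather than real Cell Ranger output. The helper `partition_barcodes` and the toy barcodes are assumptions for illustration only and are not part of SEVtras. It only shows the set logic: the filtered barcodes are a subset of the raw barcodes, and anything called from the remaining non-cell barcodes cannot overlap with the cell barcodes.

```python
def partition_barcodes(raw_barcodes, filtered_barcodes):
    """Split raw barcodes into cell-associated and non-cell sets.

    Hypothetical helper for illustration; not a SEVtras function.
    """
    raw = set(raw_barcodes)
    cells = set(filtered_barcodes)
    # Filtered barcodes are expected to be a subset of the raw barcodes.
    unexpected = cells - raw
    if unexpected:
        raise ValueError(f"{len(unexpected)} filtered barcodes missing from raw matrix")
    non_cell = raw - cells
    return cells, non_cell


# Toy data standing in for the barcodes of the raw and filtered matrices.
raw = ["AAAC-1", "AAAG-1", "AACT-1", "ACGT-1", "AGTC-1"]
filtered = ["AAAC-1", "ACGT-1"]

cells, non_cell = partition_barcodes(raw, filtered)

# Assumed example: barcodes called as sEV candidates come from the non-cell pool.
sev_candidates = {"AAAG-1", "AGTC-1"}

print(len(raw), len(cells), len(non_cell))
print(sev_candidates <= non_cell)         # candidates drawn from non-cell barcodes
print(sev_candidates.isdisjoint(cells))   # hence no overlap with cell barcodes
```

With real data, the same check could be run on the barcode indices (for example `adata.obs_names`) of the raw_feature_bc_matrix and filtered_feature_bc_matrix directories read with `sc.read_10x_mtx`, assuming both come from the same Cell Ranger run and use the same barcode suffix.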