bioinfo-biols / SEVtras

sEV-containing droplet identification in scRNA-seq data (SEVtras)
GNU Affero General Public License v3.0

Issues of batch in sEV_recognizer and sample in ESAI_calculator #23

[Open] Jinqingchang opened this issue 2 months ago

Jinqingchang commented 2 months ago

Dear Developer,

Thank you for creating such a useful tool.

I encountered some issues while using the ESAI_calculator function. Initially, I performed sEV_recognizer with multiple samples using the following code:

import SEVtras

SEVtras.sEV_recognizer(
    input_path='../input_sevtras',
    sample_file='../input_sevtras/sample_file',
    out_path='../sev_recognizing/27samples/500UMI',
    species='Mus',
    predefine_threads=-2,
    get_only=False,
    score_t=None,
    search_UMI=500,
    alpha=0.15,
    dir_origin=True
)

The resulting sEV files contain batch information:

[screenshot]

I then ran the ESAI_calculator function. In its output file SEVtras_sEVs.h5ad, obsm contains a 'source' entry with both 'type' and 'sample' fields.

import SEVtras
SEVtras.ESAI_calculator(
    adata_ev_path='../sev_evaluating/seurat_to_h5/sev_evaluate/adata_ev.h5ad',
    adata_cell_path='../sev_evaluating/seurat_to_h5/sev_evaluate/cell_ready.h5ad',
    out_path='../sev_evaluating/seurat_to_h5/sev_evaluate_output',
    species='Mus',
    OBSsample='batch',
    OBScelltype='celltype',
    OBSev='sEV',
    OBSMpca='X_pca',
    cellN=10,
    Xraw=True,
    normalW=True,
    plot_cmp='SEV_builtin',
    save_plot_prefix='',
    OBSMumap='X_umap',
    size=10
)

However, this sample information does not correspond to the batch information from the sEV_recognizer output:

[screenshot]

I am unclear about the reason for this discrepancy. Could you please clarify which result is correct?

RuiqiaoHe commented 2 months ago

Thank you for your feedback. The sample information in the output of SEVtras.ESAI_calculator is sorted by sample name; you can find the same sEVs using the batch information in adata.obs['batch']. In addition, by default SEVtras.sEV_recognizer already filters sEVs_SEVtras.h5ad by SEVtras score. If you want to change the threshold, I recommend using the parameter score_t, e.g. score_t='15'.
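If the obsm sample labels were only a sorted copy of obs['batch'], the two columns would still contain the same samples with the same counts, and any per-barcode disagreement would come from row order alone. The minimal sketch below checks both conditions on plain sequences. The helper names are hypothetical (not part of SEVtras), and passing adata.obs['batch'] and adata.obsm['source']['sample'] to them is an assumption about how you would extract the two columns.

```python
from collections import Counter


def same_composition(obs_batch, obsm_sample):
    """True if both label sequences contain the same samples with the same
    counts, i.e. one could be a reordering (such as a sort) of the other."""
    return Counter(obs_batch) == Counter(obsm_sample)


def is_sorted_copy(obs_batch, obsm_sample):
    """True if obsm_sample is exactly obs_batch sorted by sample name."""
    return list(obsm_sample) == sorted(obs_batch)


# Hypothetical usage on an AnnData object loaded as `adata`:
#   same_composition(adata.obs['batch'], adata.obsm['source']['sample'])
#   is_sorted_copy(adata.obs['batch'], adata.obsm['source']['sample'])
```

If same_composition is True but the labels still disagree row by row, the difference is one of ordering; if it is False, the two columns genuinely hold different sample assignments.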

Jinqingchang commented 2 months ago


Thank you for your quick reply, but I still have some concerns. Within the SEVtras_sEVs.h5ad file itself, the batch information in obs and the sample information in obsm do not correspond, even though both represent the source sample of each barcode. This does not seem to be related to sorting. For example, the barcode AAACCCAGTGGATACG-1 has inconsistent batch information in obs and sample information in obsm:

[screenshot]
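One way to quantify the inconsistency described here is to list every barcode whose two labels differ. This hypothetical helper assumes that the barcodes, obs['batch'] and obsm['source']['sample'] are row-aligned (as obs and obsm are within a single AnnData object) and that both columns use the same sample names.

```python
def mismatched_barcodes(barcodes, obs_batch, obsm_sample):
    """Return (barcode, batch, sample) triples where the two labels differ.

    Assumes the three sequences are aligned row by row, e.g.
    adata.obs_names, adata.obs['batch'] and adata.obsm['source']['sample'].
    """
    if not (len(barcodes) == len(obs_batch) == len(obsm_sample)):
        raise ValueError("inputs must have the same length")
    return [
        (bc, b, s)
        for bc, b, s in zip(barcodes, obs_batch, obsm_sample)
        if b != s
    ]
```

An empty result would mean obs and obsm agree for every barcode; otherwise the returned triples show exactly which barcodes disagree and how.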

RuiqiaoHe commented 2 months ago

Could you please look for AAACCCAGTGGATACG-1-1, AAACCCAGTGGATACG-1-2, AAACCCAGTGGATACG-1-3, ... in the output adata? In my tests, the information in adata.obs['batch'] and adata.obsm['source']['sample'] is the same.
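To collect the suffixed copies of a raw barcode, one can filter the output index for entries that equal the barcode or extend it with '-<suffix>' segments. A sketch in plain Python; suffixed_variants is a hypothetical helper, and the suffix pattern is inferred from the example barcodes in this comment rather than from SEVtras documentation.

```python
def suffixed_variants(raw_barcode, index):
    """Return index entries equal to raw_barcode or derived from it by
    appending '-<suffix>' segments, e.g. 'X-1' -> 'X-1-1', 'X-1-2'.

    The trailing '-' in the prefix keeps 'X-1' from matching 'X-10-1'.
    """
    prefix = raw_barcode + "-"
    return [bc for bc in index if bc == raw_barcode or bc.startswith(prefix)]


# Hypothetical usage, assuming the output has been loaded as `adata`:
#   hits = suffixed_variants("AAACCCAGTGGATACG-1", adata.obs_names)
#   adata.obs.loc[hits, "batch"]
#   adata.obsm["source"].loc[hits, "sample"]
```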

Jinqingchang commented 2 months ago

Looking up those barcodes raises an error, because the index of the output file sev_sEVs.h5ad from the ESAI_calculator function is not named the way you describe. In fact, the barcode I showed in my first comment is the result of my own adjustment: I renamed the barcodes in SEVtras_sEVs.h5ad to match those in SEVtras_combined.h5ad, since otherwise the barcodes in the two files do not correspond. It is not the raw output of the ESAI_calculator function.

[screenshot] [screenshot]

When I check the files output by ESAI_calculator directly, without any modification, the obs of the two files are as shown in the two images above. The obsm of SEVtras_sEVs.h5ad is shown in the image below.

[screenshot]

I tried the last barcode, TTTGGTTCACTACCGG-1-1; it is inconsistent between obs and obsm.

[screenshot]

RuiqiaoHe commented 2 months ago

You don't need to match the cell barcode names in the two files. Here's why. First, the cell barcodes in SEVtras_combined.h5ad are suffixed with '1-1', '1-2', etc., because the same barcode occurs in different single-cell samples. I suggest adding a distinguishing character to tell them apart. Second, SEVtras obtains sample or batch information only in the first step (SEVtras.sEV_recognizer). However, the adata_cell file passed to the second step still contains the same barcodes as adata_ev. This gives the barcodes in adata_combined an extra '-1' (from -1 to -1-1, and from -1-2 to -1-1-2). Simply append a distinguishing character, such as 'E', to the cell barcodes in adata_ev (e.g. from AAACCCAGTGGATACG-1 to AAACCCAGTGGATACG-1E), and re-run SEVtras.ESAI_calculator. By the way, I suspect that the cell barcodes in your samples may have been replaced by some software and are not the original ones.
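The renaming step suggested here can be sketched in plain Python: append a tag to each EV barcode, then confirm that no barcode is shared between the EV and cell objects. The helper names are hypothetical; the commented lines show how they might be applied to an AnnData object, assuming the anndata API (read_h5ad, obs_names, write_h5ad) and placeholder file names.

```python
def tag_barcodes(barcodes, tag="E"):
    """Append a tag to every barcode, e.g. 'AAACCCAGTGGATACG-1' -> 'AAACCCAGTGGATACG-1E'."""
    return [bc + tag for bc in barcodes]


def shared_barcodes(ev_barcodes, cell_barcodes):
    """Barcodes present in both the EV and the cell object, sorted.

    Per the suggestion above, this should be empty before re-running
    ESAI_calculator.
    """
    return sorted(set(ev_barcodes) & set(cell_barcodes))


# Assumed anndata usage (placeholder paths, not executed here):
#   import anndata as ad
#   adata_ev = ad.read_h5ad("adata_ev.h5ad")
#   adata_ev.obs_names = tag_barcodes(adata_ev.obs_names)
#   adata_ev.write_h5ad("adata_ev_tagged.h5ad")
#   # then pass the tagged file as adata_ev_path to SEVtras.ESAI_calculator
```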

Jinqingchang commented 2 months ago

Thank you for your kind reminder. I will review my data processing steps carefully. If there are no errors in the execution, I will get back to you.