STOmics / SAW

GNU General Public License v3.0

Adding bin id tags to reads in bam file #105

Closed Ed2uiz closed 1 month ago

Ed2uiz commented 1 month ago

Hello,

Is there another tool or method, or perhaps an option within SAW, to add a metadata tag that maps a bin ID to each read in the BAM file? I'm thinking of something similar to the CB:Z: tag found in, e.g., 10x BAM files. My understanding is that the BAM files produced by SAW contain the Cx:i:, Cy:i:, UR:Z:, XF:Z:, GE:Z:, GS:Z: and UB:Z: metadata tags, which cannot be mapped back to the bins after running the SAW pipeline.

Thanks, Ed

TheSallyGardens commented 1 month ago

@Ed2uiz Hi! The GEM file includes coordinate information, which corresponds to the Cx:i: and Cy:i: coordinate tags in the BAM file.

XiaolongYang-HZAU commented 1 month ago

I have the same issue. I tried running soupcell on the generated bam file to remove contaminated cells, but it requires that the bam file contains barcode information. I couldn't find any barcode information like 'CB:Z' in the bam file, nor do I know how to match the barcodes to the bam file. I was unable to successfully run soupcell. Did you manage to solve the problem?

XiaolongYang-HZAU commented 1 month ago

I have read the recommended documents in https://github.com/STOmics/SAW/issues/9, but simply combining Cx and Cy together does not correspond to the barcode used in the analysis. How can I add barcode information to the bam file?

Ed2uiz commented 1 month ago

@Ed2uiz Hi! The GEM file includes coordinate information, which corresponds to the Cx:i: and Cy:i: coordinate tags in the BAM file.

@TheSallyGardens As @XiaolongYang-HZAU just mentioned, I do not see how Cx:i: and Cy:i: can correspond to the barcode in this case. Can you please elaborate on how this could be done? What if we want to assign different IDs for different bin sizes? For example, bin500 and bin50 in the GEM/GEF should ideally be written to the BAM file as unique metadata tags, e.g. CB500:Z: and CB50:Z: respectively.

TheSallyGardens commented 1 month ago

The SN.barcodeToPos.h5 file contains the coordinate information of each spot and its corresponding CID sequence (referred to as the barcode sequence). https://github.com/STOmics/ST_BarcodeMap can be used to decode the CID information. Hope this helps!
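
For anyone landing here, the bin-tagging idea discussed above can be sketched in Python as follows. This is only an illustration under stated assumptions, not something SAW provides. It assumes a binN bin is identified by flooring Cx and Cy to multiples of N, that the BAM coordinates share the GEM coordinate frame (pass `offset_x`/`offset_y` if they do not), and that the third-party `pysam` package is installed. The `bin_id` and `tag_bam` helpers and the `x_y` label format are made up for this example, so check the labels against the barcodes your downstream tool actually uses. Note also that SAM tag names are exactly two characters, so `CB500`/`CB50` are not valid tags; the sketch uses `CB` for one bin size and a local-use tag such as `XB` for another.

```python
from typing import Dict


def bin_id(cx: int, cy: int, bin_size: int,
           offset_x: int = 0, offset_y: int = 0) -> str:
    """Return a hypothetical bin label 'x_y' for a spot coordinate.

    The label is the lower-left corner of the bin containing (cx, cy),
    i.e. each coordinate floored to a multiple of bin_size.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    bx = (cx - offset_x) // bin_size * bin_size
    by = (cy - offset_y) // bin_size * bin_size
    return f"{bx}_{by}"


def tag_bam(in_path: str, out_path: str, tags_by_bin_size: Dict[int, str]) -> None:
    """Copy a BAM file, adding one string tag per requested bin size.

    tags_by_bin_size maps a bin size to a two-character SAM tag,
    e.g. {50: "CB", 500: "XB"} (tag names are an assumption).
    """
    import pysam  # third-party; imported here so bin_id works without it

    with pysam.AlignmentFile(in_path, "rb") as src, \
            pysam.AlignmentFile(out_path, "wb", template=src) as dst:
        for read in src.fetch(until_eof=True):
            # Only reads carrying both coordinate tags can be assigned a bin.
            if read.has_tag("Cx") and read.has_tag("Cy"):
                cx, cy = read.get_tag("Cx"), read.get_tag("Cy")
                for size, tag in tags_by_bin_size.items():
                    read.set_tag(tag, bin_id(cx, cy, size), value_type="Z")
            dst.write(read)
```

Usage would look like `tag_bam("in.bam", "out.bam", {50: "CB", 500: "XB"})`, where the file names are placeholders. Whether a given downstream tool accepts these labels as cell barcodes has not been verified here.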