STOmics / SAW

GNU General Public License v3.0

CID whitelist #136

Open tenlives opened 3 months ago

tenlives commented 3 months ago

Hi, could you tell me which file in the Stereo-seq data is, or contains, the CID whitelist? And once I have a CID barcode, how can I find its position on the chip? Thank you!

Clouate commented 3 months ago

Hi, do you need a FASTQ file of reads that can be mapped to the Stereo-seq chip? If so, add validCidFq to the last line of the *.bcPara file in 00.mapping and rerun the mapping step. This produces a FASTQ file containing the reads with valid CIDs, together with each CID sequence and its position on the chip (see page 23 of Documents/UserManual/Stereo-seq_Analysis_Workflow_User_Manual_A9.pdf).

In addition, if you want all CID sequences and their positions on the chip, I recommend ST_BarcodeMap. The conversion method is described under 'mask format change' at the bottom of the README: https://github.com/STOmics/ST_BarcodeMap

tenlives commented 3 months ago

Thanks for your reply! I have FASTQ files with CID sequences, which need to be corrected for sequencing errors against the CID whitelist. I already have an error-correction procedure. Do you mean that the mask.h5 file contains the CID barcode whitelist and the CIDs' positions on the chip?

Clouate commented 3 months ago

Yes, SN.barcodeToPos.h5 contains the CIDs and their positions. Inside is a matrix, and the value at matrix[y, x, 0] is the transcoded CID sequence. ST_BarcodeMap can convert it to a txt file with one line per CID in the format: sequence x y
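
If you would rather do the lookup yourself in Python, here is a rough sketch of the idea, not the ST_BarcodeMap implementation. It walks a matrix indexed as matrix[y][x][0] and builds a CID -> (x, y) table. Several things here are assumptions that this thread does not confirm: the 2-bit-per-base packing with the first base in the lowest bits, the code order in `ALPHABET`, the CID length of 25, and the optional `skip_value` for empty spots. The names `decode_cid`, `encode_cid`, `cid_positions` and `load_matrix` are made up for illustration, and the HDF5 dataset name has to be looked up in your own file. Please check the decoded sequences against the ST_BarcodeMap txt output before relying on them.

```python
from typing import Dict, Optional, Tuple

# ASSUMPTIONS (not confirmed in this thread): 2 bits per base, first base in
# the lowest bits, codes 0..3 meaning A, C, T, G, and a CID length of 25.
ALPHABET = "ACTG"
CID_LENGTH = 25


def decode_cid(value: int, length: int = CID_LENGTH, alphabet: str = ALPHABET) -> str:
    """Unpack a 2-bit-per-base integer into a base string."""
    value = int(value)  # also accepts numpy integer scalars
    return "".join(alphabet[(value >> (2 * i)) & 3] for i in range(length))


def encode_cid(seq: str, alphabet: str = ALPHABET) -> int:
    """Inverse of decode_cid, handy for round-trip checks."""
    value = 0
    for i, base in enumerate(seq):
        value |= alphabet.index(base) << (2 * i)
    return value


def cid_positions(
    matrix,
    length: int = CID_LENGTH,
    alphabet: str = ALPHABET,
    skip_value: Optional[int] = None,
) -> Dict[str, Tuple[int, int]]:
    """Map each decoded CID to (x, y), reading values from matrix[y][x][0]."""
    positions: Dict[str, Tuple[int, int]] = {}
    for y, row in enumerate(matrix):
        for x, cell in enumerate(row):
            value = int(cell[0])
            # Hypothetical convention: skip_value marks a spot without a CID.
            if skip_value is not None and value == skip_value:
                continue
            positions[decode_cid(value, length, alphabet)] = (x, y)
    return positions


def load_matrix(path: str, dataset: str):
    """Read the whole dataset into memory; `dataset` must be the name in your file."""
    import h5py  # third-party, imported lazily so the helpers above run without it

    with h5py.File(path, "r") as handle:
        return handle[dataset][...]


if __name__ == "__main__":
    # Tiny synthetic 2x2 chip with 4-base CIDs, only to show the lookup.
    demo = [
        [[encode_cid("ACGT")], [encode_cid("TTGA")]],
        [[encode_cid("GGCC")], [encode_cid("CATG")]],
    ]
    for seq, (x, y) in cid_positions(demo, length=4).items():
        print(seq, x, y)  # same "sequence x y" layout as the txt file
```

With the real file, you would pass the result of `load_matrix(...)` to `cid_positions(...)` and then look up each error-corrected CID in the returned dictionary. For a full-size chip the dictionary can get very large, so the ST_BarcodeMap conversion is probably still the more practical route.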