gdewael / cpg-transformer

CpG Transformer for imputation of single-cell methylomes
MIT License

Use of Bismark output for imputation #5

Closed by AST87 2 years ago

AST87 commented 2 years ago

Hi Gaetan, CpG Transformer is a great tool. I have been analyzing single-cell BSC data (from NMT-seq) and would like to impute some of the missing CpGs. However, I have not been able to figure out how to use cpg-transformer on Bismark output files. I tried the FastatoTable function on the raw data to prepare input files, but it gets complicated for paired-end sequencing.

Could you please suggest a way by which I could use the output from bismark as input to cpg-transformer? If not, could you please suggest how to use the imputed output from raw data in Bismark?

gdewael commented 2 years ago

Hi, thanks for your interest in using CpG Transformer!

I am not a specialist in using Bismark, but I think you should be able to get a single CpG methylation file for every cell. (Look for the "Bismark methylation extractor" steps here).
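As a rough illustration of that extraction step, here is a minimal Python sketch that only assembles a `bismark_methylation_extractor` command line for one cell. The helper name `extractor_command` and the file names are hypothetical, and the flag selection (`--paired-end` or `--single-end`, `--bedGraph`, `--gzip`, `-o`) is an assumption about what a typical run needs, so please check it against the Bismark documentation for your version and protocol.

```python
import shlex


def extractor_command(bam_path, out_dir, paired_end=True):
    """Build (but do not run) a Bismark methylation extractor command.

    Hypothetical helper: the chosen flags are an assumption about a
    typical per-cell run, not a recommendation from the Bismark authors.
    """
    cmd = ["bismark_methylation_extractor"]
    # Paired-end and single-end alignments need different extractor modes.
    cmd.append("--paired-end" if paired_end else "--single-end")
    # --bedGraph also requests the per-site coverage output; --gzip compresses it.
    cmd += ["--bedGraph", "--gzip", "-o", str(out_dir), str(bam_path)]
    return cmd


# Placeholder file names; pass the list to subprocess.run(..., check=True)
# once Bismark is installed and the paths are real.
print(shlex.join(extractor_command("cell_001_pe.bam", "meth_calls")))
```

Running this once per cell should leave one coverage file per cell in the output directory, which is the starting point for building the combined table.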

Then, from these Bismark output files, you should try to construct a tab-separated file that looks like this, where every row indicates a CpG site and every column of -1s, 0s, and 1s indicates a cell. This will require some manual programming: I am not sure an automated script (from Bismark to this tsv format) is a good choice, as users may have wildly different experimental settings and needs.
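To make the idea concrete, here is a minimal sketch of such a conversion, not an official script. It assumes per-cell Bismark coverage files with six tab-separated columns (chromosome, start, end, percent methylated, methylated count, unmethylated count). Several choices are assumptions you may need to change: the leading `chrom` and `pos` columns, the header row, the use of -1 for sites a cell does not cover, and binarizing by majority vote. The function names are hypothetical.

```python
import csv
import gzip
from pathlib import Path

UNOBSERVED = -1  # assumed marker for CpG sites without coverage in a cell


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt", newline="")
    return open(path, "rt", newline="")


def read_bismark_cov(path):
    """Return {(chrom, pos): 0 or 1} for one cell's coverage file."""
    calls = {}
    with open_text(path) as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 6:
                continue  # skip blank or malformed lines
            chrom, pos = row[0], int(row[1])
            n_meth, n_unmeth = int(row[4]), int(row[5])
            if n_meth + n_unmeth == 0:
                continue  # no reads, so no call for this site
            # Assumption: majority vote, with ties counted as methylated.
            calls[(chrom, pos)] = 1 if n_meth >= n_unmeth else 0
    return calls


def write_cpg_matrix(cell_files, out_path, header=True):
    """Write one row per CpG site and one column per cell.

    cell_files maps a cell name to its coverage file path.
    Returns the number of CpG sites written.
    """
    per_cell = {name: read_bismark_cov(p) for name, p in cell_files.items()}
    # Union of all observed sites; chromosomes sort as strings (chr10 < chr2).
    sites = sorted(set().union(*(c.keys() for c in per_cell.values())))
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        if header:
            writer.writerow(["chrom", "pos", *per_cell])
        for site in sites:
            calls = (c.get(site, UNOBSERVED) for c in per_cell.values())
            writer.writerow([*site, *calls])
    return len(sites)
```

A call such as `write_cpg_matrix({"cell1": "cell1.bismark.cov.gz", "cell2": "cell2.bismark.cov.gz"}, "methylation.tsv")` (placeholder file names) would produce the combined table. Depending on your experiment, you may want to filter chromosomes, merge strands, or apply a different binarization rule before encoding.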

Once you have this file, you can use our provided scripts, EncodeFromTsv.py to encode the methylation calls and EncodeGenome.py to encode the genome, to convert both to NumPy formats for input to the model.

Let me know if this answers your questions!

AST87 commented 2 years ago

Thank you Gaetan for the solution.