Closed husain223 closed 1 year ago
So you would like to create a file with one line per read, where the methylation is shown as 0 or 1 (in a comma separated form in Col 4, with the relative position within the read in Col 5?
I am afraid you would have to write a script to this. You might be able to use a trick that it already part of the methylation extraction process, followed by something similar to the NOMe-filtering step (https://github.com/FelixKrueger/Bismark/blob/7288cb6e99db92ceb82d2143bce588affe7335ff/NOMe_filtering).
In short, you would have to re-run the methylation extractor using the option --yacht
option, which will add a few additional columns to the CpG etc. output files, in this format:
<seq-ID> <methylation state*> <chromosome> <start position (= end position)> <methylation call> <read start> <read end> <read orientation>
Importantly, in this way each methylation call in the CpG file retains the read information of the entire read (start, end, orientation), which will then allow you so derive the relative position of the C for the methylation call (column 5 in your example).
You will need to modify the NOMe-filtering script to not filter the NOMe-seq context, but rather just right out the positions into a comma-separated variable. All reads from a given read ID are already read in, it should really be fairly easy to accomplish what you need.
All the best! Felix
Thanks i tried to give --yacht argument but it was complaining that the data should be in single end not in paired end
Yes, I am afraid this was only ever implemented in single-end mode. I suppose you could use the option -s
to force single-end mode to see if that works in the first instance?
Yep, it works, but using the "-s" option for paired-end data might loss some methylation details or end up with multiple methylation signals for regions that overlap.
Yep, it works, but using the "-s" option for paired-end data might loss some methylation details or end up with multiple methylation signals for regions that overlap.
True that, I found this in the help-text:
...Its intended use is single-cell NOMe-Seq data, and thus this option works only in single-end mode (paired-end reads often suffer rom chimaera problems...)
If this is essential for your project I am afraid you would need to find a way to extend this to paired-end coordinates somewhow.
Hello, I have a question about generating a methylation file in a specific format. I've completed all the necessary steps, starting from genome preparation, methylation extraction, and using bismarkTobedgraph. Now, I'm looking to transform the methylation calls into the desired file format:
Column 1 to Column 3: Genomic coordinates. Column 4: Indicating the methylation. Column 5: The corresponding methylated positions on the read. I am seeking guidance on which specific arguments I should provide at a certain step in order to obtain the desired output file. If necessary, I am open to writing a script to generate the file, and in this case, I would appreciate any suggestions on which file would be most suitable for this purpose.