BSSeeker / BSseeker2

A versatile aligning pipeline for bisulfite sequencing data
http://pellegrini.mcdb.ucla.edu/BS_Seeker2/
MIT License

Renaming Chromosomes in the wig file #18

Open markmcdowall opened 5 years ago

markmcdowall commented 5 years ago

Hi,

When running the indexing steps on genomes whose chromosome names include a ".", the "." gets converted to an underscore. I have wrapped the steps as part of a pipeline and want to convert the output to a bigwig file, but this fails because of the name changes.

Is it possible to prevent the renaming of the chromosomes, or is there a specific reason why there is renaming?

Cheers,

Mark

guoweilong commented 5 years ago

Thanks for reporting this.

I re-checked the code and didn't find anything that converts "." to "_" in chromosome names. As BS-Seeker2 is a wrapper around bowtie/bowtie2, it might be built-in bowtie/bowtie2 behaviour.

You're also welcome to give more details on which step is affected, along with examples of the chromosome names you use.

Best, Weilong

markmcdowall commented 5 years ago

Hi Weilong,

Thank you for getting back to me.

I started digging through the code and it looks like the change happens as part of the read_fasta() function in bs_utils/utils.py (https://github.com/BSSeeker/BSseeker2/blob/master/bs_utils/utils.py). There is a regex (sanitize_seq_id) that converts every character that is not A-Z, a-z or 0-9 to a "_" (lines 275 and 286).

For example, take the chromosome name CM001012.2 (mouse chromosome 19 as supplied by ENA - https://www.ebi.ac.uk/ena/data/view/CM001012). It gets converted from CM001012.2 to CM001012_2, which makes it impossible to view the bam file in something like JBrowse if the reference name is loaded as CM001012.2.

Adding "." to the set of characters that are kept might be enough to prevent this, and it should not affect the names of files on a system.
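
To illustrate, here is a minimal sketch of the behaviour and of the suggested change. It is not the actual code from utils.py: the two pattern names, the function signature and the header description text are assumptions made up for the example, and only the name sanitize_seq_id and the character classes come from what is described here.

```python
import re

# Assumed form of the current pattern: every character that is not
# A-Z, a-z or 0-9 is treated as disallowed.
STRICT = re.compile(r"[^0-9a-zA-Z]")

# Suggested form: "." is added to the set of characters that are kept.
RELAXED = re.compile(r"[^0-9a-zA-Z.]")


def sanitize_seq_id(header, pattern):
    """Return the first whitespace-delimited token of a FASTA header,
    with every disallowed character replaced by "_".

    Hypothetical helper for illustration only.
    """
    seq_id = header.split()[0]
    return pattern.sub("_", seq_id)


# The description after the ID is made up for this example.
header = "CM001012.2 example description"

print(sanitize_seq_id(header, STRICT))   # CM001012_2
print(sanitize_seq_id(header, RELAXED))  # CM001012.2
```

With the relaxed pattern the ID matches the name in the original FASTA again, while other characters such as "|" or "/" would still be replaced.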

Cheers,

Mark

guoweilong commented 5 years ago

@markmcdowall Many thanks for tracking down this bug. The code is fixed in the new release (v2.1.7).

Best, Weilong

markmcdowall commented 5 years ago

Thanks Weilong,

I'll update my installation.

Cheers,

Mark