merge_cnvnator_results.py not including appropriate chr names with hg38

bjtrost / TCAG-WGS-CNV-workflow

Scripts involved in our workflow for detecting CNVs from WGS data using read depth-based methods

MIT License


merge_cnvnator_results.py not including appropriate chr names with hg38 #3

Closed by MaestSi 6 years ago

MaestSi commented 6 years ago

Dear bjtrost, after changing the line gap_file="$root"/hg19_gap.bed to gap_file="$root"/hg38_gap.bed in process_cnvs.erds+.sh (and saving the appropriate gap table), I noticed that the file cnvn/merged/NA12878.calls.txt.cluster.txt also contains some lines where the second column (the chromosome ID) is missing the 'chr' prefix. I think this may be due to a command in the merge_cnvnator_results.py script, probably something like the following: chrm = words[1].replace("chr","") How do you think this could be solved? Thank you.
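To make the suspected behaviour concrete, here is a minimal sketch. It is not the repository's code: only the expression words[1].replace("chr","") is taken from the post above, while the sample line, its column layout, and the helper lacks_chr_prefix are assumptions made up for illustration.

```python
# Hypothetical tab-separated call line; the real column layout may differ.
line = "NA12878\tchr1\t10000\t20000\tDEL"

words = line.rstrip("\n").split("\t")

# The expression quoted in the post: removes every "chr" substring
# from the second column.
chrm = words[1].replace("chr", "")


def lacks_chr_prefix(chrom):
    """Return True if a chromosome ID does not start with 'chr' (hypothetical helper)."""
    return not chrom.startswith("chr")


print(chrm)                    # chromosome ID after the replace
print(lacks_chr_prefix(chrm))  # True means the prefix was stripped
```

A check like lacks_chr_prefix could be run over the second column of the cluster file to list the affected lines.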

bjtrost commented 6 years ago

Hi,

This can be solved by removing the .replace("chr","") part. I updated the scripts on GitHub.

Cheers,

Brett

MaestSi commented 6 years ago

Ok, thank you!

MaestSi commented 6 years ago

Dear Brett, I cloned the repository again and noticed that you restored the .replace("chr","") part in the files, so they are now as they were before the update you described in your previous post in this thread. I guess the change caused some problems with the hg19 reference. So, I am going to remove the .replace("chr","") part again in the downloaded files. However, I noticed you made a few other modifications in the add_features.py file. Can I leave them as they are in the current version, or do I also have to restore those parts to how they were prior to your modification? Thanks.

bjtrost commented 6 years ago

Hi,

I am sorry for the inconvenience. I pushed a new "chr-fix" branch to the repository. Can you clone this one and see if it works for you? hg38_gap.bed should NOT have the chr prefix. The main output file will lack the chr prefix as well. I just tested it using ERDS/CNVnator files containing "chr" and with files not containing "chr" and got the exact same output, so I believe this works, but please try it yourself and let me know if you have any problems. (If you need the "chr" prefix for downstream analysis, you can always add it back in.)

Thanks,

Brett
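One way to get identical output from inputs with and without the prefix is to normalize the chromosome column before any comparison. The sketch below shows that idea only; it is not the code on the "chr-fix" branch. The function names, the sample lines, and the assumption that the chromosome ID sits in the second tab-separated column are all hypothetical.

```python
def strip_chr_prefix(chrom):
    """Remove a leading 'chr' only; names without the prefix pass through unchanged."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def normalize_call(line):
    """Normalize the chromosome column (assumed here to be column 2) of a tab-separated line."""
    words = line.rstrip("\n").split("\t")
    words[1] = strip_chr_prefix(words[1])
    return "\t".join(words)


# Hypothetical call lines that differ only in chromosome naming.
with_chr = "NA12878\tchr1\t10000\t20000\tDEL"
without_chr = "NA12878\t1\t10000\t20000\tDEL"

# Both naming conventions collapse to the same normalized line.
print(normalize_call(with_chr) == normalize_call(without_chr))
```

Stripping only a leading "chr", rather than calling replace on the whole string, leaves any later occurrence of those letters in a contig name alone.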

MaestSi commented 6 years ago

Dear Brett, I cloned the "chr-fix" branch and removed the 'chr' prefix from hg38_gap.bed. Everything seems to work. I added the 'chr' prefix to the final bed file at the end of the analysis, so thank you! Simone
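For anyone needing the same last step, adding the prefix back to a BED file could look roughly like the sketch below. The thread does not say which command was actually used, so this is an assumption: it supposes a tab-separated BED file with the chromosome in the first column, and add_chr_prefix is a hypothetical name.

```python
def add_chr_prefix(bed_lines):
    """Yield BED lines with 'chr' prepended to column 1 where it is missing."""
    for line in bed_lines:
        # Pass blank, comment, and header lines through unchanged.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if not fields[0].startswith("chr"):
            fields[0] = "chr" + fields[0]
        yield "\t".join(fields) + "\n"


# Hypothetical input: one line without the prefix, one that already has it.
example = ["1\t10000\t20000\n", "chrX\t500\t900\n"]
print("".join(add_chr_prefix(example)), end="")
```

A bare prefix does not cover every contig name: Ensembl-style MT corresponds to chrM in UCSC naming, so names like that would need their own mapping.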