WGLab / LinkedSV

MIT License

gap regions and blacklist regions #11

Open dewshr opened 4 years ago

dewshr commented 4 years ago

Can I call SVs without providing gap regions and blacklist regions? I am using the mm10 reference genome. Do you know where I can download the SV blacklist regions? Thank you.

fangli80 commented 4 years ago

You can generate an empty gap region file and an empty blacklist file, and the program should be able to run. However, you may get many false-positive calls in blacklist regions, because some complex regions produce noisy barcode signals, and it's hard to tell whether those signals are real without a control data set. I don't have a mouse WGS data set, so I cannot provide the blacklist files. In this case, it would be good to have a "normal control" sample and a case sample: detect SVs in both samples separately, then remove from the case call set any SVs that also appear in the normal sample.

By the way, how many samples do you have, and do you want to call germline SVs or somatic SVs?
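The control-subtraction step described above could be sketched as follows. This is a minimal illustration, not part of LinkedSV: the `SV` tuple, the `same_event` matching rule, and the 1000 bp tolerance are all assumptions, and the calls would first need to be parsed from whatever output format the caller produces.

```python
from typing import List, NamedTuple


class SV(NamedTuple):
    """Hypothetical minimal SV record: two breakpoints plus a type label."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int
    svtype: str


def same_event(a: SV, b: SV, tol: int = 1000) -> bool:
    """Treat two calls as the same event if type and chromosomes match and
    both breakpoints lie within `tol` bp of each other (assumed rule)."""
    return (
        a.svtype == b.svtype
        and a.chrom1 == b.chrom1
        and a.chrom2 == b.chrom2
        and abs(a.pos1 - b.pos1) <= tol
        and abs(a.pos2 - b.pos2) <= tol
    )


def subtract_control(case: List[SV], control: List[SV], tol: int = 1000) -> List[SV]:
    """Keep only case calls that have no matching call in the control sample."""
    return [sv for sv in case if not any(same_event(sv, c, tol) for c in control)]
```

The quadratic comparison is fine for a few thousand calls; for larger call sets, indexing the control calls by chromosome would be the obvious next step.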

dewshr commented 4 years ago

For now I am only trying with one sample, and I am calling germline SVs. Is multi-sample SV calling possible?

fangli80 commented 4 years ago

OK, please run in germline mode. The gap region file for GRCm38 is here: GRCm38.p5.genome.gap.zip

For now, it cannot call multiple samples simultaneously. Do you want to find a disease SV or do you simply want to get all germline SV calls of the sample?

If you want to find a disease SV, LinkedSV will output some plots and you can manually check the candidate SVs to see if they are real. If you want to generate a high-confidence germline SV call set, you may need to remove SV calls that overlap with MHC regions, telomeres, and centromeres.

Best, Li
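Removing calls that overlap MHC regions, telomeres, and centromeres can be done as a post-filter. The sketch below assumes the regions to exclude are available as BED-style lines (chromosome, 0-based start, exclusive end) and that each call is a plain `(chrom1, pos1, chrom2, pos2)` tuple; both the input layout and the function names are hypothetical, and "overlap" is interpreted here as either breakpoint falling inside an excluded region.

```python
from collections import defaultdict


def load_bed(lines):
    """Parse BED-style lines into {chrom: [(start, end), ...]}."""
    regions = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and common BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions[chrom].append((int(start), int(end)))
    return regions


def breakpoint_in_regions(chrom, pos, regions):
    """True if pos falls in any half-open interval [start, end) on chrom."""
    return any(start <= pos < end for start, end in regions.get(chrom, []))


def filter_calls(calls, regions):
    """Drop calls where either breakpoint lies inside an excluded region."""
    return [
        c for c in calls
        if not (
            breakpoint_in_regions(c[0], c[1], regions)
            or breakpoint_in_regions(c[2], c[3], regions)
        )
    ]
```

A stricter filter could also drop calls whose whole span covers an excluded region; which rule is appropriate depends on the SV types being kept.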

dewshr commented 4 years ago

I want to call all the SVs. Thank you for your reply.

dewshr commented 4 years ago

Do I need to provide the 2D_blacklist_file? I am getting a "no such file or directory" error. I looked at the code in arguments.py, and that argument is passed as an empty string.

fangli80 commented 4 years ago

Sorry for the late reply. The 2D_blacklist_file is also needed if you are not using the human reference genomes hg19, hg38, or b37. You can provide an empty file if you are working with mouse genomes and don't have the blacklist file.

Best, Li
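For a non-human genome, the three empty placeholder files mentioned in this thread (gap regions, blacklist, 2D blacklist) can be created in one step. The file names below are arbitrary placeholders, not names LinkedSV expects; the resulting paths still have to be passed to the program through its own options.

```python
from pathlib import Path

# Hypothetical file names; any names work as long as the paths are passed on.
DEFAULT_NAMES = ("empty_gap.bed", "empty_blacklist.bed", "empty_2D_blacklist.txt")


def make_empty_files(out_dir, names=DEFAULT_NAMES):
    """Create zero-byte placeholder files in out_dir and return their absolute paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in names:
        path = out / name
        path.write_text("")  # creates the file, or truncates an existing one
        paths.append(path.resolve())
    return paths
```

Calling `make_empty_files("linkedsv_inputs")` returns absolute paths, which avoids the "no such file or directory" error that a relative or empty path can trigger.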