davidebolo1993 / VISOR

VarIant SimulatOR for short, long and linked reads
GNU Lesser General Public License v3.0

Pacbio CCS (HiFi) data #8

Closed: biozzq closed this issue 4 years ago

biozzq commented 4 years ago

Dear @davidebolo1993

How can I produce CCS data with VISOR? Thank you.

Best, Zheng zhuqing

davidebolo1993 commented 4 years ago

Hi @biozzq,

CCS data should have very low error rates (< 1%). This can be implemented in VISOR LASeR by specifying a suitable value for the --accuracy parameter (something like --accuracy 0.99 should be fine). If you have real CCS data handy, you can also derive a true-to-life substitution:insertion:deletion ratio (for example, by using Alfred, see https://github.com/tobiasrausch/alfred) and pass it to the --ratio parameter of the LASeR module. For what it's worth, we noticed that for real, non-CCS PacBio datasets the substitution:insertion:deletion ratio is approximately 15:50:35. I am not sure whether this ratio is the same for CCS data, though.
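
To make the interplay between --accuracy and --ratio concrete, here is a minimal Python sketch. It assumes that the overall error rate is simply 1 - accuracy and that it is divided among substitutions, insertions and deletions in proportion to the ratio; this is an illustrative assumption, not a description of what VISOR or its underlying simulator does internally. The helper name split_error_rate is hypothetical.

```python
def split_error_rate(accuracy: float, ratio: str) -> dict:
    """Split the overall error rate (1 - accuracy) across error types.

    Assumption (not taken from VISOR's code): errors are distributed in
    proportion to a 'substitution:insertion:deletion' ratio string.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    parts = [float(x) for x in ratio.split(":")]
    if len(parts) != 3 or sum(parts) <= 0:
        raise ValueError("ratio must look like 'sub:ins:del' with a positive sum")
    total = sum(parts)
    error_rate = 1.0 - accuracy
    names = ("substitution", "insertion", "deletion")
    # Each error type receives its proportional share of the error budget.
    return {name: error_rate * part / total for name, part in zip(names, parts)}


# Example with the CCS values discussed in this thread.
rates = split_error_rate(0.99, "6:21:73")
for kind, rate in rates.items():
    print(f"{kind}: {rate:.4%} per base")
```

Under this assumption, an accuracy of 0.99 leaves a 1% error budget, most of which would go to deletions when the ratio is 6:21:73.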

Best,

Davide

biozzq commented 4 years ago

Dear @davidebolo1993

I found that pbsim can produce CCS reads (namely HiFi reads). Its default substitution:insertion:deletion ratios are 10:60:30 for CLR and 6:21:73 for CCS. It also contains the quality profile for CCS reads (PBSIM-PacBio-Simulator/data/model_qc_ccs). Maybe VISOR can use these to produce CCS reads. Thanks.

Best, Zheng zhuqing

davidebolo1993 commented 4 years ago

Hi @biozzq,

Thanks for the updates. As a temporary workaround, I guess you can replace the model_qc_clr file in VISOR/LASeR with the CCS model (be sure to rename the CCS model to model_qc_clr and re-run setup.py), use the substitution:insertion:deletion ratio you found, and specify --accuracy 0.99 to simulate CCS data. I'll be back at work in a few days and will add this feature to VISOR properly.

Best,

Davide

davidebolo1993 commented 4 years ago

Hi @biozzq,

Added CCS support in VISOR.

Something like the command below should fit what you need.

VISOR LASeR -g reference.fa -s input.dir -bed simulation.bed -o output.dir -a 0.99 -l 6000 -r 6:21:73 --readstype PB --ccs

biozzq commented 4 years ago

Dear @davidebolo1993

That is great. I will try.

Thanks, Zheng zhuqing

biozzq commented 4 years ago

Dear @davidebolo1993

The CCS simulation failed with the following error: a temporary file cannot be found.

Traceback (most recent call last):
  File "/ZhengZhuQing/00.script/anaconda3/lib/python3.7/site-packages/VISOR-1.0-py3.7.egg/VISOR/LASeR/LASeR.py", line 279, in run
    m=Simulate(tag,os.path.abspath(args.genome), args.readstype, args.threads, os.path.abspath(fasta), str(entries[0]), int(entries[1]), int(entries[2]), str(counter), model_qc, args.accuracy, (args.coverage / 100 * float(entries[3]))/len(fastas), allelic, args.length, args.ratio, os.path.abspath(args.output + '/h' + str(folder+1)),folder +1, 1,renamer)
  File "/ZhengZhuQing/00.script/anaconda3/lib/python3.7/site-packages/VISOR-1.0-py3.7.egg/VISOR/LASeR/LASeR.py", line 504, in Simulate
    os.remove(os.path.abspath(output + '/simref_0001.ref'))
FileNotFoundError: [Errno 2] No such file or directory: '/ZhengZhuQing/03.SV_calling/00.simulation/02.Pacbio_CCS/02.data_del_5x/h1/simref_0001.ref'

Best, Zheng zhuqing

davidebolo1993 commented 4 years ago

Hi @biozzq,

Sorry, I forgot to include model_qc_ccs in MANIFEST.in. Reinstalling VISOR should solve the issue. By the way, for CCS simulations, do not set the read length above 2000 bases. The following command should be appropriate for the CCS setting.

VISOR LASeR -g reference.fa -s input.dir -bed simulation.bed -o output.dir -a 0.99 -l 2000 -r 6:21:73 --readstype PB --ccs
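
Since the failure came from a data file missing from the installed package, a quick check after reinstalling can confirm that the file is now present. The Python sketch below is only an illustration: the helper packaged_file_exists is hypothetical, and the relative path LASeR/model_qc_ccs is an assumption based on the VISOR/LASeR location mentioned earlier in this thread. It also assumes the package is installed as a regular directory (not a zipped egg).

```python
import importlib.util
from pathlib import Path


def packaged_file_exists(package: str, relative_path: str) -> bool:
    """Return True if relative_path exists inside an installed top-level package.

    Hypothetical helper: it locates the package without importing it and
    looks for the file in each of the package's directories.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        # Package not installed, or it is a plain module without a directory.
        return False
    return any(
        (Path(root) / relative_path).is_file()
        for root in spec.submodule_search_locations
    )


# Assumed location of the CCS quality model; adjust if your install differs.
print(packaged_file_exists("VISOR", "LASeR/model_qc_ccs"))
```

If this prints False after a reinstall, the model file is still not being shipped and the simulation is likely to fail in the same way.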

Best,

Davide

LYC-vio commented 1 year ago

Hi, is the --ccs flag still supported in v1.1.2? I didn't see that option in the --help output.

Thank you