ChaissonLab / TT-Mars

Structural Variants Assessment Based on Haplotype-resolved Assemblies
BSD 3-Clause "New" or "Revised" License

Validation with my own assemblies #3

Closed: jiadong324 closed this issue 1 year ago

jiadong324 commented 1 year ago

Dear Author,

I am trying to validate SVs with assemblies created by hifiasm. When I use my own assemblies, I get the error "sequence h1tg000143l not present", and this sequence is not in the assembly index file either. It seems that only the assemblies downloaded by download_asm.sh can be used.

Really appreciate your help!

Thanks!

quentin0515 commented 1 year ago

Hi Jiadong,

Glad you are using TT-Mars to do validation.

Could you provide more details about how you validated the SVs using your assemblies? To use TT-Mars with your own assemblies, you first need to generate liftover files for those assemblies with TT-Mars. The steps can be found here: https://github.com/ChaissonLab/TT-Mars/blob/main/liftover.sh

When you used the assemblies from download_asm.sh, did it work?

I will be glad to answer any other questions you have.

-- Quentin

jiadong324 commented 1 year ago

Hi Quentin,

For the validation, the inputs were:

  1. SVs detected from HG002 using the hg19 reference genome.
  2. Assemblies hap1 and hap2 of HG002 created by hifiasm with default parameters.

The HiFi reads of HG002 were obtained from the GIAB FTP site. I noticed that you provide liftover files for HG002.

Thanks!

quentin0515 commented 1 year ago

Hi Jiadong,

My best guess is that the assemblies you used (created by hifiasm) have different contigs from the HG002 assembly files we provide. I take "sequence h1tg000143l not present" to mean that TT-Mars cannot find the contig "h1tg000143l". That's why I said you may need to generate liftover files if you want to use your own assemblies.
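One way to check this diagnosis is to confirm whether the contig named in the error exists in the assembly you are passing in. The sketch below assumes that the "assembly index file" is a samtools-style FASTA index (`.fai`, as written by `samtools faidx`), where the first whitespace-separated column of each line is the sequence name. The helper names are hypothetical and not part of TT-Mars.

```python
def parse_fai_names(lines):
    """Return the set of sequence names from the lines of a FASTA index (.fai).

    Each .fai line is tab-separated; the first column is the sequence name.
    Blank lines are ignored.
    """
    return {line.split()[0] for line in lines if line.strip()}


def contig_names(fai_path):
    """Read a .fai file from disk and return its sequence names."""
    with open(fai_path) as handle:
        return parse_fai_names(handle)


def missing_contigs(wanted, present):
    """Return, sorted, the names in `wanted` that are absent from `present`."""
    return sorted(set(wanted) - set(present))
```

For example, `missing_contigs(["h1tg000143l"], contig_names("hap1.fa.fai"))` (file name hypothetical) returns `["h1tg000143l"]` when that contig is absent from the indexed assembly. Contig names such as `h1tg...l` are assigned per hifiasm run, so two separate assemblies of the same sample should not be expected to share them.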

Could you describe the entire pipeline, for example, the steps from generating the SVs to validating them with TT-Mars, and which files you used in each step? I believe this would help me debug it with you.

Thanks, --Quentin

jiadong324 commented 1 year ago

Hi Quentin,

I think I understand what you are suggesting. Although you provide HG002 liftover files, they were not made for the assemblies I created. Therefore, I first need to create liftover files from my own assemblies, and then I can use these files for the validation. Is that correct?

Thanks!

quentin0515 commented 1 year ago

Yes, that's the best guess I have without seeing the actual data and pipeline you're using. Feel free to try it (liftover steps are here: https://github.com/ChaissonLab/TT-Mars/blob/main/liftover.sh) and let me know if you have any questions.

jiadong324 commented 1 year ago

Hi Quentin,

I am able to run TT-Mars now. Although one error occurs, "No such file or directory: './ttmars_ins/HG002/ttmars_chrx_res.txt'", I still get outputs in the output directory. It seems to me that this error does not affect the final outputs.

Moreover, I still have some questions regarding the outputs.

  1. What are the differences between ttmars_res.txt, ttmars_agg_res.txt, ttmars_combined.txt and ttmars_regdup_res.txt? It would be great if you could describe your output files.
  2. In my case, the input VCF file contains 237 SVs for validation. The number of SVs in SV_positions.bed is also 237, while there are 207 SVs in ttmars_res.txt. I guess ttmars_res.txt contains the final validation results, and everything in this file looks normal.
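A minimal sketch for reproducing the input-side counts, assuming an uncompressed VCF (meta lines start with `##`, the column header with `#CHROM`) and a BED file that may contain `track`/`browser` lines or `#` comments. The column layout of the ttmars_*.txt result files is not assumed, so they are not parsed here. The function names are hypothetical helpers, not part of TT-Mars.

```python
def count_vcf_records(lines):
    """Count data records in a VCF, skipping '##' meta lines and the '#CHROM' header."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def count_bed_records(lines):
    """Count intervals in a BED file, skipping blank, comment, 'track' and 'browser' lines."""
    skip_prefixes = ("#", "track", "browser")
    return sum(
        1 for line in lines
        if line.strip() and not line.startswith(skip_prefixes)
    )
```

Both functions accept any iterable of lines, so an open file works directly, e.g. `count_vcf_records(open("input.vcf"))` (file name hypothetical). With the numbers above, 237 input SVs versus 207 rows in ttmars_res.txt means 30 SVs did not appear in that file.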

Looking forward to your reply! Thanks!

quentin0515 commented 1 year ago

Hi Jiadong,

Glad it works. Yes, "No such file or directory: './ttmars_ins/HG002/ttmars_chrx_res.txt'" does not affect the outputs.

  1. Please use 'ttmars_combined_res.txt' as the output. Ignore the others. A description of the output can be found here: https://github.com/ChaissonLab/TT-Mars#example-output
  2. The number of SVs in the final output file (ttmars_combined_res.txt) is expected to be smaller than the total number of input SVs, because TT-Mars skips SVs for which it cannot give a confident validation.

Let me know if there are other questions.

Cheers, --Quentin