aaranyue / quarTeT

A telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification
http://atcgn.com:8080/quarTeT/home.html

Three problems when using the quarTeT program #11

Closed zhenwenliu closed 7 months ago

zhenwenliu commented 1 year ago

Thank you for your great contribution, but we have run into some difficulties using this software.

  1. When running AssemblyMapper, the error is "FileNotFoundError: [Errno 2] No such file or directory: 'tmp/totaldict.fasta'".
  2. When running GapFiller, the error is "[Error] Input genome does not have gap.". The message seems ambiguous, and nothing is exported.
  3. All .png files are blank.

Echoring commented 1 year ago

  1. Sorry, this is a bug. Please update to v1.1.4 and try again.
  2. I'm not sure what causes this. Can you provide more information or the files involved?
  3. If R was installed through conda, it seems unable to convert the SVG files to PNG correctly. You can look in the tmp/ directory and check whether the SVG files are correct.
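
One rough way to check the SVG files in tmp/ without opening each one is to confirm that they parse as XML and contain at least some drawing elements. The sketch below is only an illustration using the Python standard library; the function name `summarize_svgs` is hypothetical and not part of quarTeT, and the "tmp" default is taken from the directory mentioned above.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def summarize_svgs(directory="tmp"):
    """Map each *.svg file name in `directory` to its number of child elements.

    A value of None means the file could not be parsed as XML (for example,
    an empty file). A value of 0 means the SVG has a root element but nothing
    inside it, so it would render as a blank image.
    """
    report = {}
    for path in sorted(Path(directory).glob("*.svg")):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            report[path.name] = None
            continue
        # Count every element below the root (shapes, paths, text, groups, ...).
        report[path.name] = sum(1 for _ in root.iter()) - 1
    return report
```

Calling `summarize_svgs("tmp")` from the working directory of the run returns a dictionary that can be printed and scanned for `None` or `0` entries. A non-zero count does not prove the plot is right, so opening the file is still the final check.
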

zhenwenliu commented 1 year ago

  1. The problem appears to have been addressed. However, running further leads to another error: "[Error] All alignments are filtered. Recommended to adjust filter arguments.". Which parameters can we adjust? I also suspect that the AssemblyMapper step relies heavily on a closely related reference genome, ideally from the same species. I'm not certain, but my issue may be caused by using a reference genome from a fairly distant relative, related only at the family level. Do you have any suggestions for cases where no closely related reference genome is available?

  2. This issue depends on resolving the first problem, so it still needs to be verified.

  3. Resolved. Opening the .svg file directly works fine.

  4. Another question concerns the long reads used in the GapFiller step. We only have HiFi reads, which were already used for genome assembly with hifiasm. In this case, can we reuse the HiFi data for gap filling, or is nanopore sequencing data required?

Echoring commented 1 year ago

  1. The -l and -i parameters can be reduced in this situation. Alternatively, if you have Hi-C data, you can use it to scaffold pseudo-chromosomes to serve as the reference.
  2. GapFiller checks for runs of 100 or more N characters. Grep the genome directly to see whether any exist.
  3. You can reuse these data, but the efficiency may not be as good as with other data.
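
For the gap check in point 2, a plain line-by-line grep can miss gaps when the FASTA sequence is wrapped at 60 or 80 columns, because a run of 100 N characters then spans several lines. The sketch below joins each record before searching. It is only an illustration, not quarTeT's own code: the function names are hypothetical, the threshold of 100 comes from the reply above, and matching lowercase `n` as well is an assumption.

```python
import re


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_gaps(lines, min_run=100):
    """Return {sequence name: [(start, end), ...]} for N runs >= min_run.

    Coordinates are 0-based and end-exclusive. Matching lowercase 'n'
    is an assumption and may differ from what GapFiller does.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_run)
    gaps = {}
    for name, seq in read_fasta(lines):
        hits = [(m.start(), m.end()) for m in pattern.finditer(seq)]
        if hits:
            gaps[name] = hits
    return gaps
```

To use it, pass an open file handle, for example `find_gaps(open("genome.fasta"))`, where `genome.fasta` is a placeholder for the input genome. An empty dictionary means no run of 100 or more N characters was found, which would be consistent with the "Input genome does not have gap" message.
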