Closed scintilla9 closed 7 months ago
Hi @scintilla9 , Many thanks for the PR. I will review it during this week. I will probably create a new branch which I can merge your code before merging to dev, so that, if I have to modify something due conflicts, I can easily do so.
Thanks.
One a very first (fast) assessment, the code looks awesome, fitting perfectly with the pipe.
So, I will focus in reviewing it and merging it during this week, so I can roll-out a new release in the next 14 days. There is already quite some important fixes I added in dev
which I would like to roll out asap.
Hi @scintilla9 , Which testing data did you use? With the 15X subsets I have, it does not work. It generates an empty assembly file most likely due non-sufficient data. I would like to generate at least two full runs before merging, so I can confidently make the small testing profiles skip this assembler. 😄
Hi @fmalmeida , I tested on my own pacbio data. After converting bam into CCS reads with ngs-preprocess, it has about 66M bases, and yes, a very high coverage dataset for the target species (~1200X).
Hi @scintilla9 so I will try to find something public to test it. Thx.
Another thing is, maybe would be worthy to do one of the following?
I have some considerations on the sense that, if the tool is proper only for high quality reads and would fail on the more general (frequent) runs, then, would make sense to not always run it for everything.
What do you think? Any ideas?
Hi @fmalmeida According to the instruction of Hifiasm, typically having at least 13X coverage with HiFi reads should be sufficient. Not sure why assembly failed when using a 15X subset.
I think the first option will be more straightforward, however, will the inconsistency of having most of the assemblers were run by default while one skippied by default confuse users?
My subset is 15X but is not hifi, they are normal reads. And yes, I think it may confuse users so that is why I am wondering on what. If (by the name) it is only for hifi, maybe the second scenario would be the most appropriate to skip it when not running with high quality long reads.
Code has been working fine. Docs were updated. Testing profile for hifi reads subset has been added. So, I will merge this to dev, and start running some testings with full datasets to guarantee it is proper for a new release.
Finally, now that hifiasm
is included, I will open follow-up tickets to investigate how the pipeline can be modified to allow one to also pass on parental data or hi-c data as specified in the tool's webpage.
Hi I've made few changes to incorporate Hifiasm for long read assembler. The environment was tested by building a local singularity sif from Dockerfile and running with
-profile singularity
. I can successfully generated a hifiasm-assembled genome and polished it with gcpp.Please check if it worked, and I am not sure which branch could be used to merge, so I selected the latest developing branch.