Closed Michaelijesse closed 9 months ago
Perfect. Many thanks for sharing. I will add this to the next release, v3.3, which is currently being tested.
On Wed, 20 Sep 2023 at 13.51, Michaelijesse @.***> wrote:
Hello @fmalmeida, I previously suggested that you add seqkit for renaming duplicate reads, but I ran into complexity issues with seqkit-processed subreads. So I switched to the following script for renaming PacBio subreads. Now it's working fine.
gunzip -c file.fastq.gz | awk '{if(NR%4==1) $0=sprintf("@1_%d",(1+i++)); print;}' | gzip -c > another.fastq.gz
Do you have any published long-read dataset for testing long-read assembly and annotation? Most of the genomes available in NCBI were polished with Illumina short reads. I am looking for both a published SRA dataset and its published assembly, produced without Illumina read polishing.
Maybe this could help you: https://www.nature.com/articles/s41592-022-01539-7
I have never used them, though; I generally test by comparing against the reference.
Hi @Michaelijesse, I have added this functionality to the code, and it will be released soon.
To enable deduplication, add the --enable_deduplication parameter to the command line.
Could you give it a try, using the dev branch, to check whether it works as desired?
In the meantime, I will start wrapping up the rest to make a release.
I will close the ticket for now; if it does not work as desired, or a change is needed, please re-open it.
Hello @fmalmeida, I previously suggested that you add seqkit for renaming duplicate reads, but I ran into complexity issues with seqkit-processed subreads. So I switched to the following script for renaming PacBio subreads. Now it's working fine.
gunzip -c file.fastq.gz | awk '{if(NR%4==1) $0=sprintf("@1_%d",(1+i++)); print;}' | gzip -c > another.fastq.gz
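For anyone who wants to check the one-liner before running it on real data, here is a minimal sketch that applies the same awk renaming to a toy input and counts how many headers are still duplicated afterwards. The two-record FASTQ, the temporary directory, and the `workdir` and `dups` variable names are made up for illustration; the sketch also assumes strict four-line FASTQ records (no wrapped sequence lines).

```shell
#!/bin/sh
# Sketch only: rename every FASTQ header to a unique, sequential ID.
# Assumes each record is exactly 4 lines (header, sequence, '+', quality).
set -eu

workdir=$(mktemp -d)

# Hypothetical toy input: two records that share the same read name.
printf '@read/1\nACGT\n+\nIIII\n@read/1\nTTGA\n+\nIIII\n' \
  | gzip -c > "$workdir/file.fastq.gz"

# Line 1 of every 4-line record is the header; overwrite it with @1_<n>.
gunzip -c "$workdir/file.fastq.gz" \
  | awk '{ if (NR % 4 == 1) $0 = sprintf("@1_%d", ++i); print }' \
  | gzip -c > "$workdir/another.fastq.gz"

# Count header lines that still occur more than once after renaming.
dups=$(gunzip -c "$workdir/another.fastq.gz" \
  | awk 'NR % 4 == 1' | sort | uniq -d | wc -l | tr -d ' ')
echo "duplicate headers after renaming: $dups"
```

Note that this approach replaces the whole header line, so the original read names and any trailing description fields are discarded; if those are needed downstream, a mapping of old to new names would have to be written out separately.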