Open Samvkes opened 1 year ago
Yes, these are only mapped reads and not assemblies. ONT assembly methods are quickly evolving, but one recent preprint with an assembly of HG002 ONT reads is at https://doi.org/10.1101/2023.01.12.523790
On Mon, Jun 5, 2023 at 7:20 AM Samvkes wrote:
Hi, in the README for the ONT-PromethION datasets (https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/HG002_NA24385_son/UCSC_Ultralong_OxfordNanopore_Promethion/) under 'Data Processing Methods', alignment of the called reads is mentioned. But in the linked paper (https://doi.org/10.1101/715722) it's explained that the dataset was assembled de novo, only doing alignment afterward for benchmarking (unless I'm misunderstanding).
Also, under the 'Data Processing Methods' header, the README mentions a newer version of Guppy than the one used in the paper, which suggests to me that this part was added more recently, and it contains no information on assembly at all. Does that mean that newer versions of the data are no longer generated de novo?