Closed liuqianhn closed 3 years ago
Can you let me know which version of Guppy you used to do the basecalling?
Thanks for your reply. I used Albacore v2.3.4 for now; I have not tried Guppy yet. Does this matter? I can successfully run DNAscent on the data released by the Fork-seq team with the same pipeline.
I see - would you mind sharing the command you used to run DNAscent index? The crux of the issue here is that DNAscent can't find the fast5 files, and my guess is that it's down to the file path that was passed to the index subprogram.
Just note that Albacore is quite a few years out of date now, so you might get better results with a more recent version of Guppy, depending on what you're looking at.
@MBoemo Thanks for your suggestions. I'll try Guppy, since it should give better results.
The command I used is ./DNAscent/bin/DNAscent index -f basecalled/2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100/workspace/pass/barcode12/ -o 2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100_barcode12.dnascent
I also tried splitting 2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100 together with sequencing_summary.txt by barcode, but I still cannot run DNAscent index successfully. Thanks for your suggestion.
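For the per-barcode split, one way to cut a sequencing summary into one file per barcode is sketched below. It assumes the summary is tab-separated with a header row and a barcode column named `barcode_arrangement`; that column name is an assumption, so check it against your file's header. The function name and the output file naming are hypothetical.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_summary_by_barcode(summary_path, out_dir, barcode_column="barcode_arrangement"):
    """Write one sequencing_summary_<barcode>.txt per barcode into out_dir.

    ASSUMPTION: tab-separated file with a header row and a column named
    `barcode_column` (the default name is a guess; check your header).
    Returns a dict mapping barcode -> number of reads written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    rows_by_barcode = defaultdict(list)
    with open(summary_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        fieldnames = reader.fieldnames
        if not fieldnames or barcode_column not in fieldnames:
            raise KeyError(f"column {barcode_column!r} not found in {summary_path}")
        for row in reader:
            rows_by_barcode[row[barcode_column]].append(row)

    counts = {}
    for barcode, rows in rows_by_barcode.items():
        out_path = out_dir / f"sequencing_summary_{barcode}.txt"
        with open(out_path, "w", newline="") as out:
            # Keep the original column order and plain "\n" line endings.
            writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
            writer.writeheader()
            writer.writerows(rows)
        counts[barcode] = len(rows)
    return counts
```

Each resulting file can then be paired with the matching workspace/pass/barcodeNN directory when running index.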
The path you're passing via the -f flag looks like a relative path rather than a full path. Note the line in the documentation specifying that DNAscent needs a full path (see https://dnascent.readthedocs.io/en/latest/index_subprog.html). Use the full path for both the fast5 files and the sequencing summary file and it should work. Closing for now, but let me know if there are any further issues.
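A minimal Python sketch of that fix: build the index call with every path passed through `os.path.abspath`, so nothing relative reaches DNAscent. It uses only the -f, -s and -o flags already shown in this thread; the function name and the sequencing_summary.txt location in the example are hypothetical.

```python
import os
import shlex


def dnascent_index_command(fast5_dir, summary_path, output_path, exe="DNAscent"):
    """Build a `DNAscent index` argv with every path made absolute.

    Only the -f, -s and -o flags seen in this thread are used. `exe` is
    assumed to be on PATH; pass e.g. "./DNAscent/bin/DNAscent" otherwise.
    """
    return [
        exe, "index",
        "-f", os.path.abspath(fast5_dir),      # full path to the fast5 directory
        "-s", os.path.abspath(summary_path),   # full path to sequencing_summary.txt
        "-o", os.path.abspath(output_path),    # output index file
    ]


if __name__ == "__main__":
    # Hypothetical location for the summary file; adjust to your layout.
    cmd = dnascent_index_command(
        "basecalled/2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100/workspace/pass/barcode12/",
        "basecalled/2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100/sequencing_summary.txt",
        "2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100_barcode12.dnascent",
    )
    print(shlex.join(cmd))  # copy into a shell, or run with subprocess.run(cmd, check=True)
```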
@MBoemo Thanks for your suggestions. I ran DNAscent index with both a full path and a relative path, but I always get an error saying that a certain fast5 file cannot be opened (even though I can access and open the fast5 files in my terminal). In fact, when I run DNAscent index on other datasets, I don't get this error even with a relative path. Not sure why.
Sorry to hear that this is still an issue; I'm sure we can get it sorted out. So just to confirm: this seems to be one problematic fast5 file from a particular run, but all the other fast5 files in that run (and all the fast5 files from other datasets) seem okay? I've not seen this particular issue arise before, and it's the type of thing that would be hard to reproduce on my end. Would you be willing to send me the problematic fast5 file so I can check what's going on? That's probably the fastest path to a resolution.
The DNAscent index error only happened on barcode08 and barcode09 of your dataset 2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100. Since the run terminates at the first error, I don't have a full list of the fast5 files that fail; the ones I found are:
2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100/workspace/pass/barcode09/32/ompi174_20180620_FAH65888_MN24299_sequencing_run_2018_06_18_CAM_ONT_gDNA_BrdU_BC_77372_read_14706_ch_55_strand.fast5
2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100/workspace/pass/barcode08/35/ompi174_20180619_FAH65888_MN24299_sequencing_run_2018_06_18_CAM_ONT_gDNA_BrdU_BC_71118_read_21633_ch_22_strand.fast5
Meanwhile, when I run DNAscent detect, I get no error on the 2018_09_26_CAM_ONT_2085_1x_cell_cycle data but do get an error on 2018_09_18_CAM_ONT_2085_HU_1x. The fast5 file that cannot be opened is:
2018_09_18_CAM_ONT_2085_HU_1x/workspace/pass/49/ompi174_20180918_FAK12651_MN17319_sequencing_run_2018_10_17_CAM_ONT_2085_HU_1x_17983_read_6783_ch_353_strand.fast5
Okay thanks - my suspicion is that this is an issue with the data (either with the download from, or upload to, GEO) rather than the software itself. I'll see if I can reproduce this and get back to you, though I might not get to it until tomorrow.
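One quick way to look for damaged downloads across a whole run, rather than stopping at the first bad file the way index/detect do, is to check that every fast5 (an HDF5 file) is non-empty and begins with the 8-byte HDF5 signature. A minimal stdlib sketch; it assumes the signature sits at byte 0 (HDF5 permits a user block that moves it to offset 512, 1024, ...), and a file truncated partway through can still pass, so treat it as triage only. The function name is hypothetical.

```python
from pathlib import Path

# The 8-byte signature that starts an HDF5 superblock (fast5 files are HDF5).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_suspect_fast5(root):
    """Return (path, reason) for every .fast5 under `root` that fails a quick check.

    Walks the whole tree instead of stopping at the first bad file.
    ASSUMPTION: the signature is at byte 0 (no HDF5 user block).
    """
    suspects = []
    for path in sorted(Path(root).rglob("*.fast5")):
        try:
            with open(path, "rb") as handle:
                header = handle.read(len(HDF5_SIGNATURE))
        except OSError as err:
            suspects.append((str(path), f"unreadable: {err}"))
            continue
        if not header:
            suspects.append((str(path), "empty file"))
        elif header != HDF5_SIGNATURE:
            suspects.append((str(path), "missing HDF5 signature"))
    return suspects
```

For example, `find_suspect_fast5("2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100/workspace/pass/barcode09")` returns a list of (path, reason) pairs, empty if nothing looks wrong at the header level.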
@MBoemo Thanks for your quick reply. It could be a download issue. But please note that I can use h5ls/h5dump to access and open the fast5 files that caused DNAscent index/detect to terminate in my runs.
I've taken a look at this and wasn't able to reproduce the issue. The workflow I followed (so you can compare with yours) was:
The commands I used (with paths truncated for clarity) are as follows:
ont-guppy/bin/guppy_basecaller -i /path/to/2018_09_18_CAM_ONT_2085_HU_1x -s /path/to/fast5_basecalled -r -c ont-guppy/data/dna_r9.4.1_450bps_fast.cfg
cat fast5_basecalled/*.fastq > reads.fastq
minimap2-2.17_x64-linux/minimap2 -ax map-ont -o alignments.sam SacCer3.fasta reads.fastq
samtools view -Sb -o alignments.bam alignments.sam
samtools sort -o alignments.sorted.bam alignments.bam
samtools index alignments.sorted.bam
DNAscent index -f /path/to/2018_09_18_CAM_ONT_2085_HU_1x -s /path/to/fast5_basecalled/sequencing_summary.txt
DNAscent detect -b alignments.sorted.bam -i index.dnascent -r SacCer3.fasta -o test.detect
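The same workflow can be scripted so that it stops at the first failing step and always hands DNAscent index full paths. A sketch under assumptions: guppy_basecaller, minimap2, samtools and DNAscent are on PATH (the thread calls them via ont-guppy/bin/ and minimap2-2.17_x64-linux/), Guppy writes its .fastq files and sequencing_summary.txt directly into the output directory, and samtools sort uses the `-o` form of samtools 1.x. The function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def pipeline_commands(raw_fast5_dir, basecall_dir, reference, guppy_config):
    """Return each external step of the workflow as an argv list (no shell needed)."""
    summary = str(Path(basecall_dir) / "sequencing_summary.txt")
    return [
        ["guppy_basecaller", "-i", raw_fast5_dir, "-s", basecall_dir, "-r", "-c", guppy_config],
        ["minimap2", "-ax", "map-ont", "-o", "alignments.sam", reference, "reads.fastq"],
        ["samtools", "view", "-b", "-o", "alignments.bam", "alignments.sam"],
        ["samtools", "sort", "-o", "alignments.sorted.bam", "alignments.bam"],
        ["samtools", "index", "alignments.sorted.bam"],
        ["DNAscent", "index", "-f", raw_fast5_dir, "-s", summary],
        ["DNAscent", "detect", "-b", "alignments.sorted.bam", "-i", "index.dnascent",
         "-r", reference, "-o", "test.detect"],
    ]


def concat_fastq(basecall_dir, out_path):
    """Python stand-in for `cat basecall_dir/*.fastq > out_path`."""
    inputs = sorted(Path(basecall_dir).glob("*.fastq"))  # listed before out_path is created
    out_resolved = Path(out_path).resolve()
    with open(out_path, "wb") as out:
        for fq in inputs:
            if fq.resolve() == out_resolved:  # never read the file being written
                continue
            with open(fq, "rb") as handle:
                shutil.copyfileobj(handle, out)


def run_pipeline(raw_fast5_dir, basecall_dir, reference, guppy_config):
    raw = str(Path(raw_fast5_dir).resolve())   # DNAscent index expects a full path
    calls = str(Path(basecall_dir).resolve())
    steps = pipeline_commands(raw, calls, reference, guppy_config)
    subprocess.run(steps[0], check=True)       # basecalling
    concat_fastq(calls, "reads.fastq")         # merge Guppy's fastq output
    for argv in steps[1:]:
        subprocess.run(argv, check=True)       # raises CalledProcessError on failure
```

Passing argv lists instead of shell strings avoids quoting problems with long run names, and `check=True` makes the first failing step raise instead of letting later steps run on missing inputs.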
I did this both for the specific problematic fast5 file you identified and for the whole run, just to be safe, and both were okay. So at this point I think it's unlikely to be caused by a DNAscent issue or anything on GEO's end. If you've followed the steps above and there's still a problem, my best guess is that there was an issue with your download.
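Since the remaining suspect is the download, one way to confirm it is to fetch the run a second time into a separate directory and compare the two copies file by file. A minimal sketch, assuming both copies share the same directory layout; it compares sizes and MD5 digests from Python's `hashlib`, and the function names are hypothetical (the thread mentions no published checksums to compare against).

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks so large fast5 files are not loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_downloads(first_root, second_root, pattern="*.fast5"):
    """Compare two copies of the same run file by file.

    Returns (missing_in_second, differing): relative paths present only in the
    first copy, and relative paths whose size or MD5 differs between copies.
    """
    first_root, second_root = Path(first_root), Path(second_root)
    missing, differing = [], []
    for path in sorted(first_root.rglob(pattern)):
        rel = path.relative_to(first_root)
        other = second_root / rel
        if not other.is_file():
            missing.append(str(rel))
        elif path.stat().st_size != other.stat().st_size or md5sum(path) != md5sum(other):
            differing.append(str(rel))
    return missing, differing
```

A file reported as differing, such as one of the barcode08/barcode09 reads listed earlier, would point to one copy having been corrupted in transfer.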
Hi, I tried to use your nice tool to analyze your barcoded data (2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100). After running index without sequencing_summary.txt, I got the error below when running detect:
I have also run detect after running index with sequencing_summary.txt on other data, and I don't get this error there. May I know how to fix it? Thanks.
I also found that the index files produced with and without sequencing_summary.txt are different. Without sequencing_summary.txt, the index looks like below:
Could you please let me know whether it is correct?